lab_search

Sophisticated Search

Due: Mar 20, 23:59 PM

Learning Objectives

  • Identify limitations of binary search with multiple matches
  • Implement binary range searching for large-scale data analysis

Submission Instructions

Using the Prairielearn workspace, test and save your solution to the following exercises. (You are welcome to download and work on the files locally but they must be re-uploaded or copied over to the workspace for submission).

Assignment Description

As you have now seen in class, searching a large collection of objects for a specific target object can be done in O(log n) time if we take the time to first sort our dataset. But what happens if we are interested in finding more than a single matching object? In this lab, we will explore the strategy of a binary range search for finding all matches to an input query in O(log n) time.

NOTE: Don’t use the built in search functions provided by python. The goal of the assignment is to understand how those are implemented. All you need are index access (a[index]) and size checking (len(a)).

NOTE: Both types of binary search have one or more tests for efficiency. Be aware that these tests are easy to fail even for valid solutions as they track the number of times you accessed an element in a list. If you are failing the efficiency test and can’t figure out why, remove print statements or any other extraneous code which is looking up a value of a list at a particular index. If you think your solution is correct and is still failing the autograder, email or post on Piazza.

# INPUT:
# names is an array of names (list[str])
# search_name is the name we're searching for (str)
# OUTPUT:
# the index of the name or -1 if we can't find it (int)
int binary_search(list[str] names, str search_name):

Brad is sick of having to read through his emails to know whether or not a specific person has emailed him. However instead of using the built-in search tool like a normal person, Brad decided he’s going to data mine all names from all his emails into a list and then write search algorithms capable of finding a particular individual. Which is weird but to each their own – the problem is Brad has decided that the best way to write this algorithm is to use a hand-written binary search algorithm and he is forcing you to write it for him.

Save Brad from having to read his emails by writing an algorithm that will search a list of strings for a specific query using binary search. If the query is present, return the index of the name in his list. If your binary search algorithm ends and you haven’t found the name in question, return the value -1 as an integer.

Hint: Python is pretty smart about > and < operators working for strings as well as numbers. (If you are interested, try to see if you can figure out the system for deciding which word is smaller!)

Brad was so happy with your successful implementation of binary search that he’s tried to apply it to all sorts of other problems in his life with mixed success. He’s come back to you now asking for help again, this time because he has a big list of repetitive numbers but he wants to find all the matches for a given number. That is to say he wants to know the starting index and ending index for the range of objects matching a given value.

Write an implementation for binary range search that accesses the input list no more than (k + 2) + log(n) times, for k matches in the list. (This is the worst case time for a simple to implement O(n) variant of binary range search and on average will be better than linear search).

For a challenge (and bonus points), write the true binary range search which has a worst case runtime of O(log n) (and will have a strict implementation check for 2 log(n) list accesses).