lab_search

Sophisticated Search

Due: Oct 14, 23:59 PM

Learning Objectives

  • Review fundamental search strategies on lists of objects
  • Identify limitations of binary search with multiple matches
  • Implement binary range searching for large-scale data analysis

Setup

From your CS 277 git directory, run the following:

git fetch release
git merge --allow-unrelated-histories release/lab_search -m "Merging initial lab_search files"

Upon a successful merge, your lab_search files are now in your lab_search directory.

Assignment Description

As you have now seen in class, searching a large collection of objects for a specific target object can be done in O(log n) time if we take the time to first sort our dataset. But what happens if we are interested in finding more than a single matching object? In this lab, we will explore the strategy of a binary range search for finding all matches to an input query in O(log n) time.

NOTE: Don’t use the built in search functions provided by python. The goal of the assignment is to understand how those are implemented. All you need are index access (a[index]) and size checking (len(a)).

NOTE: Each function has one or more tests for efficiency (although the lienar search is just for calibration and is worth 0 points). Be aware that these tests are easy to fail even for valid solutions as they track the number of times you accessed an element in a list. If you are failing the efficiency test and can’t figure out why, remove print statements or any other extraneous code which is looking up a value of a list at a particular index. If you think your solution is correct and is still failing the autograder, email or post on Campuswire.

Brad is sick of having to read through his emails to know whether or not a specific person has emailed him. However instead of using the built-in search tool like a normal person, Brad decided he’s going to data mine all names from all his emails into a list and then write search algorithms capable of finding a particular individual.

The problem is Brad has decided that the best way to write this algorithm is to use a naive linear search starting from the first object in the list and he is forcing you to write it for him.

Write an algorithm that will start at the first object in the list and check if it matches the search name. If it does, return the index of the name in his list. If not, look at the next element in the list in order. If you reach the end of the list and haven’t found the name in question, return the value -1 as an integer.

# INPUT:
# names is an array of names (list[str])
# search_name is the name we're searching for (str)
# OUTPUT:
# the index of the name or -1 if we can't find it (int)
int linear_search(list[str] names, str search_name):

But wait? This can take really long when there are lots of names and Brad is really impatient. Thankfully, using your knowledge of sorting and search strategies for lists, you know that if the input data is sorted, it is possible to use binary search to perform the same search in O(log n) time.

Save Brad from having to actually read his emails by writing up binary search for names. As with the linear search, return the index position of the name (if found) and zero otherwise.

# INPUT:
# names is an array of names (list[str])
# search_name is the name we're searching for (str)
# OUTPUT:
# the index of the name or -1 if we can't find it (int)
int binary_search(list[str] names, str search_name):

Hint: Python is pretty smart about > and < operators working for strings as well as numbers. (If you are interested, try to see if you can figure out the system for deciding which word is smaller!)

Brad was so happy with your successful implementation of binary search that he’s tried to apply it to all sorts of other problems in his life with mixed success. He’s come back to you now asking for help again, this time because he has a big list of repetitive numbers but he wants to find all the matches for a given number. That is to say he wants to know the starting index and ending index for the range of objects matching a given value.

Write an implementation for binary range search that accesses the input list no more than (k + 2) + log(n) times, for k matches in the list. (This is the worst case time for a simple to implement O(n) variant of binary range search and on average will be better than linear search).

For a challenge (and bonus points), write the true binary range search which has a worst case runtime of O(log n) (and will have a strict implementation check for 2 log(n) list accesses).