Learning Objectives
- Implement multiple forms of hashing
- Determine collision frequency and run-time differences in hashing approaches
Submission Instructions
Using the Prairielearn workspace, test and save your solution to the following exercises. (You are welcome to download and work on the files locally but they must be re-uploaded or copied over to the workspace for submission).
You will only need to modify the following files:
lab_hash.ipynb
Assignment Description
Hashing is one of the most important concepts in computer science (and my personal favorite topic). A well-defined general hash function converts any object in the universe into a clear numeric address that can then be used to store the relevant details of that object. And – if this is done well – all of this can be done in near-constant time!
In this lab, you will be tasked with implementing insert and find for three different hashing strategies discussed in class. You are strongly encouraged to play around with the size of the hash table you are building, the hash functions you are using, as well as the contents of what you are passing in (the provided code contains multiple examples).
Part 1: Separate Chaining
# INPUT:
# A list of strings, inList
# A hash function, hash (provided as h1, h2, or h3)
# An integer m defining the intended size of the output list
# OUTPUT:
# Strings stored as a 2D list of lists (hashed with separate chaining)
List[List[str]] hash_chain(List[str] inList, function hash, int m):
# INPUT:
# A hash table of size m (produced by hash_chain)
# A string, query, being searched for
# A hash function, hash (provided as h1, h2, or h3)
# An integer m defining the intended size of the output list
# OUTPUT:
# An integer denoting the number of steps it took to find the query
# OR -1 if query was not found.
int find_chain(List[str] inList, str query, function hash, int m):
The first (and most straightforward) hashing strategy is separate chaining. Given an input hash function and array size, hash_chain()
should insert every object in the inputList into a two-dimensional list of lists representing the array itself and the corresponding chains. find_chain()
should return the number of steps it took to find the query using separate chaining (or -1 if the query is not found). To match the autograder, you must insert from the front (as though our list was a linked list).
To be clear on the dimensions of the hash table, the table should be initialized to m
empty lists and then each element is inserted. Thus you could have a single list of n
elements or a better distribution of values based on the hash function in question. More directly: Your output should have m
‘rows’ but is not guaranteed or likely to be a square matrix. The length of each row varies but can at most be n
NOTE: In all of these functions, one of the input itself is a function. To go full circle to the earliest days of the class, everything in Python is an object – including functions! So this is a totally reasonable function input we have ignored until now.
Part 2: Linear Probing
# INPUT:
# A list of strings, inList
# A hash function, hash (provided as h1, h2, or h3)
# An integer m defining the intended size of the output list
# OUTPUT:
# Strings stored as a list (hashed with linear probing)
List[str] hash_linear(List[str] inList, function hash, int m):
# INPUT:
# A hash table of size m (produced by hash_chain)
# A string, query, being searched for
# A hash function, hash (provided as h1, h2, or h3)
# An integer m defining the intended size of the output list
# OUTPUT:
# An integer denoting the number of steps it took to find the query
# OR -1 if query was not found.
int find_linear(List[str] inList, str query, function hash, int m):
Linear probing resolves a hash collision at an index i by attempting to store the value at position i+1 (and repeating until an open space has been found). You should implement both hash_linear()
and find_linear()
to take an arbitrary hash function as input and resolve hash collisions using linear probing.
HINT: You will need to have TWO break conditions for when it is clear that the object isn’t present – one for when the linear probing search identifies that the object was not inserted and one for the edge case where the entire table is full (but the object also isn’t present). How can you detect the former? The latter?
Part 3: Double Hashing
# INPUT:
# A list of strings, inList
# Two hash functions, h1 and h2
# An integer m defining the intended size of the output list
# OUTPUT:
# Strings stored as a list (hashed with double hashing)
List[str] hash_double(List[str] inList, function h1, function h2, int m):
# INPUT:
# A hash table of size m (produced by hash_linear)
# A string, query, being searched for
# A hash function, hash (provided as h1, h2, or h3)
# An integer m defining the intended size of the output list
# OUTPUT:
# An integer denoting the number of steps it took to find the query
# OR -1 if query was not found.
int find_linear(List[str] inList, str query, function h1, function h2, int m):
Double hashing attempts to reduce the frequency of repeated hash collisions by giving each unique object a different pattern for resolving collisions based on a second hash h2. Unlike linear probing, it is entirely possible for an object to be impossible to insert despite having open positions still in the hash table. When inserting and finding into the hash table, you should break any operation after attempting to insert or find m
times.
To be clear here: double hashing using the provided hash functions can occassionally result in inserts that are impossible. For example, if the second hash value is 0 it doesnt matter how many times you try to rehash, it will always produce the same number. Your solution should be aware of this limitation and – when it cannot insert after m
iterations – stop. Failure to do this will likely break the autograder (which has random tests that will unpredictably result in this behavior).