Learning Objectives
- Implement multiple forms of hashing
- Determine collision frequency and run-time differences in hashing approaches
Submission Instructions
Using the Prairielearn workspace, test and save your solution to the following exercises. (You are welcome to download and work on the files locally but they must be re-uploaded or copied over to the workspace for submission).
You will only need to modify the following files:
lab_hash.ipynb
Assignment Description
Hashing is one of the most important concepts in computer science (and my personal favorite topic). A well-defined general hash function converts any object in the universe into a clear numeric address that can then be used to store the relevant details of that object. And – if this is done well – all of this can be done in near-constant time!
In this lab, you will be tasked with implementing insert and find for three different hashing strategies discussed in class. You are strongly encouraged to play around with the size of the hash table you are building, the hash functions you are using, as well as the contents of what you are passing in (the provided code contains multiple examples).
Part 1: Separate Chaining
# INPUT:
# inList, A list of strings
# hash, A hash function that will always be h1, h2, or h3 (given hash functions)
# m, An integer defining the size of the output list
# OUTPUT:
# A hash table (as a Python list of linked lists) hashed with separate chaining
# That is to say it is a list where every index is a linkedList (from utils.py)
# Each string in inList is first hashed to a specific linkedList by hash value (address)
# And a new Node storing the string as its data member variable is added to the linkedList
def hash_chain(inList, hash, m):
# INPUT:
# ht, A hash table of size m (produced by hash_chain)
# query, The string being searched for
# hash, A hash function (provided as h1, h2, or h3)
# m, An integer defining the size of the output list
# OUTPUT:
# An integer denoting the number of steps it took to find the query
# OR -1 if query was not found.
def find_chain(ht, query, hash, m):
The easiest way to resolve a hash collision is to allow more than one item to exist in the same position in our array! As seen in class, one possible implementation of this is to combine both of our previous list data types – the array and the linked list.
Given an input hash function and array size, hash_chain()
should insert every object in the inputList into an array of linkedList objects. When a collision occurs, be sure to use the optimal insert-at-front to take advantage of the linked list!
TIP: Consider carefully what the ‘default’ empty hash table using separate chaining would look like. Every position in your array should be an empty linkedList not an empty space. If set up correctly, adding any item is as easy as a linkedList add()
!
Given a hash table, a query, and a hash function, find_chain()
should return the number of steps it took to find the query using separate chaining (or -1 if the query is not found). When counting the number of steps, remember that looking at the head of a list is the first step.
Consider: What relationship exists between the distribution of our inputs at different hash values and the overall runtime?
NOTE: In all of these functions, one of the input itself is a function. To go full circle to the earliest days of the class, everything in Python is an object – including functions! We haven’t seen this in a while so this is a nice review.
Part 2: Linear Probing
# INPUT:
# inList, A list of strings
# hash, A hash function that will always be h1, h2, or h3 (given hash functions)
# m, An integer defining the size of the output list
# OUTPUT:
# A hash table as a list hashed with linear probing
# That is to say it is a list where every index is a string that hashed (with linear probing)
# to that specific index. If there is no string stored at that index, it has the value None
def hash_linear(inList, hash, m):
# INPUT:
# ht, A hash table of size m (produced by hash_linear)
# query, The string being searched for
# hash, A hash function (provided as h1, h2, or h3)
# m, An integer defining the size of the output list
# OUTPUT:
# An integer denoting the number of steps it took to find the query
# OR -1 if query was not found.
def find_linear(ht, query, hash, m):
Linear probing resolves a hash collision at an index i by attempting to store the value at position i+1 (and repeating until an open space has been found). You should implement both hash_linear()
and find_linear()
to take an arbitrary hash function as input and resolve hash collisions using linear probing.
HINT: There were THREE possible break conditions for a probing-based search strategy. Do you remember what those are? Refer back to the lecture (or lab) slides if you can’t remember!
Part 3: Double Hashing
# INPUT:
# inList, A list of strings
# hash1, A hash function that will always be h1, h2, or h3 (given hash functions)
# hash2, A hash function that will always be h1, h2, or h3 (but different from h1)
# m, An integer defining the size of the output list
# OUTPUT:
# A hash table as a list hashed with linear probing
# That is to say it is a list where every index is a string that hashed (with linear probing)
# to that specific index. If there is no string stored at that index, it has the value None
def hash_double(inList, hash1, hash2, m):
# INPUT:
# inList, A list of strings
# query, The string being searched for
# hash1, A hash function that will always be h1, h2, or h3 (given hash functions)
# hash2, A hash function that will always be h1, h2, or h3 (but different from h1)
# m, An integer defining the size of the output list
# OUTPUT:
# An integer denoting the number of steps it took to find the query
# OR -1 if query was not found.
def find_double(ht, query, hash1, hash2, m):
Double hashing attempts to reduce the frequency of repeated hash collisions by giving each unique object a different pattern for resolving collisions based on a second hash h2. While this is significantly better in practice, to use this correctly we need our hash functions to be good – which they arent. Put more plainly, it is entirely possible for an object to be impossible to insert despite having open positions still in the hash table. For example, if my hash table is full on even indices (0, 2, 4, 6, …) and both h1 and h2 are even numbers, I will infinite loop despite having potential open spaces at odd indices.
To compensate, when inserting and finding into the hash table, you should break any operation after attempting to insert or find m
times. Failure to do this will likely break the autograder with a timeout error as there are randomly generated strings in these tests that may unpredictably result in this behavior.