Assignment Description
In this lab you will be implementing functions on hash tables with three different collision resolution strategies — separate chaining, linear probing, and double hashing. These hash tables serve an implementation of the dictionary abstract data type.
Lab Insight
Hashing is very powerful as it enables us to build data structure like hash tables and maps. On top of which, there are variations of hashing that can be used to help encrypt data. If you are interested in learning more about the applications of hashing, you can take CS 498 Applied Cryptography, CS 461 Computer Security I, and CS 463 Computer Security II.
Checking Out the Code
All assignments will be distributed via our release repo on github this semester. You will need to have set up your git directory to have our release as a remote repo as described in our git set up
You can merge the assignments as they are released into your personal repo with
git pull --no-edit --no-rebase release main
git push
if you are using multiple machines you may need to use the following to allow them to work correcly.
git pull --no-edit --no-rebase release main --allow-unrelated-histories
git push
The first git command will fetch and merge changes from the main
branch on your remote repository named release
into your personal. The --no-edit
flag automatically generates a commit message for you, and the--no-rebase
flag will merge the upstream branch into the current branch. Generally, these two flags shouldn’t be used, but are included for ease of merging assignments into your repo.
The second command will push to origin
(your personal), which will allow it to track the new changes from release
.
You will need to run these commands for every assignment that is released.
All the files for this lab are in the lab_hash
directory.
Preparing Your Code
This semester for MPs we are using CMake rather than just make. This allows for us to use libraries such as Catch2 that can be installed in your system rather than providing them with each assignment. This change does mean that for each assignment you need to use CMake to build your own custom makefiles. To do this you need to run the following in the base directory of the assignment. Which in this assignment is the lab_hash
directory.
mkdir build
cd build
This first makes a new directory in your assignment directory called build
. This is where you will actually build the assignment and then moves to that directory. This is not included in the provided code since we are following industry standard practices and you would normally exclude the build directory from any source control system.
Now you need to actually run CMake as follows.
cmake ..
This runs CMake to initialize the current directory which is the build
directory you just made as the location to build the assignment. The one argument to CMake here is ..
which referes to the parent of the current directory which in this case is top of the assignment. This directory has the files CMake needs to setup your assignment to be build.
At this point you can in the build directory run make as described to build the various programs for the MP.
The code for this activity resides in the lab_hash/
directory.
If you want to speed up compile time on a make
, try using make -j <target>
,
ie make -j test
Notes About list
Iterators
When you are working with the Separate Chaining Hash Table, you will need to
iterate over the linked list of a given bucket. Since the hash tables are
templatized, however, this causes us a slight headache syntactically in C++. To
define a list
iterator on a given bucket, you will need to declare it as
follows:
typename list< pair<K,V> >::iterator it = table[i].begin();
If you use the list::erase()
function, be advised that if you erase the
element pointed to by an iterator that the parameter iterator is no longer
valid. For instance:
typename list< pair<K,V> >::iterator it = table[i].begin();
table[i].erase(it);
it++;
is invalid because it
is invalidated after the call to erase()
. So, if
you are looping with an iterator, remember a break
statement after you call
erase()
!
Separate Chaining Hash Table
Open your schashtable.hpp
. In this file, several functions have not been
implemented—your job is to implement them.
insert
insert
, given akey
and avalue
, should insert the(key, value)
pair into the hash table.- You do not need to concern yourself with duplicate keys. When in client code and using our hash tables, the proper procedure for updating a key is to first remove the key, then re-insert the key with the new data value.
- Here is the Doxygen for
insert
.
find
- given a
key
, should return the correspondingvalue
associated with that key - Here is the Doxygen for
find
.
remove
- Given a key, remove it from the hash table.
- If the given key is not in the hash table, do nothing.
- You may find the Doxygen for
remove
helpful.
resizeTable
- This is called when the load factor for our table is \(\ge 0.7\).
- It should resize the internal array for the hash table. Use the return value
of
findPrime
with a parameter of double the current size to set the size. See other calls toresize
for reference. - Here is the Doxygen for
resizeTable
.
Linear Probing Hash Table
Open your lphashtable.hpp
. In this file, you will be implementing the
following functions.
insert
insert
, given akey
and avalue
, should insert the(key, value)
pair into the hash table.- Remember the collision handling strategy for linear probing! (To maintain compatibility with our outputs, you should probe by moving forwards through the internal array, not backwards).
- You do not need to concern yourself with duplicate keys. When in client code and using our hash tables, the proper procedure for updating a key is to first remove the key, then re-insert the key with the new data value.
- Here is the Doxygen for
insert
. - You MUST handle collisions in your
insert
function, or your hash table will be broken!
findIndex
- given a
key
, should return the correspondingindex
associated with that key - Here is the Doxygen for
findIndex
.
remove
- Given a key, remove it from the hash table.
- If the given key is not in the hash table, do nothing.
- You may find the Doxygen for
remove
helpful.
resizeTable
- This is called when the load factor for our table is \(\ge 0.7\).
- It should resize the internal array for the hash table. Use the return value
of
findPrime
with a parameter of double the current size to set the size. See other calls toresize
for reference. - Here is the Doxygen for
resizeTable
.
Double Hashing Hash Table
Open your dhhashtable.hpp
. In this file, you will be implementing the
following functions.
insert
insert
, given akey
and avalue
, should insert the(key, value)
pair into the hash table.- Remember the collision handling strategy for double hashing! (To maintain compatibility with our outputs, you should probe by moving forwards through the internal array, not backwards).
- You do not need to concern yourself with duplicate keys. When in client code and using our hash tables, the proper procedure for updating a key is to first remove the key, then re-insert the key with the new data value.
- Here is the Doxygen for
insert
. - You MUST handle collisions in your
insert
function, or your hash table will be broken!
findIndex
- given a
key
, should return the correspondingindex
associated with that key - Here is the Doxygen for
findIndex
.
remove
- Given a key, remove it from the hash table.
- If the given key is not in the hash table, do nothing.
- You may find the Doxygen for
remove
helpful.
Grading Information
The following files (and ONLY those files!!) are used for grading this lab:
dhhashtable.hpp
lphashtable.hpp
schashtable.hpp
If you modify any other files, they will not be grabbed for grading and you may end up with a “stupid zero.”