Assignment Description
In this lab we’ll implement all the core functionalities of a binary search trees, practice our understanding of pointers (and reference pointers), and observe the difference between worst case performance and expected performance in the real world.
Lab Insight
One of the key weaknesses of the BST is that there are no guarantees that it will produce a balanced BST and as a consequence the height of the tree (and the efficiency of all major operations) is O(n)
. However this paints an unrealistic picture about the real world use of a non-balancing BST. In this lab, observe and contrast how real world datasets perform versus ‘worst case’ performance. As you do so, consider why the BST’s performance is so different from the ‘worst case’.
Checking Out the Code
All assignments will be distributed via our release repo on github this semester. You will need to have set up your git directory to have our release as a remote repo as described in our git set up
You can merge the assignments as they are released into your personal repo with
git pull --no-edit --no-rebase release main --allow-unrelated-histories
git push
The first git command will fetch and merge changes from the main
branch on your remote repository named release
into your personal. The --no-edit
flag automatically generates a commit message for you, and the--no-rebase
flag will merge the upstream branch into the current branch. Generally, these two flags shouldn’t be used, but are included for ease of merging assignments into your repo.
The second command will push to origin
(your personal), which will allow it to track the new changes from release
.
You will need to run these commands for every assignment that is released.
All the files for this lab are in the lab_bst
directory.
Preparing Your Code
This semester for MPs we are using CMake rather than just make. This allows for us to use libraries such as Catch2 that can be installed in your system rather than providing them with each assignment. This change does mean that for each assignment you need to use CMake to build your own custom makefiles. To do this you need to run the following in the base directory of the assignment. Which in this assignment is the lab_bst
directory.
mkdir build
cd build
This first makes a new directory in your assignment directory called build
. This is where you will actually build the assignment and then moves to that directory. This is not included in the provided code since we are following industry standard practices and you would normally exclude the build directory from any source control system.
Now you need to actually run CMake as follows.
cmake ..
This runs CMake to initialize the current directory which is the build
directory you just made as the location to build the assignment. The one argument to CMake here is ..
which referes to the parent of the current directory which in this case is top of the assignment. This directory has the files CMake needs to setup your assignment to be build.
At this point you can in the build directory run make as described to build the various programs for the MP.
Part 1: Implementing BST
The provided code base closely resembles the implementation of BST presented in the class lecture. For the first part of the lab, fill in the missing code elements for a binary search tree. Note that the BST here is implemented as a dictionary with key, value
pairs as the template <K, V>
.
The BST::find()
Function
The find()
function should return the value as a templated V
datatype. You are strongly encouraged to use the provided find(root, key)
helper function, which returns a Node* &
. Correctly implementing, testing, and understanding find(root, key)
will make most of the rest of the assignment trivial.
The BST::insert()
Function
The insert()
function should return nothing but modify the BST
to include the new node in the correct ordered location. Note that this is not a balanced insertion like the AVL tree.
Likewise note that since we are using real-world datasets, it is possible that the same key
will be inserted with different values. Your code should only insert the FIRST copy of any item in the dataset. When insert is called on an item already present, it should not change the resulting tree. Hint: The find(root, key)
helper function should make this a trivial function to code.
FUN TRIVIA: Most of the trees you will be building in this assignment use strings for keys which has an unusual (and perhaps non-intuitive) relationship with ‘greater than’ or ‘less than’ known as lexicographic ordering. While you will likely see more details about this in future classes (or the honors data structures class), for now it is sufficient to be aware that strings are compared character by character from left to right and that each character has a numeric value associated with it. ASCII 128 and 256 tables can be found here: ASCII and the ‘smaller’ string is the one with the first observed smaller character. So the string a
is in fact larger than the string AAAAAA
.
The BST::remove()
Function
The remove()
function should return nothing but modify the BST
to not include the new node (if it exists in the tree). When required, use the in-order predecessor (IOP) NOT the in-order successor. Like with insert, the find(root, key)
helper function should make this significantly easier to code but you are also encouraged to use the provided swap()
function and/or write your own function to find the IOP.
The listBuild()
Function
The listBuild()
function should return a BST
in which every std::pair<K,V> (key, value)
pair is stored in the tree in the order that they are inserted. To test this function, you can use the provided text files and the provided readTextFile()
function. Note: Although the provided function will always use a <string, int>
pair (a word and its in-order index in the file), your BST
should remain templated for all possible input pairs.
Part 2: All-Permutation Tree Construction
Now that you have successfully implemented the standard BST, the second half of the assignment is to explore how the order of insertion effects the shape (and in particular height) of a BST. We will do this by creating every possible permutation of input order, building a separate tree for each, and keeping track of the raw counts of each height from 0
to n-1
. Warning: The all-permutation problem is going to create n!
trees; do not make n
large here or you will have a bad time.
The allBuild()
Function
The allBuild()
function also takes in a list of type std::pair<K,V> (key, value)
but instead of producing a single tree, you should produce a separate tree for all possible permutations of the input dataset (each permutation using the full input list). The return of allBuild()
is a histogram of the number of trees of a particular height built. This information should be stored in an std::vector<int>
of size n
where the total number of trees with height h
is stored at index v[h]
. For simplicity, you should assume that there will always be at least one input item in the vector (and thus the vector starts at height 0
and not height -1
).
For example, if my input list is [(K1, V1), (K2, V2), (K3, V3)]
, then the expected output is {0, 2, 4}
as there are 3!
permutations of the list, two of which will result in a tree of height 1 and four of which will result in a tree of height 2.
HINT: While you can manually construct all permutations, you may find it useful to use the C++ function next_permtuation. Given an input vector, this function will rearrange the items in the vector into the next largest configuration (its in-order-successor) and will stop once it has reached the largest possible ordering. So if you start from the smallest possible configuration (a completely sorted vector), you can produce all possible permutations with no hassle.
Testing Your Code
Run the Catch tests as follows (this requires your code to compile when run simply as make
):
make test
./test
You may find it helpful for debugging purposes to print your tree. We have provided a void BST<K, V>::print(std::ostream& out, bool order) const
with default arguments so this can simply be called as such
BST<int, int> tree;
...
tree.print();
We also have provided another entrypoint in entry/bst.cpp
to run examples of the implemented functions. You can mutate this file as you see fit to run the examples, and buld / execute with
make bst
./bst
Submitting Your Work
The following files are used in grading:
bst.h
bst.hpp
All other files including any testing files you have added will not be used for grading.