lab_bst

Beautiful Binary Trees

Due: Oct 02, 23:59 PM

Assignment Description

In this lab we’ll implement all the core functionalities of a binary search trees, practice our understanding of pointers (and reference pointers), and observe the difference between worst case performance and expected performance in the real world.

Lab Insight

One of the key weaknesses of the BST is that there are no guarantees that it will produce a balanced BST and as a consequence the height of the tree (and the efficiency of all major operations) is O(n). However this paints an unrealistic picture about the real world use of a non-balancing BST. In this lab, observe and contrast how real world datasets perform versus ‘worst case’ performance. As you do so, consider why the BST’s performance is so different from the ‘worst case’.

Getting Set Up

You can find the code for all our released assignments at the following public git repo.

release-f22

You can either clone this or pull a zip file of the repo to your local computer. All the files for this MP are in the lab_bst directory. You should copy that directory into the directory you setup for your docker container.

Testing Your Code

This semester for MPs we are using CMake rather than just make. This allows for us to use libraries such as Catch2 that can be installed in your system rather than providing them with each assignment. This change does mean that for each assignment you need to use CMake to build your own custom makefiles. To do this you need to run the following in the base directory of the assignment. Which in this assignment is the lab_bst directory.

mkdir build
cd build

This first makes a new directory in your assignment directory called build. This is where you will actually build the assignment and then moves to that directory. This is not included in the provided code since we are following industry standard practices and you would normally exclude the build directory from any source control system.

Now you need to actually run CMake as follows.

cmake ..

This runs CMake to initialize the current directory which is the build directory you just made as the location to build the assignment. The one argument to CMake here is .. which refers to the parent of the current directory which in this case is top of the assignment. This directory has the files CMake needs to setup your assignment to be build.

At this point you can, in the build directory, run make as described to build the various programs for the MP.

Testing Your Code

To test your code, compile using make:

make

Part 1: Implementing BST

The provided code base closely resembles the implementation of BST presented in the class lecture. For the first part of the lab, fill in the missing code elements for a binary search tree. Note that the BST here is implemented as a dictionary with key, value pairs as the template <K, V>.

The BST::find() Function

The find() function should return the value as a templated V datatype. You are strongly encouraged to use the provided find(root, key) helper function, which returns a Node* &. Correctly implementing, testing, and understanding find(root, key) will make most of the rest of the assignment trivial.

The BST::insert() Function

The insert() function should return nothing but modify the BST to include the new node in the correct ordered location. Note that this is not a balanced insertion like the AVL tree.

Likewise note that since we are using real-world datasets, it is possible that the same key will be inserted with different values. Your code should only insert the FIRST copy of any item in the dataset. When insert is called on an item already present, it should not change the resulting tree. Hint: The find(root, key) helper function should make this a trivial function to code.

FUN TRIVIA: Most of the trees you will be building in this assignment use strings for keys which has an unusual (and perhaps non-intuitive) relationship with ‘greater than’ or ‘less than’ known as lexicographic ordering. While you will likely see more details about this in future classes (or the honors data structures class), for now it is sufficient to be aware that strings are compared character by character from left to right and that each character has a numeric value associated with it. ASCII 128 and 256 tables can be found here: ASCII and the ‘smaller’ string is the one with the first observed smaller character. So the string a is in fact larger than the string AAAAAA.

The BST::remove() Function

The remove() function should return nothing but modify the BST to not include the new node (if it exists in the tree). When required, use the in-order predecessor (IOP) NOT the in-order successor. Like with insert, the find(root, key) helper function should make this significantly easier to code but you are also encouraged to use the provided swap() function and/or write your own function to find the IOP.

The listBuild() Function

The listBuild() function should return a BST in which every std::pair<K,V> (key, value) pair is stored in the tree in the order that they are inserted. To test this function, you can use the provided text files and the provided readTextFile() function. Note: Although the provided function will always use a <string, int> pair (a word and its in-order index in the file), your BST should remain templated for all possible input pairs.

Part 2: All-Permutation Tree Construction

Now that you have successfully implemented the standard BST, the second half of the assignment is to explore how the order of insertion effects the shape (and in particular height) of a BST. We will do this by creating every possible permutation of input order, building a separate tree for each, and keeping track of the raw counts of each height from 0 to n-1. Warning: The all-permutation problem is going to create n! trees; do not make n large here or you will have a bad time.

The allBuild() Function

The allBuild() function also takes in a list of type std::pair<K,V> (key, value) but instead of producing a single tree, you should produce a separate tree for all possible permutations of the input dataset (each permutation using the full input list). The return of allBuild() is a histogram of the number of trees of a particular height built. This information should be stored in an std::vector<int> of size n where the total number of trees with height h is stored at index v[h]. For simplicity, you should assume that there will always be at least one input item in the vector (and thus the vector starts at height 0 and not height -1).

For example, if my input list is [(K1, V1), (K2, V2), (K3, V3)], then the expected output is {0, 2, 4} as there are 3! permutations of the list, two of which will result in a tree of height 1 and four of which will result in a tree of height 2.

HINT: While you can manually construct all permutations, you may find it useful to use the C++ function next_permtuation. Given an input vector, this function will rearrange the items in the vector into the next largest configuration (its in-order-successor) and will stop once it has reached the largest possible ordering. So if you start from the smallest possible configuration (a completely sorted vector), you can produce all possible permutations with no hassle.

Testing Your Code with Catch

Run the Catch tests as follows (this requires your code to compile when run simply as make):

make test
./test

Submitting Your Work

The following files are used in grading:

  • bst.h
  • bst.hpp

All other files including any testing files you have added will not be used for grading.

Good luck!