CS 225 | Honors

Assignment Description

The algorithms and data structures presented in CS 225 are chosen to be a strong but broad foundation of CS knowledge. As a team of two, go beyond the scope of the class to propose, implement, and benchmark a data structure or algorithm which is not covered in the class.

You will only receive credit if you successfully complete all of the following components:

A project proposal
Weekly development logs and mid-project checkin meeting
A functional code base with tests for correctness
A final report (and presentation) that proves correctness, benchmarks performance, and estimates the runtime efficiency of your method.

You should thoroughly read through this page before deciding if you will participate in an extra credit project!

All extra credit projects will be teams of 2! There are no exceptions for individuals or larger groups of any kind.

Team Formation

To participate in the CS 225 Project, you must create your own team. This semester team formation will be done using Prairielearn. One member of the team should create a new group using an UIUC-appropriate group name and the rest can join the project with the generated join code.

You will not be able to manually leave a final project team once created on Prairielearn. Be sure you and your partner are on the same page about your project (and who is creating vs who is joining) before starting!

Project Proposal (Approval by November 1st)

Your proposed final project must be the reproduction of a known (named) algorithm or data structure not covered in this or previous CS/ECE courses. Some suggested algorithms and data structures can be found under the ‘Project Goals’ link but you can also propose your own provided there is a peer-reviewed paper or textbook describing the method in detail. Due to the difficulty in manually grading proposals, each team is only guaranteed one chance to propose a project. Please make sure you have all required components before submitting.

The proposal itself has four parts:

A GitHub repository. Your repository should be private for the duration of the course but must include both course instructors (and eventually your project mentor) as contributors. (The Prairielearn submission has the required github account information for course instructors). It is up to you to maintain your repository but it should be based on the standard structure presented in the example project.
A processed dataset. Your algorithm must be benchmarked on a collection of datasets of different sizes. Accordingly, inside your GitHub repo’s data directory, you must have a collection of either .txt or .csv-formatted files containing the processed dataset you plan to use. While you can add additional datasets during the course of the project, the submitted proposal must contain enough already processed and formatted data to complete the project. If your dataset is too large to store in your GitHub repo, a link should be provided to your processed dataset.
A set of tests to prove algorithm correctness. Writing tests before writing your code can be very beneficial for formalizing exactly what your code should be doing. Accordingly, inside your GitHub repo’s code/tests directory, you must have a single file containing at least three written tests that would (once passed) prove your algorithm works on at least one non-trivial (non-empty) input dataset. You are not expected to write all tests ahead of time but the format should match that of the provided tests in every CS 225 assignment. DO NOT WRITE TESTS BASED ON THE OUTPUT OF YOUR CODE AS A CIRCULAR WAY OF ‘PROVING’ CORRECTNESS. THE POINT OF TESTS IS TO PROVE YOUR OUTPUT IS CORRECT, NOT THAT IT IS DETERMINISTIC
A written proposal. In no more than a few paragraphs and in your own words, describe what algorithm or data structure you will be implementing. Your proposal must have all of the following details:
- Academic Reference Your proposed algorithm or data structure must be a named existing method (not a ‘novel’ data processing pipeline). You must include an peer-reviewed paper or textbook (with a direct link to the appropriate pages) that defines the algorithm or data structure you plan to implement.
- Algorithm Summary In your own words, describe in no more than a paragraph what the algorithm does. You can refer to your function inputs and outputs to help you explain your approach.
- Function I/O What are the expected inputs and outputs for your algorithm? You should break down the core functions you plan to implement that will go from your starting input dataset to the final correct output for your method. As part of this description, relate how your tests prove correctness for these functions. Please see the example project for a loose guideline for how to organize your proposal.
As you write these, consider the following:
```
 * Do you have to do anything to convert your csv input for the algorithm described? (Ex: A graph algorithm would require making the input into a graph.) 
 * Does your algorithm need more than just a dataset as input? (Ex: A* search requires a heuristic algorithm. If you choose to do A*, what are some possible heuristics you might use?)
 * If your algorithm works in stages, consider writing a separate function for each stage. (Ex: Many string algorithms have a preprocessing step. If you do one, how does the output of one function lead into another?) 
```
- Data Description Briefly describe where you got your data and how you processed it to a csv or txt file. If you wrote scripts or code (in any language) to automate your data processing, you should include them in your repo and describe them here.

Project proposals will only be evaluated if they precisely match the expected format and contain all required information.

Development Log (Done weekly after project approval)

A successful final project is built slowly over many weeks not thrown together at the last minute. To incentivize good project pacing and to let your project mentor stay informed about the status of your work, each week you are required to submit a development log detailing:

What goals you had set for the week and whether they were accomplished or not
What specific tasks each member of your team accomplished in the week
What problems you encountered (if any) that prevented you from meeting your goals
What you plan to accomplish next week

The development log will be graded for completion, detail, and honesty – not progress. It is much better to truthfully evaluate the work you completed in a week then lie to make the project sound further along then it really is. It is totally acceptable to have an entry that says you tried nothing and accomplished nothing. However if every week starts to say that, both you and your project mentor will be able to identify the issue before it becomes impossible to fix.

Mid-Project Checkin (Meet before November 20th)

Both members of your team are required to meet with your project mentor for a check-in meeting at least once before the end of the project. You do not need to prepare a presentation but should come prepared to summarize your progress as well as have a frank discussion about any issues or concerns you have encountered as a team or as an individual team member. The goal here is to ensure that forward progress is being made and to address any issues that are impeding progress while there is still time to correct and recover. To that end, you should be up front and honest about your current progress.

That said, you should try to schedule your meeting once you have what you believe is a functioning version of your proposed algorithm (but before you have done your full benchmarking). You will be expected to explain and demonstrate the tests you have written proving that the algorithm works.

Final Project Deliverables

There are four main deliverables for this final project:

A functional code-base. Your code must either work on the current semester’s docker container or with special arrangements with your mentor in a system that you agreed on. It must run with no errors using simple command line arguments and have loosely reproducable runtimes.
A descriptive README. In addition to the code itself, you must include a human-readable README.md which describes:
- Github Organization – You should describe the physical location of all major files and deliverables (code, tests, data, the written report, the presentation video, etc…)
- Running Instructions – You should provide full instructions on how to build and run your executable, including how to define the input data and output location for each method. You should also have instructions on how to build and run your test suite. It is in your best interest to make the instructions (and the running of your executables and tests) as simple and straightforward as possible.
A written report. In addition to your code, your Github repository must contain a file final_report.pdf file which describes:
- The output and correctness of your algorithm – You should go over your entire test suite and explain in words how the total test suite has proven your algorithm is correct.
- Benchmarking of your algorithm and a proposed Big O – Using at least five meaningfully distinct dataset sizes of increasing size, submit a figure with labeled axis that shows your benchmarked runtimes for each input dataset. Using this information and your code base (and general theory knowledge), write a detailed justification for the overall Big O of your method. If your approach ultimately did not match the theoretical Big O of your data structure, discuss why.
A final presentation. Alongside your final report, you should submit a short video (10 minutes or less) describing your project. Your presentation should include slides or other visual aids and include the following content:
- Your Goals (Suggested time: 1-2 minutes) The presentation should begin with a summary of your algorithm from a conceptual point of view – what are the inputs, what is the output, how does it work?
  
  Tip: Think of this as ‘teaching your algorithm’ to an audience who has seen CS 225 content but nothing further.
- Your Development (Suggested time: 2-3 minutes) The presentation should include a high level overview of the work you put into the presentation. This is not meant to be a line by line recounting of your code but a highlight reel of the various design decisions you made and the challenges you encountered – and hopefully overcame – while working on the project.
  
  If you were unable to complete some minor follow-up work but want to still discuss it, this is the best opportunity to explain it! Bring up what you did that didn’t work out, how you tried to address the problem, and what you might do in the future if you were tasked to do this or a similar project again.
  
  Tip: If you are struggling to identify content here, ask yourself questions like: “How did we get the data we wanted?”, “How did we choose our implementation strategy for an algorithm?”, “How did we ultimately test our code to ensure that it is working?”
- Your Conclusions (Suggested time: 3-5 minutes) The presentation should end by summarizing your final report by conclusively proving (1) that your algorithm is correct, (2) that your benchmarking is correct, and (3) explaining your final Big O analysis of your method. Ambitious teams who went above and beyond the benchmarking can also discuss how your results led you to discover something interesting involving your real-world dataset.
In addition to quantitative results, your conclusions should also end with some individual thoughts you had about the project. What did you learn, what did you like or didn’t like, and what would you explore or implement next if given more time?

To submit your final project video, you may either include it on Github or include a direct link to the video on your team Github. Videos can be hosted through Zoom cloud recordings, Youtube, Google drive, etc…