Assignment Description
In a team of three to five students, you will propose and produce a final project involving some manner of complex data structure such as a graph with data analysis using graph algorithms. As a team, you have considerable freedom to choose a project of interest to you.
First Deliverables (Due March 27)
Team Formation
To participate in the CS 225 Project, you must create your own team of three to five students. This semester, team formation (and most deliverable submissions) will be done using PrairieLearn.
Team Contract
As a team, you must submit a 1-2 page document as a Markdown (.md) file that formalizes your team’s views on both core logistical issues and common pitfalls you may encounter over the course of your project. Once signed by each member of your team, it should be considered a binding agreement for all parties. Breaches of this contract can and should be brought up internally and – if not resolved – brought to the attention of course staff.
Accordingly, the document should – at minimum – include the following major Communication and Collaboration issues.
**Communication.** Determining how to communicate with your teammates, as well as how often you should be communicating, is key to a successful remote project. Discuss with your team and draft a statement detailing the following:
- **Team Meetings.** When and how often will your team meet? How long should each meeting last? What software or tool will you use to host these meetings? Will someone take notes (record minutes)?
- **Assistance.** How will your teammates be able to contact you if they need your help or opinion on a task? How quickly should you be expected to respond?
- **Respect.** An effective team needs to have an environment which encourages open expression of ideas. How will you ensure that every member has an opportunity to speak and, more importantly, that every member will actively listen and engage with the thoughts of others?
**Collaboration.** The final project tasks you with finding a fair distribution of labor where each student has some role in the development of each deliverable. However, the details of this distribution are up to you. Discuss with your team and draft a statement detailing the following:
- **Work Distribution.** How will you assign workload for this project? How will you address unexpected complications or unforeseen work? You are encouraged to identify the strengths and desires of each team member when distributing work. You do not all need to work equally on a particular deliverable – it is the overall work that should be largely equal.
- **Time Commitment.** How many hours of work per week are expected of each group member? Are there prior time commitments that need to be accounted for? How will you address new conflicts or commitments when they do inevitably occur?
- **Conflict Resolution.** How will the team resolve situations where there is a disagreement between members? Situations where one or more members have not accomplished their tasks? Situations where one or more members are habitually late? Are there other hypothetical situations that you as an individual or as a team want to discuss ahead of time? When issues occur, you are strongly encouraged to inform course staff, but only after first trying to resolve the issue as a team in a respectful manner.
Final Project Proposal
Even if you choose to use one or more of the suggested example project goals, as a team you are responsible for submitting a detailed project proposal according to the guidelines below. Groups that do not submit an adequate proposal will not be allowed to complete an extra credit project.
Your proposal must contain the following information:
**Leading Question.** Your final project should have a clear conclusion or target goal – given a dataset and a code base that implements some graph algorithms, what can you learn from the dataset? Are you hoping to solve a specific problem? You should clearly describe how your team will use your dataset and algorithms to answer your leading question. Be thorough in your description – this is the foundation of your project and if your mentor cannot follow your logic, you will not be able to proceed further on the project. NOTE: Not every algorithm implemented in the project must directly answer this question, but you must answer the question using the algorithms you have selected.
**Major Deliverables.** You must describe the work you will be doing for your project. This must include at least one major code deliverable per member of your group. Your major deliverables must include at least one algorithm or data structure that is not covered in this or previous CS/ECE courses. Finally, while you can use code from other sources such as libraries or cited sources, you will only be graded on code that is written by members of the group. Any code from cited sources or libraries will not count as part of the work done on the project.
Examples of major deliverables can include the following as well as other algorithms and data structures you find on your own.
- Graph data structure (covered in CS 225)
- Dijkstra's Algorithm (covered in CS 225)
- Floyd-Warshall’s Algorithm (covered in CS 225)
- Fibonacci Heap
- Soft Heap
- Splay Trees
- Dynamic Trees
- Spring-electrical Model Graph Visualization
- Multidimensional Scaling Graph Visualization
- A* Search
- Delta-Stepping SSSP
Each major component must stand on its own as a significant task. For example, Dijkstra’s algorithm can be seen as a special case of A*, so if you were to implement Dijkstra’s algorithm as one of your major deliverables, you would not be able to choose A* as another one.
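To see why these two count as one component, here is a minimal Python sketch (not course-provided code) of A* in which the heuristic defaults to zero. The function name `a_star`, the adjacency-dictionary graph format, and the sample graph are all assumptions made for illustration. With the zero heuristic, the priority of each node is just its distance from the start, which is the ordering Dijkstra's algorithm uses.

```python
import heapq

def a_star(graph, start, goal, heuristic=lambda node: 0):
    """Return the cost of the cheapest path from start to goal, or None.

    graph: dict mapping node -> list of (neighbor, edge_weight) pairs,
           with non-negative weights (an assumed format for this sketch).
    heuristic: estimate of the remaining cost to goal; it must never
           overestimate. The default of zero gives Dijkstra's ordering.
    """
    best = {start: 0}
    frontier = [(heuristic(start), start)]
    while frontier:
        priority, node = heapq.heappop(frontier)
        if node == goal:
            return best[node]
        if priority > best[node] + heuristic(node):
            continue  # stale entry: a cheaper route to node was found later
        for neighbor, weight in graph.get(node, []):
            cost = best[node] + weight
            if cost < best.get(neighbor, float("inf")):
                best[neighbor] = cost
                heapq.heappush(frontier, (cost + heuristic(neighbor), neighbor))
    return None

# Hypothetical four-node graph used only to exercise the function.
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)], "D": []}
estimate = {"A": 3, "B": 2, "C": 1, "D": 0}  # never exceeds the true remaining cost

print(a_star(graph, "A", "D"))                              # zero heuristic (Dijkstra)
print(a_star(graph, "A", "D", lambda node: estimate[node]))  # informed heuristic
```

Both calls find the same cheapest cost; the heuristic only changes the order in which nodes are explored.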
**Dataset Acquisition and Processing.** If your final project uses a dataset, your proposal must clearly describe that dataset. This includes succinctly describing:
- **Data access.** In roughly a paragraph, you should describe how you are acquiring your dataset and how your mentor can get a copy.
- **Data format.** In roughly a paragraph, you should describe to the best of your ability the specifics of your input dataset. At minimum this includes: What is the source of the dataset and what is the input format of said dataset? How big is the dataset? Do you plan to use all of the data or only a subset? If only a subset, how will you define it?
- **Data Correction.** In a paragraph or two, you should describe how you will parse the input data and what checks you are doing to ensure the input data is error-free. At minimum, this should discuss how you will check for missing entries and how you will correct such instances when you find them. Depending on the dataset, it is also reasonable to check for values that are not physically possible or values which are statistical outliers. *Note: These are just suggestions; you may have many other ideas for how to find and correct problems in your dataset.*
- **Data Storage.** In a paragraph or two, you should describe what data structures you are using to store the data within your code. If you need any auxiliary data structures or preprocessed tables, you should also discuss them here. This should include how all parts of the data map to any graph or structure you will be using in the algorithms. **As part of this proposal you must include an estimate of the total storage costs for your dataset in Big O notation.**
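As one illustration of the Data Correction and Data Storage items above, the following Python sketch parses a hypothetical `source,dest,weight` edge list, skips rows that fail basic checks, and stores the result as an adjacency list. The file format, the function name `load_edges`, and the choice to treat negative weights as errors are assumptions for this example; your dataset may call for different checks or for repairing rows instead of dropping them.

```python
import csv
import math

def load_edges(lines):
    """Build an adjacency list from rows of 'source,dest,weight'.

    Rows with missing fields, non-numeric weights, or weights that are
    negative or non-finite are skipped and counted, not silently accepted.
    Storage is O(|V| + |E|): one dictionary key per vertex and one
    (neighbor, weight) entry per valid edge.
    """
    graph = {}
    skipped = 0
    for row in csv.reader(lines):
        fields = [field.strip() for field in row]
        if len(fields) != 3 or not all(fields):
            skipped += 1  # wrong column count or an empty field
            continue
        source, dest, raw_weight = fields
        try:
            weight = float(raw_weight)
        except ValueError:
            skipped += 1  # weight is not a number
            continue
        if not math.isfinite(weight) or weight < 0:
            skipped += 1  # assumed to be impossible for this dataset
            continue
        graph.setdefault(source, []).append((dest, weight))
        graph.setdefault(dest, [])  # register vertices with no outgoing edges
    return graph, skipped

# Made-up rows: two valid edges followed by four malformed ones.
rows = ["A,B,3", "B,C,4.5", "A,,2", "C,A,abc", "B,A,-1", "C"]
graph, skipped = load_edges(rows)
print(graph)
print("rows skipped:", skipped)
```

Reporting the number of skipped rows makes it easy to state in the proposal (and later in the report) how much of the raw data was discarded.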
**Algorithms.** In no more than a few paragraphs, describe what algorithms you will use to answer the leading question. You should spend some time considering what algorithms you might try and, for all major functions you plan to use, include the following details in your proposal:
- **Academic Reference** You must include an academic reference that describes the algorithm you will be using. This can be the course website for material covered in this course. For other algorithms, the reference should be either a paper on the algorithm or an academic book that covers the algorithm.
- **Function Inputs** What are the expected inputs for your algorithm? Do you have to do anything to convert your stored dataset into a usable input for the algorithm described? (Ex: A graph algorithm would require making the input into a graph.) For the more complex algorithms, be sure to include as part of the input any additional information you might need. For example, A* search requires a heuristic. If you choose to do A*, what are some possible heuristics you might use?
- **Function Outputs** What is the expected output for your algorithm? How will you store, print, or otherwise visualize the outcome?
- **Function Efficiency** Your algorithm likely has a theoretically optimal Big O that you can find online. But most algorithms also have multiple implementations and there is no guarantee that your implementation of this algorithm is optimal. **As part of this proposal you must include an estimate or target goal on the Big O efficiency of your algorithm in both time and memory.**
- **Testing Strategy** You must describe the testing strategy you will be using to test your algorithm. This needs to provide enough detail for evaluation but does not have to include test cases or other specific details.
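A testing strategy of the kind described above often starts from small, hand-built graphs whose correct answers can be worked out on paper. The Python sketch below shows that idea for a breadth-first search; the function `bfs_distance`, the graph format, and the specific cases are illustrative assumptions, not a required test layout.

```python
from collections import deque

def bfs_distance(graph, start, goal):
    """Return the fewest edges from start to goal, or None if unreachable.

    graph: dict mapping node -> list of neighbor nodes (assumed format).
    Runs in O(|V| + |E|) time and uses O(|V|) extra memory.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

def test_bfs_distance():
    # Tiny directed graph: E points into A, but nothing points into E.
    graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": [], "E": ["A"]}
    assert bfs_distance(graph, "A", "A") == 0      # trivial case
    assert bfs_distance(graph, "A", "B") == 1      # direct edge
    assert bfs_distance(graph, "A", "D") == 2      # two equally short routes
    assert bfs_distance(graph, "A", "E") is None   # unreachable vertex
    assert bfs_distance(graph, "A", "Z") is None   # vertex not in the graph

test_bfs_distance()
```

The cases are chosen to cover the trivial input, a typical input, a tie, and two kinds of failure, which is the level of detail a proposal can describe without listing every test.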
**Timeline.** As a team, identify a list of tasks such as data acquisition, data processing, completion of each individual algorithm, production of final deliverables, etc., and write a proposed timeline for the completion of these tasks. You are not required to adhere strictly to this timeline but it should represent a reasonable set of benchmarks to strive for. For example, stating that you will finish all graph algorithms over the span of a single week is not reasonable. At least one proposed task must be completed before the mid-project check-in – part of the mid-project grade will be based on whether or not this target goal was met.
If your proposal is not accepted due to missing required contents, your team will not be allowed to complete a project for extra credit.
GitHub Repo Creation (After Proposal Accepted)
As a team, you are responsible for creating and maintaining your own GitHub code repository.
Development Log (Done weekly after project approval)
A successful final project is built slowly over many weeks, not thrown together at the last minute. To incentivize good project pacing and to let your project mentor stay informed about the status of your work, each week you are required to submit a development log detailing:
- What goals you had set for the week and whether they were accomplished or not
- What specific tasks each member of your team accomplished in the week
- What problems you encountered (if any) that prevented you from meeting your goals
- What you plan to accomplish next week
The development log will be graded for completion, detail, and honesty – not progress. It is much better to truthfully evaluate the work you completed in a week than to lie to make the project sound further along than it really is. It is totally acceptable to have an entry that says you tried nothing and accomplished nothing. However, if every week starts to say that, both you and your project mentor will be able to identify the issue before it becomes impossible to fix.
Mid-Project Checkin (April 17 – 21)
A few weeks into the final project, you are required to meet with your project mentor for a check-in meeting. You do not need to prepare a presentation but should come prepared to summarize your progress as well as have a frank discussion about any issues or concerns you have encountered as a team or as an individual team member. The goal here is to ensure that forward progress is being made and to address any issues that are impeding progress while there is still time to correct and recover. To that end, you should be up front and honest about your current progress.
While a significant portion of the points for the check-in meeting is awarded for attending as a team, for full credit you must also have at least one of your chosen major deliverables working. You will be expected to demonstrate in the meeting the tests you have written proving that the algorithm works. This is to encourage you to start working on the final project long before the final weeks and ensure that you are writing real tests for your code as you develop it.
Final Project Deliverables (May 5)
There are four main deliverables for this final project. As a team, you are expected to distribute work on each deliverable fairly. This means that each student should be responsible for some part of each of the following:
- **A functional code-base.** Your code must work either on the default Docker container or, by special arrangement with your mentor, on a system that you have agreed on. It will be tested for reproducibility of your original results and its capacity to run on datasets of our choosing that exactly match your proposed formatting. Your code will be graded based on the following metrics:
  - **Code Execution** – How easy is it to run your code? For full credit, your code should be runnable using simple command line arguments, which include the ability to alter or adjust the input data or output location.
  - **Code Efficiency** – Does your code match your target Big O efficiencies? For full credit, your code should have no obvious inefficiency in implementation and be capable of running to completion on your proposed dataset using reasonable hardware resources.
  - **Code Organization** – Is your code human-readable? For full credit, all your variables, functions, and classes should be named appropriately and organized comments should detail the input, output, and intended behavior of major code blocks. Additionally, your final submission should be devoid of unnecessary or obsolete code.
  - **Code Completion** – Have you completed all your algorithms? For full credit, your code must be able to run all the proposed algorithms on the full dataset and have tests proving that the algorithms worked.
- **A descriptive README.** In addition to the code itself, you must include a human-readable `README.md` which describes:
  - **GitHub Organization** – You should describe the physical location of all major files and deliverables (code, tests, data, the written report, the presentation video, etc…)
  - **Running Instructions** – You should provide full instructions on how to build and run your executable, including how to define the input data and output location for each method. You should also have instructions on how to build and run your test suite, including a general description on what tests you have created. It is in your best interest to make the instructions (and the running of your executables and tests) as simple and straightforward as possible.
- **A written report.** In addition to your code, your GitHub repository must contain a `results.md` file which describes:
  - **The output and correctness of each algorithm** – You should summarize, visualize, or highlight some part of the full-scale run of each algorithm. Additionally, the report should briefly describe what tests you performed to confirm that each algorithm was working as intended.
  - **The answer to your leading question** – You should directly address your proposed leading question. How did you answer this question? What did you discover? If your project was ultimately unsuccessful, give a brief reflection about what worked and what you would do differently as a team.
- **A final presentation.** In addition to your project write-up, you should submit a short video (10 minutes or less) describing your project. Your presentation should include slides or other visual aids and include the following content:
  - **Your Goals** (Suggested time: 1-2 minutes) The presentation should begin with a summary of your proposed goals and a short statement about what you successfully accomplished and, if necessary, what you were ultimately unable to complete.

    Tip: Think of this as ‘setting the stage’ for your presentation, letting the viewer know what you will be discussing for the rest of the talk.
  - **Your Development** (Suggested time: 2-3 minutes) The presentation should include a high-level overview of the work you put into the project. This is not meant to be a line by line recounting of your code but a highlight reel of the various design decisions you made and the challenges you encountered – and hopefully overcame – while working on the project.

    If you were unable to complete one of your goals, this is the best opportunity to explain what you did that didn’t work out, how you tried to address the problem, and what you might do in the future if you were tasked to do this or a similar project again.

    Tip: If you are struggling to identify content here, ask yourself questions like: “How did we get the data we wanted?”, “How did we choose our implementation strategy for an algorithm?”, “How did we ultimately test our code to ensure that it is working?”
  - **Your Conclusions** (Suggested time: 3-5 minutes) The presentation should end by answering the ‘leading question’ you were hoping to solve. This may include details such as the final or full-scale input dataset you used and the output of each of your algorithms but ambitious teams should focus on how these results led you to discover something interesting involving your real-world dataset. For example, a traversal algorithm on OpenFlights data may be used to identify the shortest path between two airports that your team would like to visit.

    In addition to quantitative results, your conclusions should also end with some individual thoughts you had about the project. What did you learn, what did you like or dislike, and what would you explore or implement next if given more time?

  To submit your final project video, you may either include it on GitHub or include a direct link to the video on your team GitHub. Videos can be hosted through Zoom cloud recordings, YouTube, Google Drive, etc…
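Returning to the Code Execution criterion for the code base: one way to make the input data and output location adjustable is a small command-line interface. The Python sketch below is only an illustration of that idea; the flag names `--input`, `--output`, and `--algorithm`, the default output file name, and the two algorithm choices are assumptions, and a real project would use whatever language and options its proposal calls for.

```python
import argparse

def build_parser():
    """Describe the command-line options (all names here are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Run one graph algorithm on an edge-list dataset."
    )
    parser.add_argument("--input", required=True, help="path to the input dataset")
    parser.add_argument("--output", default="output.txt", help="path for the results file")
    parser.add_argument("--algorithm", choices=["bfs", "dijkstra"], default="bfs",
                        help="which algorithm to run")
    return parser

def main(argv=None):
    """Parse the options and report the run that was requested."""
    args = build_parser().parse_args(argv)
    # A real program would load args.input, run the chosen algorithm,
    # and write the results to args.output at this point.
    return f"{args.algorithm}: {args.input} -> {args.output}"

# Passing an explicit list stands in for typing the flags in a terminal.
print(main(["--input", "data/edges.csv", "--algorithm", "dijkstra"]))
```

Keeping every run to a single command with named flags also keeps the Running Instructions section of the README short.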