Final Project

Ferocious Final Projects

Due: Dec 12, 11:59 PM

GitHub Submission

Your final project will be submitted through GitHub (not Gradescope!). To create your final project repo, run the following commands from your CS 277 git directory:

git fetch release
git merge --allow-unrelated-histories release/final_project -m "Merging initial final_project files"

After a successful merge, the initial final_project files will be in your final_project directory. To submit any file as part of your final project, commit and push it to the appropriate location in your git repo:

  • Your proposal and development log should go in the development sub-folder
  • Your final production-ready code should go in the code sub-folder
  • Your dataset should be described or stored in the data sub-folder
  • Your write-up, presentation, and any results figures go in the results sub-folder
  • The README should remain in the root directory (final_project).

NOTE: Unlike your labs and MPs, the initial files in your repo are NOT meant to be skeleton code for your project. Treat the files in the release repo as a lighthearted example of how to structure your own work, not as something to copy or edit directly.

Assignment Description

As an individual student, you will propose and produce a final project worth a total of 220 points. You have considerable freedom in choosing a project and are encouraged to propose something of personal interest to you. To help you get started, we have provided two resources: a list of datasets and a list of algorithms that are reasonable choices for a final project. Note: with the exception of the final deadline, the dates listed here are subject to change.

Final Project Proposal (Due October 20th) [30 points]

Every student must submit a project proposal of no more than two pages, which will be read and returned with comments or suggestions. Proposals will not be considered until they contain all of the content described below. You cannot start the final project until your proposal has passed (you may resubmit freely until your proposal is up to par).

  1. Leading Question. In no more than a paragraph or two, describe the target goal of your final project. Given a real-world dataset of your choosing, what do you hope to learn or discover, and how do you plan to accomplish this? Some leading questions that may help you write your proposal:

    • Are you answering a specific question or are you producing a general tool that will work in other settings or answer a class of related queries?

    • What concrete deliverables (code, figures, etc.) will you produce to answer the leading question?

    • How will you determine the success of your project? How will you know when your code (or the answer you’ve found) is correct?

  2. Dataset Acquisition and Processing. Your final project must use at least one publicly accessible dataset, and your proposal must clearly describe the dataset you have chosen. This includes succinctly describing:

    • Data Format. In roughly a paragraph, describe the specifics of your input dataset as best you can. At minimum, this includes: What is the source of the dataset, and what is its input format? How big is the dataset? Do you plan to use all of the data or only a subset? If only a subset, how will you define it?

    • Data Correction. In a paragraph or two, describe how you will parse the input data and what checks you will perform to ensure the input data is error-free. At minimum, this includes: How will you check for missing entries, and how will you correct such instances when you find them? Depending on the dataset, it is also reasonable to check for values that are not physically possible or values that are statistical outliers. Note: these are just suggestions; you may have many other ideas for how to find and correct problems in your dataset.

    • Data Storage. In a paragraph or two, describe what data structure(s) you will use to store the data within your Python code. If you need any auxiliary data structures or preprocessed tables, discuss them here as well. As part of this proposal, you must include an estimate of the total storage cost of your dataset in Big O notation.

  3. Data Science Algorithm. In no more than a few paragraphs, describe the algorithms you will use to answer the leading question. To be considered a valid final project, you must either implement at least one algorithm from the list of examples or propose an algorithm (or set of algorithms) that represents an equivalent amount of coding development. Spend some time considering which algorithms you might try and, for all major functions you plan to use, include the following details in your proposal:

    • Function Inputs. What are the expected inputs for your algorithm? Do you have to do anything to convert your stored dataset into a usable input for the algorithm? (Ex: A graph algorithm would require converting the input into a graph.)

    • Function Outputs. What is the expected output for your algorithm? How will you store, print, or otherwise visualize the outcome?

    • Function Efficiency. Your algorithm likely has a theoretically optimal Big O that you can find online, but most algorithms also have multiple implementations, and there is no guarantee that your implementation is optimal. As part of this proposal, you must include an estimate or target goal for the Big O efficiency of your algorithm in both time and memory.

  4. Timeline. In no more than a paragraph or two (a figure may also help!), list the tasks you will need to accomplish to complete your project and propose a timeline for completing them. You are not required to adhere strictly to this timeline, but it should represent a reasonable set of benchmarks to strive for. For example, stating that you will implement your entire project in a single week is not reasonable. At least one proposed task must be completed before the mid-project check-in; part of the mid-project grade will be based on whether or not this target goal was met.
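As one illustration of the Dataset Acquisition and Processing items, here is a minimal Python sketch of parsing, correcting, and storing a small route dataset before handing it to a graph algorithm. Everything about the input is an assumption made for illustration: the hypothetical columns source, destination, and distance_km, the inline sample rows, the hypothetical helper names load_edges, flag_outliers, and build_graph, and the choice of a z-score test for outliers (an interquartile-range test is an equally common alternative). Your own checks should follow from your own dataset.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical input format (illustration only): one directed route per row
# with columns "source", "destination", "distance_km".
SAMPLE_CSV = """source,destination,distance_km
ORD,JFK,1188
JFK,LAX,3983
ORD,LAX,
LAX,SFO,-543
SFO,ORD,2964
"""


def load_edges(text):
    """Parse rows, skipping and reporting missing or impossible values."""
    edges, problems = [], []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        src = (row.get("source") or "").strip()
        dst = (row.get("destination") or "").strip()
        raw = (row.get("distance_km") or "").strip()
        if not src or not dst or not raw:
            problems.append((line_no, "missing entry"))
            continue
        try:
            dist = float(raw)
        except ValueError:
            problems.append((line_no, "non-numeric distance"))
            continue
        if dist <= 0:  # a route cannot have zero or negative length
            problems.append((line_no, "physically impossible distance"))
            continue
        edges.append((src, dst, dist))
    return edges, problems


def flag_outliers(edges, z_cutoff=3.0):
    """Return edges whose distance lies more than z_cutoff standard
    deviations from the mean distance."""
    if len(edges) < 2:
        return []
    dists = [d for _, _, d in edges]
    mean, stdev = statistics.mean(dists), statistics.stdev(dists)
    if stdev == 0:
        return []
    return [e for e in edges if abs(e[2] - mean) / stdev > z_cutoff]


def build_graph(edges):
    """Adjacency list: node -> list of (neighbor, distance) pairs."""
    graph = defaultdict(list)
    for src, dst, dist in edges:
        graph[src].append((dst, dist))
        graph.setdefault(dst, [])  # keep destination-only nodes as keys too
    return dict(graph)


edges, problems = load_edges(SAMPLE_CSV)
print("rejected rows:", problems)
print("possible outliers:", flag_outliers(edges))
graph = build_graph(edges)
print("graph:", graph)
```

Storing the graph as an adjacency list (a dict mapping each node to its outgoing edges) costs O(V + E) memory for V nodes and E edges, which is the kind of Big O storage estimate the proposal asks for. Whether a bad row should be dropped, imputed, or corrected is a per-dataset decision; this sketch simply skips and reports such rows so they can be reviewed.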

NOTE: The proposal is not a binding agreement but a proposed starting place. You may encounter difficulties partway through that make your timeline infeasible. You may decide to swap algorithms to better answer your leading question. You may find a more interesting leading question. All of these are a core part of the final project process. However, be sure to communicate any such changes to your project mentor. A major change to the leading question, dataset, or algorithms used must be approved as though it were a new proposal.

UPDATE (10/31/21): If your proposal was not approved and you have not resubmitted it at least once, you will lose points if you do not resubmit your proposal by 11/5/21.

Development Log (Due weekly from October 29th through December 10th) [30 points]

A successful final project is built slowly over many weeks, not thrown together at the last minute. To encourage good project pacing and to keep your project mentor informed about the status of your work, you should add an entry to the log.md file in the development directory each week.

Each entry should describe:

  1. What goals you had set for the week and whether they were accomplished or not
  2. What problems you encountered (if any) that prevented you from meeting your goals
  3. What you plan to accomplish or attempt next week

The development log will be graded for completion, detail, and honesty, not progress. It is much better to truthfully evaluate the work you completed in a week than to lie to make the project sound further along than it really is. It is totally acceptable to have an entry that says you tried nothing and accomplished nothing. However, if every week starts to say that, both you and your project mentor will be able to identify the issue before it becomes impossible to fix.

Mid-Project Check-in (November 8th – November 12th) [20 points]

A few weeks into the final project, you are required to meet with your project mentor for a check-in meeting. You do not need to prepare a presentation but should come prepared to summarize your progress as well as have a frank discussion about any issues or concerns you have encountered thus far. The goal here is to ensure that forward progress is being made and to address any issues that are impeding progress while there is still time to correct and recover. To that end, you should be up front and honest about your current progress (and have a development log to go over!).

While most of the points for the check-in meeting are awarded for attending, for full credit you must also have made reasonable progress on the assignment. This is to encourage you to start working on the final project well before the final weeks.

To sign up for a mid-project check-in time, use the following link: LINK. All meetings will be in person unless specifically requested.

Final Project Deliverables (Due December 12th) [140 pts]

There are three main deliverables for this final project:

  1. A functional codebase. [100 pts] Your code must be written in Python and should be runnable under the same policy as all coding assignments. It will be tested for reproducibility of your original results and for its capacity to run on datasets of our choosing that exactly match your proposed format. In addition to the code itself, you must include a human-readable README that describes:

    • The location of all major code, data, and results. (Think of this as describing where all your submitted files are located and how they work together to build your final project)

    • Full instructions on how to build and run your executable, including how to define the input data and output location for each method.

    • Full instructions on how to build and run your test suite, including a general description of the tests you have created. It is in your best interest to make these instructions (and the running of your executables and tests) as simple and straightforward as possible.

  2. A written report of your project. [20 pts] In addition to your code, your GitHub repository must have a written file (.md or .pdf preferred) in your results directory. This one- to two-page final report should describe the final deliverables of your project, including any discoveries made. Your specific results depend on your leading question, but the report should contain:

    • An answer to your leading question.

    • A description of the final coding run (or multiple runs) that led you to your answer.

    • A summary of the evidence (tests you have run) demonstrating that your algorithm is correct.

  3. A final presentation. [20 pts] In addition to your project write-up, you should submit a short video (10 minutes or less) describing your project. Your presentation should include slides or other visual aids and include the following content:

    • Your Goals (Suggested time: 1-2 minutes). The presentation should begin with a summary of your proposed goals and a short statement about what you successfully accomplished and, if necessary, what you were ultimately unable to complete.

      Tip: Think of this as ‘setting the stage’ for your presentation, letting the viewer know what you will be discussing for the rest of the talk.

    • Your Development (Suggested time: 2-3 minutes). The presentation should include a high-level overview of the work you put into the project. This is not meant to be a line-by-line recounting of your code but a highlight reel of the design decisions you made and the challenges you encountered (and hopefully overcame) while working on the project.

      If you were unable to complete one of your goals, this is the best opportunity to explain what you did that didn’t work out, how you tried to address the problem, and what you might do in the future if you were tasked to do this or a similar project again.

      Tip: If you are struggling to identify content here, ask yourself questions like: “How did I get the data I wanted?”, “How did I choose my implementation strategy for an algorithm?”, “How did I ultimately test my code to ensure that it is working?”

    • Your Conclusions (Suggested time: 3-5 minutes). The presentation should end by answering the ‘leading question’ you were hoping to solve. This may include details such as the final or full-scale input dataset you used and the output of each of your algorithms, but ambitious projects should focus on how these results led you to discover something interesting about your real-world dataset. For example, a traversal algorithm on OpenFlights data may be used to identify the shortest path between two airports that you would like to visit.

      In addition to quantitative results, your conclusions should end with some of your individual thoughts about the project. What did you learn, what did you like or dislike, and what would you explore or implement next if given more time?

    To submit your final project video, you may either commit the video itself to GitHub or include a direct link to it in your GitHub repository. Videos can be hosted through Zoom cloud recordings, YouTube, Google Drive, etc.
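To make the testing and "proof of correctness" expectations more concrete, the sketch below pairs the OpenFlights traversal example from Your Conclusions with a small hand-checked test. It assumes a directed, unweighted graph stored as a dict of neighbor lists and uses breadth-first search to find the path with the fewest flights. This is not the shortest distance, which would call for a weighted algorithm such as Dijkstra's. The routes are made up for illustration, and the names fewest_hops_path and test_fewest_hops_path are hypothetical, not course requirements.

```python
from collections import deque


def fewest_hops_path(graph, start, goal):
    """Breadth-first search over an adjacency list {node: [neighbor, ...]}.

    Returns the nodes on a path with the fewest edges from start to goal,
    or None if goal is unreachable. Runs in O(V + E) time and memory.
    """
    if start == goal:
        return [start]
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor in parent:
                continue
            parent[neighbor] = node
            if neighbor == goal:
                # Walk the parent pointers back to rebuild the path.
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Small hand-built test graph (hypothetical, directed routes).
ROUTES = {
    "CMI": ["ORD"],
    "ORD": ["JFK", "LAX"],
    "JFK": ["LHR"],
    "LAX": ["NRT"],
    "LHR": [],
    "NRT": [],
    "XYZ": [],  # isolated node
}


def test_fewest_hops_path():
    # Answers below were worked out by hand from ROUTES.
    assert fewest_hops_path(ROUTES, "CMI", "LHR") == ["CMI", "ORD", "JFK", "LHR"]
    assert fewest_hops_path(ROUTES, "CMI", "CMI") == ["CMI"]
    assert fewest_hops_path(ROUTES, "CMI", "XYZ") is None
    assert fewest_hops_path(ROUTES, "LHR", "CMI") is None  # routes are directed
```

If you use pytest (an assumption; the assignment does not mandate a test framework), placing this code in a file named like test_paths.py lets pytest collect test_fewest_hops_path automatically. Small graphs whose answers can be verified by hand are a useful complement to full-dataset runs when arguing in the written report that an algorithm is correct.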