Provided Code Skeleton
We have provided (tar, zip) all the code to get you started on your MP, which
means you will only have to implement the logic behind naive Bayes.
- reader.py - This file is responsible for reading in the data set.
It reads in all of the emails, splits each email into a list of words,
stems the words if you use the --stemming flag, and then stores all of the lists of
words in a larger list (so that each individual list of words corresponds to a single email).
- mp3.py - This is the main file that starts the program, and computes the
accuracy, precision, recall, and F1-score using your implementation of naive Bayes.
- naive_bayes.py - This is the file where you will be doing all of your work.
The function naiveBayes() takes as input the training data, training labels, development set,
and a smoothing parameter. The training data is the output from reader.py.
The training labels are the list of labels corresponding to each email in the training data.
The development set is the list of emails that you are going to test your implementation on.
The smoothing parameter is the Laplace smoothing parameter you specify with --laplace (it is 1 by default).
naiveBayes() should return the predicted labels for the development set from your naive Bayes model.
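As a rough illustration of the inputs and output described above, here is a minimal sketch of a bag-of-words naive Bayes classifier with Laplace smoothing. It is not the reference solution. The function name naive_bayes_sketch, the parameter names, the use of integer labels, and the extra vocabulary slot reserved for unseen words are all assumptions made for this example; check the provided naive_bayes.py for the actual signature and label format.

```python
import math
from collections import Counter


def naive_bayes_sketch(train_set, train_labels, dev_set, smoothing_parameter=1.0):
    """Hypothetical sketch: predict a label for each email in dev_set.

    train_set and dev_set are lists of emails, each email a list of words.
    train_labels holds one label per training email (label values are an
    assumption here; any hashable, sortable labels work).
    Assumes smoothing_parameter > 0.
    """
    labels = sorted(set(train_labels))

    # Count how often each word appears in each class.
    word_counts = {label: Counter() for label in labels}
    for words, label in zip(train_set, train_labels):
        word_counts[label].update(words)

    # Vocabulary size over all classes, plus one slot for unseen words
    # (this treatment of unseen words is an assumption of the sketch).
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    vocab_size = len(vocab) + 1

    # Log priors estimated from the label frequencies.
    doc_counts = Counter(train_labels)
    log_prior = {
        label: math.log(doc_counts[label] / len(train_labels)) for label in labels
    }

    # Denominator of the smoothed likelihood for each class.
    denom = {
        label: sum(word_counts[label].values()) + smoothing_parameter * vocab_size
        for label in labels
    }

    predictions = []
    for words in dev_set:
        best_label, best_score = None, -math.inf
        for label in labels:
            # Sum log probabilities instead of multiplying to avoid underflow.
            score = log_prior[label]
            for word in words:
                count = word_counts[label][word]  # 0 for unseen words
                score += math.log((count + smoothing_parameter) / denom[label])
            if score > best_score:
                best_label, best_score = label, score
        predictions.append(best_label)
    return predictions
```

Working in log space keeps the product of many small word probabilities from underflowing to zero on long emails.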
Do not modify the provided code. You will only have to modify naive_bayes.py.
To understand more about how to run the MP, run python3 mp3.py -h in your terminal.
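For reference, the accuracy, precision, recall, and F1-score that mp3.py reports follow the standard definitions, sketched below. The function name compute_metrics and the choice of 1 as the positive label are assumptions made for this example, not taken from the provided code, which may compute the metrics differently.

```python
def compute_metrics(predicted, actual, positive_label=1):
    """Hypothetical helper: return (accuracy, precision, recall, f1).

    predicted and actual are equal-length, non-empty lists of labels.
    positive_label=1 is an assumption about how the labels are encoded.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive_label and a == positive_label)
    fp = sum(1 for p, a in pairs if p == positive_label and a != positive_label)
    fn = sum(1 for p, a in pairs if p != positive_label and a == positive_label)
    correct = sum(1 for p, a in pairs if p == a)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return accuracy, precision, recall, f1
```

Comparing these numbers across different --laplace values on the development set is one way to see how the smoothing parameter affects your model.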
Code Submission
Submit naive_bayes.py on Gradescope.
If you do the extra credit, submit your extra credit version of
naive_bayes.py to the separate extra credit assignment on Gradescope.