K-Nearest Neighbors Model
The perceptron is a simple linear model. Although this is sufficient
in many cases, the method has its limits.
To see how a different simple classifier performs,
implement K-Nearest Neighbors using Euclidean distance.
To break ties, use the negative label (no animal class).
You must implement this algorithm on your own, using only standard libraries and
NumPy.
Note: To prevent memory errors due to the autograder's limitations,
your algorithm should iterate through the list of test images rather than
(say) creating a vector containing all the test images.
Our autograder tests will pass in various values for the parameter k.
For your own understanding,
try to find the highest accuracy you can achieve.
Also try to understand how this algorithm compares to the perceptron, in both accuracy
and efficiency.
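The approach described above can be sketched roughly as follows. This is a minimal illustration, not the reference solution: the function name knn_sketch is hypothetical, and it assumes labels are encoded as 0 (negative, no animal) and 1 (positive), which the handout does not state explicitly. It loops over the test images one at a time, as the note on memory suggests, and compares squared Euclidean distances, which rank neighbors in the same order as the true distances.

```python
import numpy as np

def knn_sketch(train_set, train_labels, dev_set, k):
    """Hypothetical helper: predict a 0/1 label for each row of dev_set."""
    train_set = np.asarray(train_set, dtype=float)
    # Assumption: labels are 0 (negative) or 1 (positive).
    train_labels = np.asarray(train_labels).astype(int)
    predictions = []
    for image in dev_set:  # one test image at a time keeps memory use low
        # Squared Euclidean distance from this image to every training image.
        diffs = train_set - np.asarray(image, dtype=float)
        dists = np.sum(diffs * diffs, axis=1)
        # Indices of the k closest training images.
        nearest = np.argsort(dists)[:k]
        positive_votes = int(train_labels[nearest].sum())
        # Positive only on a strict majority; a tie falls back to negative.
        predictions.append(1 if 2 * positive_votes > len(nearest) else 0)
    return predictions
```

Sorting all distances costs O(n log n) per test image; np.argpartition would avoid the full sort, but the simpler version is easier to check by hand.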
Provided Code Skeleton
We have provided (tar, zip) all the code to get you started on your MP, which
means you will only have to implement the logic behind the perceptron and
K-Nearest Neighbors classifiers.
- reader.py - This file is responsible for reading in the data set.
It makes a giant NumPy array of feature vectors corresponding with each image.
- mp5.py - This is the main file that starts the program, and computes the
accuracy, precision, recall, and F1-score using your implementation of the classifiers.
- classify.py - This is the file where you will be doing all of your work.
To understand more about how to run the MP, run python3 mp5.py -h in your terminal.
Add your code to classify.py. Do not modify the code provided
in the other files.
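For reference, the four metrics that mp5.py reports can be computed along the following lines. This is an illustrative sketch, not the code in mp5.py: the function name binary_metrics is hypothetical, labels are assumed to be 0/1 with 1 as the positive class, and returning 0.0 when a denominator is zero is just one possible convention.

```python
import numpy as np

def binary_metrics(predicted, actual):
    """Hypothetical helper: return (accuracy, precision, recall, f1)."""
    predicted = np.asarray(predicted).astype(bool)
    actual = np.asarray(actual).astype(bool)
    tp = int(np.sum(predicted & actual))    # predicted positive, truly positive
    fp = int(np.sum(predicted & ~actual))   # predicted positive, truly negative
    fn = int(np.sum(~predicted & actual))   # predicted negative, truly positive
    accuracy = float(np.mean(predicted == actual))
    # Assumption: report 0.0 instead of dividing by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```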
Inside the code ...
- The function classifyPerceptron() takes as input the training data, training labels, development set,
learning rate, and maximum number of iterations.
- The function classifyKNN() takes as input the training data, training labels, development set,
and the number of neighbors used (k).
- By default, the training data must be placed in the mp5-code/ folder. The training data passed to your functions is the output from reader.py.
- The training labels are the list of labels corresponding to each image in the training data.
- The development set is the NumPy array of images that you are going to test your implementation on.
- You may change the default values for parameters such as the maximum
number of iterations, the learning rate, k, etc.
However, you may not reset these values inside your classifier functions,
because some of our tests pass in specific values for these parameters.
- Each of your classify functions should output the predicted labels for the development set from your model.
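To make the inputs listed above concrete, here is a rough sketch of a perceptron that takes the same five arguments as classifyPerceptron(). It is one possible shape under stated assumptions, not the expected solution: the name perceptron_sketch is hypothetical, labels are assumed to be 0/1, weights start at zero, a separate bias term is used, and a score of exactly 0 is treated as negative. Note that it uses learning_rate and max_iter exactly as passed in and never resets them.

```python
import numpy as np

def perceptron_sketch(train_set, train_labels, dev_set, learning_rate, max_iter):
    """Hypothetical helper: train a perceptron, then label each row of dev_set."""
    train_set = np.asarray(train_set, dtype=float)
    dev_set = np.asarray(dev_set, dtype=float)
    # Assumption: labels are 0 (negative) or 1 (positive).
    targets = np.asarray(train_labels).astype(int)
    weights = np.zeros(train_set.shape[1])
    bias = 0.0
    for _ in range(max_iter):  # one pass over the training data per iteration
        for features, target in zip(train_set, targets):
            predicted = 1 if weights @ features + bias > 0 else 0
            # error is 0 for a correct prediction, +1 or -1 for a mistake,
            # so the weights only move when the example is misclassified.
            error = target - predicted
            weights += learning_rate * error * features
            bias += learning_rate * error
    return [1 if weights @ x + bias > 0 else 0 for x in dev_set]
```

Unlike K-Nearest Neighbors, all the work here happens during training; predicting a label for a development image is a single dot product.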