ECE365: Fundamentals of Machine Learning (Labs and Quizzes)
Quizzes will be conducted in class, not in lab sessions.
Upload lab assignments (notebooks only) to Compass, using your NetID as the file name.
Lab 5
Lab 4
When calculating J*(K), call kMeans 100 times (with niter=100) and take the minimum value of J_K as your estimate of J*(K). That way you still get a good estimate of J*(K) even if some runs start from bad initial cluster centers.
This lab shouldn't take long to run; if you're having issues with running time, chances are the hint from Lab 2 on scipy.spatial.distance.cdist will help.
Note that scipy.spatial.distance.cdist returns the Euclidean distance, not its square, unless you pass 'sqeuclidean'.
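The Lab 4 notes above can be sketched roughly as follows. This is a minimal illustration, not the lab's reference solution: kmeans_once and estimate_J_star are hypothetical names standing in for your own kMeans, and the random-data-point initialization and the handling of empty clusters are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

# cdist defaults to the Euclidean distance; pass 'sqeuclidean' for its square.
A = np.array([[0.0, 0.0]])
B = np.array([[3.0, 4.0]])
d = cdist(A, B)                      # [[5.]]
d_sq = cdist(A, B, "sqeuclidean")    # [[25.]]


def kmeans_once(X, K, niter=100, rng=None):
    """One k-means run (hypothetical stand-in for the lab's kMeans).

    Returns the final centers and the objective J_K, the sum of squared
    distances from each point to its closest center.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(rng)
    # Assumption: initialize centers as K data points chosen at random.
    centers = X[rng.choice(X.shape[0], size=K, replace=False)].copy()
    for _ in range(niter):
        d2 = cdist(X, centers, "sqeuclidean")   # shape (N, K), one call
        labels = d2.argmin(axis=1)
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:   # assumption: keep an empty cluster's center
                centers[k] = members.mean(axis=0)
    d2 = cdist(X, centers, "sqeuclidean")
    return centers, d2.min(axis=1).sum()


def estimate_J_star(X, K, nrestarts=100, niter=100, seed=0):
    """Estimate J*(K) as the smallest J_K over several random restarts."""
    rng = np.random.default_rng(seed)
    return min(kmeans_once(X, K, niter, rng)[1] for _ in range(nrestarts))
```

Computing all point-to-center distances with one cdist call per iteration, rather than with nested Python loops, is usually what keeps the 100 restarts fast.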
Lab 3
Hints:
Read the lab directions carefully. Make sure you are not training on your test data! As stated at the top of the lab, this will be penalized heavily. If you are calling .fit() on something that doesn't have train in the name, you're doing something wrong.
In the last problem, your error in the second to last part may come out to be zero depending on which algorithm you pick. This is an (unintentional) peculiarity of this data set. So, for the last part of the last problem, just pretend that the error was something small but nonzero when writing your answer.
There are many ways to split up the data into folds in problem 2. One simple way is to make a vector with indices 0,…,N−1, and remove the indices corresponding to the fold with numpy.setdiff1d, and use these to index the data. Another straightforward way is to make an array of size (4/5*N,d) and fill it in with the folds by slicing. Worst case, you can hardcode the folds and the data outside the folds.
Do not upload the data sets.
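The first fold-splitting approach from the hints above (an index vector plus numpy.setdiff1d) might look like the sketch below. The helper name fold_indices, the use of contiguous folds, and the toy arrays are assumptions made for illustration; the lab may specify the folds differently.

```python
import numpy as np


def fold_indices(N, n_folds=5):
    """Yield (train_idx, val_idx) pairs for contiguous, near-equal folds."""
    all_idx = np.arange(N)                          # indices 0, ..., N-1
    for val_idx in np.array_split(all_idx, n_folds):
        # Everything outside the current fold is used for training.
        train_idx = np.setdiff1d(all_idx, val_idx)
        yield train_idx, val_idx


# Toy data standing in for the lab's training set (hypothetical shapes).
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10) % 2
for train_idx, val_idx in fold_indices(len(X)):
    X_tr, y_tr = X[train_idx], y[train_idx]     # fit on these rows only
    X_val, y_val = X[val_idx], y[val_idx]       # evaluate on these rows
```

Indexing with train_idx keeps the held-out fold out of anything you call .fit() on.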
Lab 2
Hints:
If you're having trouble with broadcasting, read the help page (or search the internet for examples). Basically, dimensions have to match according to a certain set of rules (described in the links prior).
A vector (in the notes, or in math in general) is a column vector. You can't just take an equation in the notes (which takes in one feature vector and classifies it) and plug in a matrix full of data and expect it to work (the dimensions of the resulting expressions will make no sense, for one thing).
In problem 2, the prior you should get is (0.25,0.25,0.50). This may look somewhat confusing, but it is just the nature of this particular training set.
In problem 3, scipy.spatial.distance.cdist can calculate out all the distances between training data and the testing data in one call.
Read the problems carefully, and make sure you answer each part of what needs to be done.
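To make the broadcasting, column-vector, and cdist hints above concrete, here is a small sketch on toy arrays. The arrays, the weights w and b, and the function nearest_neighbor_predict are all hypothetical; in particular, the hints do not say that problem 3 uses a nearest-neighbor rule, so that part is an assumption.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Broadcasting: (N, d) minus (d,) subtracts the mean from every row.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mu = X.mean(axis=0)          # shape (2,)
X_centered = X - mu          # shape (3, 2)

# A rule written for one column vector x, such as w^T x + b, becomes
# X @ w + b when the rows of X are feature vectors: one score per row.
w = np.array([1.0, -1.0])
b = 0.5
scores = X @ w + b           # shape (3,)


def nearest_neighbor_predict(X_train, y_train, X_test):
    """Label each test point with the label of its closest training point."""
    D = cdist(X_test, X_train)            # shape (N_test, N_train), one call
    return y_train[D.argmin(axis=1)]
```

Checking the .shape of each intermediate result is a quick way to catch an expression whose dimensions make no sense.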
Lab 1
Solution: [link]
Hints:
Exercises 5 and 6 will be building blocks for the first problem in Lab 2 (where you can use part (a) or part (b) of both exercises). You should be able to do part (a) of both exercises in a straightforward manner. As stated in the lab, part (b) is optional, but good to know. If you're stuck on part (b), make sure to write out the matrices and you should be able to construct the appropriate matrix multiplication. If you do not solve part (b), do not worry about it. But, you really should solve part (a) of both Exercises 5 and 6.
A better hint for Exercise 6(b) might be: “You can do this with the np.dot, elementwise multiplication and np.sum (along an axis) operations.”
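As a generic illustration of combining those three operations (this is not Exercise 6(b) itself, whose statement is in the lab), the sketch below computes a quadratic form x^T A x for every row x of a data matrix. The matrices X and A are made-up values.

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])   # rows are feature vectors
A = np.array([[2.0, 0.0], [0.0, 1.0]])

# Loop version: one quadratic form per row.
q_loop = np.array([x @ A @ x for x in X])

# Vectorized version: np.dot, then an elementwise product, then row sums.
q_vec = np.sum(np.dot(X, A) * X, axis=1)
```

Writing out the matrices by hand for a 2-by-2 case like this one is a good way to see why the elementwise product followed by a sum along axis 1 reproduces the loop.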
Please follow the Python instructions to get started with Jupyter notebooks. You should not need to install any additional packages for this portion of the course if you have installed Anaconda or Canopy.
The following other Python tutorials may be helpful:
And a few links to write code concisely:
