# ECE365: Fundamentals of Machine Learning (Labs and Quizzes)

• Quizzes will be conducted in class, not in lab sessions.

• Upload lab assignments (notebooks only) to Compass, with your NetID as the file name.

### Lab 5

• This is the last week of this section of the course. It is best to have your lab completed before the start of next lab session, so that you do not fall behind in the second part of the course.

### Lab 4

• When calculating J*(K), call kMeans 100 times (each with niter=100) and take the minimum value of J_K as your estimate of J*(K). This way, even if some runs start from bad initial cluster centers, you still get a good estimate of J*(K).
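The restart strategy above can be sketched as follows. The `kmeans` function here is a toy stand-in for the lab's `kMeans` (whose exact signature is not given in these notes); the point is the outer loop that keeps the minimum objective over many random initializations.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, K, niter=100):
    """Minimal Lloyd's algorithm from a random initialization.

    A sketch only, not the lab's kMeans function. Returns the final
    centers, labels, and objective J (sum of squared distances).
    """
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(niter):
        # Assign each point to its nearest current center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center as the mean of its assigned points
        # (keep the old center if a cluster goes empty).
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    J = ((X - centers[labels]) ** 2).sum()
    return centers, labels, J

# Two well-separated blobs as toy data.
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])

# Estimate J*(K) as the minimum objective over 100 random restarts,
# so one unlucky initialization cannot spoil the estimate.
J_star = min(kmeans(X, K=2)[2] for _ in range(100))
```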

• This lab shouldn't take long to run; if you're having issues with running time, chances are the hint from Lab 2 on scipy.spatial.distance.cdist will help.

• Note that scipy.spatial.distance.cdist returns the Euclidean distance, not its square, unless you pass 'sqeuclidean'.
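A quick illustration of the difference, on a single pair of points:

```python
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[0.0, 0.0]])
B = np.array([[3.0, 4.0]])

# The default metric ('euclidean') gives the distance itself...
d = cdist(A, B)                   # 5.0

# ...while 'sqeuclidean' gives the squared distance.
d2 = cdist(A, B, 'sqeuclidean')   # 25.0
```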

### Lab 3

Hints:

• Read the lab directions carefully. Make sure you are not training on your test data! As stated at the top of the lab, this will be penalized heavily. If you are calling .fit() on something that doesn't have train in the name, you're doing something wrong.

• In the last problem, your error in the second to last part may come out to be zero depending on which algorithm you pick. This is an (unintentional) peculiarity of this data set. So, for the last part of the last problem, just pretend that the error was something small but non-zero when writing your answer.

• There are many ways to split the data into folds in problem 2. One simple way is to make a vector of indices 0,…,N-1, remove the indices belonging to the held-out fold with numpy.setdiff1d, and use the result to index the data. Another straightforward way is to make an array of shape (4/5*N, d) and fill it in with the folds by slicing. Worst case, you can hardcode the folds and the data outside the folds.
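The setdiff1d approach can be sketched like this (N, d, and the number of folds are placeholder values for illustration):

```python
import numpy as np

N, d, n_folds = 10, 3, 5
X = np.arange(N * d, dtype=float).reshape(N, d)

# Split the index vector 0,...,N-1 into one index array per fold.
folds = np.array_split(np.arange(N), n_folds)

for val_idx in folds:
    # Training indices are everything outside the held-out fold.
    train_idx = np.setdiff1d(np.arange(N), val_idx)
    X_train, X_val = X[train_idx], X[val_idx]
```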

• Do not upload the data sets.

### Lab 2

Hints:

• If you're having trouble with broadcasting, read the numpy documentation (or search the internet for examples). Basically, dimensions have to match according to a certain set of rules, described in those references.
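The two most common cases in practice: trailing dimensions that match exactly, and size-1 axes that get stretched.

```python
import numpy as np

X = np.ones((4, 3))               # 4 samples, 3 features
mu = np.array([1.0, 2.0, 3.0])    # shape (3,)

# (4, 3) - (3,): the trailing dimension matches,
# so mu is broadcast across all 4 rows.
diff = X - mu                     # shape (4, 3)

col = np.ones((4, 1))
# (4, 3) * (4, 1): the size-1 axis is stretched across the 3 columns.
scaled = X * col                  # shape (4, 3)
```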

• A vector (in the notes, or in math in general) is a column vector. You can't just take an equation in the notes (which takes in one feature vector and classifies it) and plug in a matrix full of data and expect it to work (the dimensions of the resulting expressions will make no sense, for one thing).
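To make the column-vector point concrete, here is a sketch with a generic linear score w^T x + b (the weights and data are made-up values, not from the lab): for a data matrix X whose *rows* are feature vectors, the vectorized form is X @ w, not a literal transcription of the single-vector formula.

```python
import numpy as np

w = np.array([1.0, -2.0, 0.5])   # weight vector (a column vector in the notes)
b = 0.1
x = np.array([2.0, 1.0, 4.0])    # one feature vector

# Notes-style formula for a single feature vector: w^T x + b
score_single = w @ x + b

# With rows of X as feature vectors, score every sample at once.
X = np.stack([x, 2 * x])
scores = X @ w + b               # shape (2,)
```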

• In problem 2, the prior comes out to suspiciously round values, which might be confusing: you should get (0.25, 0.25, 0.50). That's just the nature of this particular training set.

• In problem 3, scipy.spatial.distance.cdist can compute all the pairwise distances between the training data and the testing data in one call.
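For instance, a 1-nearest-neighbor prediction needs no Python loop once you have the full distance matrix (the arrays below are toy data for illustration):

```python
import numpy as np
from scipy.spatial.distance import cdist

X_train = np.array([[0.0, 0.0], [10.0, 10.0]])
y_train = np.array([0, 1])
X_test = np.array([[1.0, 1.0], [9.0, 9.0]])

# One call gives the full (n_test, n_train) distance matrix.
D = cdist(X_test, X_train)

# 1-NN prediction: the label of the closest training point.
y_pred = y_train[D.argmin(axis=1)]
```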

• Read the problems carefully, and make sure you answer each part of what is asked.