CS 440/ECE 448
Fall 2021
Margaret Fleck
Quiz 4 skills list
The quiz will be on Wednesday, October 27th, covering material through Linear Classifiers.
It will be available on Moodle from 7am to noon Central time, and you will have 30 minutes to do it.
Classifier Overview
General design
- Uses for classification: labelling objects, making decisions
- Multi-layer systems
- What can we tune?
- parameters (e.g. weights)
- hyper-parameters (e.g. tuning constants)
- design, network topology
- Challenges with determining the correct answer
- how specific/general should the class label be?
- unfamiliar objects, unfamiliar words
- context may affect best label to choose
- deciding what's important in complex scenes, extended sentences
- Data for supervised training
- "gold" answers
- Noise in "correct" answers/annotation
- Annotators with limited training
- Data scraped off the web
- Data available only for final output of system
- Workarounds for limited training data
- Re-purposing layers trained for another purpose
- Creating training pairs by removing information
- Self-supervised, semi-supervised, unsupervised methods
- Batch vs. incremental training
K-nn and Decision Trees
Specific techniques
- k-nearest neighbors (how it works, what happens if you change k)
- L1 vs. L2 norm
- Decision trees, random forests
- Entropy: definition, how it relates to evaluating possible splits in a decision tree (see the sketch after this list)
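To make these concrete, here is a minimal Python sketch, not taken from the lecture code, of k-nearest-neighbor classification with a choice of L1 or L2 distance, plus the entropy formula used to score candidate splits in a decision tree. All function and variable names are illustrative.

```python
from collections import Counter
import math

def distance(x, y, norm="L2"):
    """L1 (sum of absolute differences) or L2 (Euclidean) distance."""
    if norm == "L1":
        return sum(abs(a - b) for a, b in zip(x, y))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_classify(query, examples, k=3, norm="L2"):
    """examples: list of (feature_vector, label) pairs.
    Label the query with the majority class among its k nearest neighbors."""
    neighbors = sorted(examples, key=lambda ex: distance(query, ex[0], norm))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def entropy(labels):
    """H = -sum_i p_i * log2(p_i), where p_i is the fraction of examples in class i.
    Lower entropy in the subsets produced by a split means the split is purer."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Example: increasing k smooths the decision boundary but can blur small classes.
data = [((0, 0), "A"), ((1, 0), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_classify((0.5, 0.2), data, k=3, norm="L1"))   # "A"
print(entropy(["A", "A", "B", "B"]))                    # 1.0 bit
```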
Perceptrons
- "Linearly separable"
- Basics of how perceptrons work
- Replacing bias with an extra weight
- Overall training algorithm (e.g. epochs, random processing order)
- Rule for updating perceptron weights (sketched in code after this list)
- Limitations of perceptrons and ways to address them
- Multi-class perceptrons
- Comparison to Naive Bayes
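Below is a minimal sketch, in my own notation rather than the course's, of the perceptron training loop: the bias is replaced by an extra weight on a constant feature of 1, examples are processed in a random order each epoch, and the weights are updated only on mistakes.

```python
import random

def train_perceptron(examples, epochs=10):
    """examples: list of (feature_vector, label) pairs with label in {+1, -1}.
    Returns a weight vector whose last entry plays the role of the bias."""
    examples = list(examples)                # avoid shuffling the caller's list
    dim = len(examples[0][0]) + 1            # +1 for the bias feature
    w = [0.0] * dim
    for _ in range(epochs):
        random.shuffle(examples)             # random processing order each epoch
        for x, y in examples:
            x = list(x) + [1.0]              # constant 1 replaces an explicit bias
            activation = sum(wi * xi for wi, xi in zip(w, x))
            prediction = 1 if activation > 0 else -1
            if prediction != y:              # update only on mistakes: w <- w + y*x
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

# Converges only if the data are linearly separable, e.g. an AND-like problem:
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
print(train_perceptron(data))
```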
Linear Classifiers
- Sample activation functions. Know the equations for sigmoid and ReLU (see the code sketch after this list).
- Sample loss functions (e.g. 0/1, L1, L2, cross-entropy)
- What are we minimizing when we adjust the weights? (composition of weighted feature sum, activation function, loss function)
- Adjusting weights for a differentiable unit using gradient descent
- Main update equation (not details of all the derivatives)
- Why do we need the activation and loss functions to be differentiable?
- One-hot representations
- Softmax
- Regularization
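The following sketch, again illustrative rather than taken from lecture, ties several of these items together: the sigmoid and ReLU equations, softmax, one-hot targets, and one gradient-descent weight update for a sigmoid unit with cross-entropy loss and L2 regularization.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))            # sigma(z) = 1 / (1 + e^(-z))

def relu(z):
    return max(0.0, z)                           # ReLU(z) = max(0, z)

def softmax(zs):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(z - max(zs)) for z in zs]   # shift by max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(label, num_classes):
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def gd_step(w, x, y, lr=0.1, lam=0.01):
    """One gradient-descent update for a sigmoid unit with cross-entropy loss and
    L2 regularization.  For this combination the gradient of the loss with respect
    to the weights works out to (a - y) * x + lam * w, where a = sigmoid(w . x)."""
    a = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [wi - lr * ((a - y) * xi + lam * wi) for wi, xi in zip(w, x)]

# Example: one update moves the weights toward predicting y = 1 for this x.
w = [0.0, 0.0]
x = [1.0, 2.0]
print(gd_step(w, x, y=1))                        # [0.05, 0.1]
print(softmax([2.0, 1.0, 0.1]), one_hot(0, 3))
```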