CS 440/ECE 448
Fall 2018
Margaret Fleck
Midterm 2 skills list
You're expected to still remember material from the first midterm,
but we'll focus on new concepts. Also expect new questions on
topics on which you've done an MP since the last exam.
Testing
- Roles of training, development, test datasets.
- Evaluation metrics for classification (true positive rate, accuracy, recall, ...); see the sketch after this list
- Overfitting
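A minimal Python sketch of the evaluation-metrics bullet above; the gold/predicted labels are made up and the code is illustrative, not from lecture:

```python
# Accuracy and recall (true positive rate) for a binary classifier,
# treating 1 as the positive class.  Labels are hypothetical.
gold = [1, 1, 0, 1, 0, 0, 1, 0]   # correct answers from the test set
pred = [1, 0, 0, 1, 0, 1, 1, 0]   # classifier's guesses

tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)   # true positives
fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)   # false negatives
correct = sum(1 for g, p in zip(gold, pred) if g == p)

accuracy = correct / len(gold)   # 6/8 = 0.75
recall = tp / (tp + fn)          # true positive rate: 3/4 = 0.75
print(accuracy, recall)
```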
Modelling text data
- Data cleaning: tokenization, stemming, stop word removal, etc
- Bigrams, ngrams
- Word types vs. tokens
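A small Python sketch of tokens vs. types and bigram extraction (the sentence is made up; a real pipeline would also lowercase, stem, and remove stop words):

```python
# Crude whitespace tokenization, word types vs. tokens, and bigrams.
text = "the cat saw the dog and the dog saw the cat"
tokens = text.split()                     # 11 word tokens
types = set(tokens)                       # 5 word types: the, cat, saw, dog, and
bigrams = list(zip(tokens, tokens[1:]))   # adjacent pairs, e.g. ('the', 'cat')

print(len(tokens), len(types), bigrams[:3])
```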
Naive Bayes
- Comparing size of model to full joint probability table
- Estimating probabilities from data (see the sketch after this list)
- Avoiding underflow
- Smoothing
- Why is it important?
- Laplace smoothing
- Deleted estimation
- Ngram smoothing (high level ideas only)
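A toy Python sketch tying together the estimation, underflow, and Laplace smoothing bullets above; the two-class "spam"/"ham" training data is invented and the code is only illustrative:

```python
import math
from collections import Counter

# Naive Bayes with Laplace (add-alpha) smoothing, working in log space.
docs = {
    "spam": ["win money now".split(), "free money".split()],
    "ham":  ["meeting at noon".split(), "see you at the meeting".split()],
}
vocab = {w for ds in docs.values() for d in ds for w in d}
counts = {c: Counter(w for d in ds for w in d) for c, ds in docs.items()}
total  = {c: sum(cnt.values()) for c, cnt in counts.items()}
ndocs  = sum(len(ds) for ds in docs.values())
prior  = {c: len(ds) / ndocs for c, ds in docs.items()}

def log_posterior(words, c, alpha=1.0):
    # log P(c) + sum_i log P(w_i | c); summing logs avoids underflow,
    # and alpha keeps unseen words from zeroing out the whole product.
    lp = math.log(prior[c])
    for w in words:
        lp += math.log((counts[c][w] + alpha) / (total[c] + alpha * len(vocab)))
    return lp

test = "free money at noon".split()
print(max(docs, key=lambda c: log_posterior(test, c)))   # 'spam'
```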
Bayes nets
- Required structure (e.g. it's a DAG, probability table at each node)
- How does the geometry relate to conditional independence assumptions?
- What do local node connections imply about independence? conditional independence?
- Comparing size of model to full joint probability table
- Building a Bayes net from a joint probability distribution
- How is this affected by the order in which we choose variables?
- Inference, aka calculating specific probability values using the geometry and tables in a Bayes net (very simple examples only); a worked example follows this list
- Efficiency results for inference
- Efficient for polytrees
- Some other geometries can be solved efficiently via junction tree algorithm
- NP-complete (aka probably exponential) in the general case
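A worked toy example of the "very simple" inference mentioned above (the two-node network Rain -> WetGrass and its numbers are made up): the net's tables give the joint as P(R, W) = P(R) P(W | R), and a query is answered by summing out and renormalizing.

```latex
% Tables (made up): P(R{=}t)=0.2,\quad P(W{=}t \mid R{=}t)=0.9,\quad P(W{=}t \mid R{=}f)=0.1
% Query: probability it rained given that the grass is wet.
P(W{=}t) = 0.2 \cdot 0.9 + 0.8 \cdot 0.1 = 0.26
\qquad
P(R{=}t \mid W{=}t) = \frac{P(R{=}t)\,P(W{=}t \mid R{=}t)}{P(W{=}t)}
                    = \frac{0.2 \cdot 0.9}{0.26} \approx 0.69
```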
Natural Language
Shallow knowledge only, but be familiar with the vocabulary.
- Some sample tasks (e.g. translation, question answering, ..)
- Speech: waveform, spectrogram, formants, phones
- Word segmentation (e.g. suffixes)
- Part of speech tagging
- Parsing
- Semantics (e.g. semantic role labelling, sentiment analysis, ...)
POS tagging
- General familiarity with common POS tags (e.g. Noun, Determiner)
- Approximate size of typical POS tag sets
- Single word with multiple possible tags
- Baseline tagging algorithm
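A sketch of one common baseline tagger (tag each word with its most frequent training tag, and unseen words with the overall most common tag); the tiny training set is invented and this may not match the exact baseline from lecture:

```python
from collections import Counter, defaultdict

# Baseline: most-frequent-tag-per-word, with a default tag for unseen words.
train = [[("the", "DET"), ("dog", "NOUN"), ("chases", "VERB"), ("cats", "NOUN")],
         [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")]]

word_tags = defaultdict(Counter)
all_tags = Counter()
for sentence in train:
    for word, tag in sentence:
        word_tags[word][tag] += 1
        all_tags[tag] += 1

default_tag = all_tags.most_common(1)[0][0]          # 'NOUN' here

def tag(words):
    return [word_tags[w].most_common(1)[0][0] if w in word_tags else default_tag
            for w in words]

print(tag(["the", "runs", "zebra"]))   # ['DET', 'VERB', 'NOUN']
```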
HMMs
- Markov assumptions
- k-word context can be simulated using 1-word context
- Graphical model picture, what variables depend on which other ones in the HMM
- Component probabilities (initial, emission, transition)
- Equations for computing probability of a given tag sequence
- Tag transition possibilities as a finite-state automaton
- Viterbi (trellis) decoding algorithm
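One compact way to write the equations referred to above (my notation, which may differ slightly from the slides): for tags t_1..t_n and words w_1..w_n,

```latex
% Joint probability of a tag sequence and word sequence under the HMM
% (initial, transition, and emission probabilities):
P(t_1,\dots,t_n,\; w_1,\dots,w_n)
  = P(t_1)\,\prod_{i=2}^{n} P(t_i \mid t_{i-1})\;\prod_{i=1}^{n} P(w_i \mid t_i)

% Viterbi fills a trellis of best-path scores v_i(t), one cell per
% (position, tag), then follows backpointers to recover the best tag sequence:
v_1(t) = P(t)\,P(w_1 \mid t), \qquad
v_i(t) = P(w_i \mid t)\,\max_{t'} \, v_{i-1}(t')\,P(t \mid t')
```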
Computer vision
Shallow knowledge only, but be familiar with the vocabulary.
See Russell and Norvig due to lack of lecture slides.
- Pinhole camera
- Pixels
- Edge detection, segmentation
- Texture, shading
- Reconstructing 3D from multiple 2D views of a scene
- Localizing and identifying/naming objects in a picture
- Adversarial example (google it if you don't remember the stop sign example from
lecture)
Classifiers
- Types of supervision (supervised, unsupervised, semi-supervised, self-supervised)
- Batch vs. incremental training
- Direct vs. indirect feedback, immediate vs. delayed feedback
- Nearest neighbor classifiers
- Decision trees, random forests
- Entropy: definition, how it relates to evaluating possible splits in a decision tree (see the sketch after this list)
- k-nearest neighbors
- L1 vs. L2 norm
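A Python sketch of the entropy bullet above, scoring a candidate decision-tree split by the weighted entropy of its children (the labels are made up):

```python
import math

def entropy(labels):
    # H = -sum_c p(c) log2 p(c), in bits.
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

parent = ["yes"] * 5 + ["no"] * 5        # H = 1.0 bit
left   = ["yes"] * 4 + ["no"]            # one side of a candidate split
right  = ["yes"] + ["no"] * 4            # the other side

children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
info_gain = entropy(parent) - children   # about 0.28 bits; bigger is better
print(entropy(parent), children, info_gain)
```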
Linear Classifiers
Perceptrons
- Basics of how perceptrons work
- Overall training algorithm (e.g. epochs, random processing order)
- Know the rule for updating perceptron weights (a sketch follows this list)
- Limitations of perceptrons and ways to address them
- Multi-class perceptrons
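A minimal perceptron sketch matching the bullets above (epochs, random processing order, the mistake-driven update); the tiny AND-style dataset and the learning rate of 1 are my own choices:

```python
import random

# Labels are +1 / -1.  On each misclassified example, add y * x to the
# weights (and y to the bias); correctly classified examples change nothing.
data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w, b = [0.0, 0.0], 0.0

for epoch in range(100):
    random.shuffle(data)                              # random processing order
    for x, y in data:
        activation = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * activation <= 0:                       # mistake (or on the boundary)
            w = [wi + y * xi for wi, xi in zip(w, x)]
            b += y

print(w, b)   # weights and bias that separate this (linearly separable) data
```

Since a single unit like this can only draw a linear boundary, it cannot represent XOR; richer features or multi-layer networks are the usual ways around that limitation.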
Other linear classifiers
- Sample activation functions (esp. logistic/sigmoid, ReLU); see the sketch after this list
- Sample loss functions (e.g. L1, L2 norm)
- What are we minimizing when we adjust the weights?
(composition of weighted feature sum, activation function, loss function)
- Adjusting weights for differentiable units using
gradient descent
- You will not need to reproduce or
rederive the equation for updating weights.
- Regularization
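A sketch of what gets composed and minimized in the bullets above: a weighted feature sum fed through a sigmoid, an L2 loss, and one gradient-descent step. As the list says, you will not need to reproduce the update equation; the numbers here are invented and the code is only to show the pieces.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):                      # the other common activation: max(0, z)
    return max(0.0, z)

x, target = [1.0, 2.0], 1.0       # one made-up training example
w, b, lr = [0.1, -0.2], 0.0, 0.5  # made-up initial weights and learning rate

z = sum(wi * xi for wi, xi in zip(w, x)) + b   # weighted feature sum
out = sigmoid(z)                               # activation function
loss = (out - target) ** 2                     # L2 loss: what we are minimizing

# One gradient-descent step (chain rule through the loss and the sigmoid):
grad_z = 2 * (out - target) * out * (1 - out)
w = [wi - lr * grad_z * xi for wi, xi in zip(w, x)]
b -= lr * grad_z
print(loss, w, b)
```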