CS 440/ECE 448
Fall 2018
Margaret Fleck
Midterm 2 skills list
You're expected to still remember material from the first midterm,
but we'll focus on new concepts. Also expect new questions on
topics on which you've done an MP since the last exam.
Testing
- Roles of training, development, test datasets.
- Evaluation metrics for classification (true positive rate, accuracy, recall, ...); see the sketch after this list
- Overfitting
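A minimal Python sketch of the evaluation-metrics bullet above; the gold/predicted labels are made up and the code is illustrative, not from lecture:

```python
# Accuracy and recall (true positive rate) for a binary classifier,
# treating 1 as the positive class.  Labels are hypothetical.
gold = [1, 1, 0, 1, 0, 0, 1, 0]   # correct answers from the test set
pred = [1, 0, 0, 1, 0, 1, 1, 0]   # classifier's guesses

tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)   # true positives
fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)   # false negatives
correct = sum(1 for g, p in zip(gold, pred) if g == p)

accuracy = correct / len(gold)   # 6/8 = 0.75
recall = tp / (tp + fn)          # true positive rate: 3/4 = 0.75
print(accuracy, recall)
```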
Modelling text data
- Data cleaning: tokenization, stemming, stop word removal, etc
- Bigrams, ngrams
- Word types vs. tokens
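A small Python sketch of tokens vs. types and bigram extraction (the sentence is made up; a real pipeline would also lowercase, stem, and remove stop words):

```python
# Crude whitespace tokenization, word types vs. tokens, and bigrams.
text = "the cat saw the dog and the dog saw the cat"
tokens = text.split()                     # 11 word tokens
types = set(tokens)                       # 5 word types: the, cat, saw, dog, and
bigrams = list(zip(tokens, tokens[1:]))   # adjacent pairs, e.g. ('the', 'cat')

print(len(tokens), len(types), bigrams[:3])
```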
Naive Bayes
- Comparing size of model to full joint probability table
- Estimating probabilities from data (see the sketch after this list)
- Avoiding underflow
- Smoothing
- Why is it important?
- Laplace smoothing
- Deleted estimation
- Ngram smoothing (high level ideas only)
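A toy Python sketch tying together the estimation, underflow, and Laplace smoothing bullets above; the two-class "spam"/"ham" training data is invented and the code is only illustrative:

```python
import math
from collections import Counter

# Naive Bayes with Laplace (add-alpha) smoothing, working in log space.
docs = {
    "spam": ["win money now".split(), "free money".split()],
    "ham":  ["meeting at noon".split(), "see you at the meeting".split()],
}
vocab = {w for ds in docs.values() for d in ds for w in d}
counts = {c: Counter(w for d in ds for w in d) for c, ds in docs.items()}
total  = {c: sum(cnt.values()) for c, cnt in counts.items()}
ndocs  = sum(len(ds) for ds in docs.values())
prior  = {c: len(ds) / ndocs for c, ds in docs.items()}

def log_posterior(words, c, alpha=1.0):
    # log P(c) + sum_i log P(w_i | c); summing logs avoids underflow,
    # and alpha keeps unseen words from zeroing out the whole product.
    lp = math.log(prior[c])
    for w in words:
        lp += math.log((counts[c][w] + alpha) / (total[c] + alpha * len(vocab)))
    return lp

test = "free money at noon".split()
print(max(docs, key=lambda c: log_posterior(test, c)))   # 'spam'
```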
Bayes nets
- Required structure (e.g. it's a DAG, probability table at each node)
- How does the geometry relate to conditional independence assumptions?
- What do local node connections imply about independence? conditional independence?
- Comparing size of model to full joint probability table
- Building a Bayes net from a joint probability distribution
- How is this affected by the order in which we choose variables?
- Inference, aka calculating specific probability values using the geometry and tables in a Bayes net (very simple examples only); a worked example follows this list
- Efficiency results for inference
- Efficient for polytrees
- Some other geometries can be solved efficiently via junction tree algorithm
- NP-complete (aka probably exponential) in the general case
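A worked toy example of the "very simple" inference mentioned above (the two-node network Rain -> WetGrass and its numbers are made up): the net's tables give the joint as P(R, W) = P(R) P(W | R), and a query is answered by summing out and renormalizing.

```latex
% Tables (made up): P(R{=}t)=0.2,\quad P(W{=}t \mid R{=}t)=0.9,\quad P(W{=}t \mid R{=}f)=0.1
% Query: probability it rained given that the grass is wet.
P(W{=}t) = 0.2 \cdot 0.9 + 0.8 \cdot 0.1 = 0.26
\qquad
P(R{=}t \mid W{=}t) = \frac{P(R{=}t)\,P(W{=}t \mid R{=}t)}{P(W{=}t)}
                    = \frac{0.2 \cdot 0.9}{0.26} \approx 0.69
```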
Natural Language
Shallow knowledge only, but be familiar with the vocabulary.
- Some sample tasks (e.g. translation, question answering, ..)
- Speech: waveform, spectrogram, formants, phones
- Word segmentation (e.g. suffixes)
- Part of speech tagging
- Parsing
- Semantics (e.g. semantic role labelling, sentiment analysis, ...)
POS tagging
- General familiarity with common POS tags (e.g. Noun, Determiner)
- Approximate size of typical POS tag sets
- Single word with multiple possible tags
- Baseline tagging algorithm
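A sketch of one common baseline tagger (tag each word with its most frequent training tag, and unseen words with the overall most common tag); the tiny training set is invented and this may not match the exact baseline from lecture:

```python
from collections import Counter, defaultdict

# Baseline: most-frequent-tag-per-word, with a default tag for unseen words.
train = [[("the", "DET"), ("dog", "NOUN"), ("chases", "VERB"), ("cats", "NOUN")],
         [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")]]

word_tags = defaultdict(Counter)
all_tags = Counter()
for sentence in train:
    for word, tag in sentence:
        word_tags[word][tag] += 1
        all_tags[tag] += 1

default_tag = all_tags.most_common(1)[0][0]          # 'NOUN' here

def tag(words):
    return [word_tags[w].most_common(1)[0][0] if w in word_tags else default_tag
            for w in words]

print(tag(["the", "runs", "zebra"]))   # ['DET', 'VERB', 'NOUN']
```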
HMMs
- Markov assumptions
- k-word context can be simulated using 1-word context
- Graphical model picture, what variables depend on which other ones in the HMM
- Component probabilities (initial, emission, transition)
- Equations for computing probability of a given tag sequence
- Tag transition possibilities as a finite-state automaton
- Viterbi (trellis) decoding algorithm
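One compact way to write the equations referred to above (my notation, which may differ slightly from the slides): for tags t_1..t_n and words w_1..w_n,

```latex
% Joint probability of a tag sequence and word sequence under the HMM
% (initial, transition, and emission probabilities):
P(t_1,\dots,t_n,\; w_1,\dots,w_n)
  = P(t_1)\,\prod_{i=2}^{n} P(t_i \mid t_{i-1})\;\prod_{i=1}^{n} P(w_i \mid t_i)

% Viterbi fills a trellis of best-path scores v_i(t), one cell per
% (position, tag), then follows backpointers to recover the best tag sequence:
v_1(t) = P(t)\,P(w_1 \mid t), \qquad
v_i(t) = P(w_i \mid t)\,\max_{t'} \, v_{i-1}(t')\,P(t \mid t')
```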
Computer vision
Shallow knowledge only, but be familiar with the vocabulary.
See Russell and Norvig due to lack of lecture slides.
- Pinhole camera
- Pixels
- Edge detection, segmentation
- Texture, shading
- Reconstructing 3D from multiple 2D views of a scene
- Localizing and identifying/naming objects in a picture
- Adversarial example (google it if you don't remember the stop sign example from
lecture)
Classifiers
- Types of supervision (supervised, unsupervised, semi-supervised, self-supervised)
- Batch vs. incremental training
- Direct vs. indirect feedback, immediate vs. delayed feedback
- Nearest neighbor classifiers
- Decision trees, random forests
- Entropy: definition, how it relates to evaluating possible splits in a decision tree (see the sketch after this list)
- k-nearest neighbors
- L1 vs. L2 norm
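A Python sketch of the entropy bullet above, scoring a candidate decision-tree split by the weighted entropy of its children (the labels are made up):

```python
import math

def entropy(labels):
    # H = -sum_c p(c) log2 p(c), in bits.
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

parent = ["yes"] * 5 + ["no"] * 5        # H = 1.0 bit
left   = ["yes"] * 4 + ["no"]            # one side of a candidate split
right  = ["yes"] + ["no"] * 4            # the other side

children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
info_gain = entropy(parent) - children   # about 0.28 bits; bigger is better
print(entropy(parent), children, info_gain)
```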
Linear Classifiers
Perceptrons
- Basics of how perceptrons work
- Overall training algorithm (e.g. epochs, random processing order)
- Know the rule for updating perceptron weights (a sketch follows this list)
- Limitations of perceptrons and ways to address them
- Multi-class perceptrons
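A minimal perceptron sketch matching the bullets above (epochs, random processing order, the mistake-driven update); the tiny AND-style dataset and the learning rate of 1 are my own choices:

```python
import random

# Labels are +1 / -1.  On each misclassified example, add y * x to the
# weights (and y to the bias); correctly classified examples change nothing.
data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w, b = [0.0, 0.0], 0.0

for epoch in range(100):
    random.shuffle(data)                              # random processing order
    for x, y in data:
        activation = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * activation <= 0:                       # mistake (or on the boundary)
            w = [wi + y * xi for wi, xi in zip(w, x)]
            b += y

print(w, b)   # weights and bias that separate this (linearly separable) data
```

Since a single unit like this can only draw a linear boundary, it cannot represent XOR; richer features or multi-layer networks are the usual ways around that limitation.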
Other linear classifiers
- Sample activation functions (esp. logistic/sigmoid, ReLU); see the sketch after this list
- Sample loss functions (e.g. L1, L2 norm)
- What are we minimizing when we adjust the weights?
(composition of weighted feature sum, activation function, loss function)
- Adjusting weights for differentiable units using
gradient descent
- You will not need to reproduce or
rederive the equation for updating weights.
- Regularization
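A sketch of what gets composed and minimized in the bullets above: a weighted feature sum fed through a sigmoid, an L2 loss, and one gradient-descent step. As the list says, you will not need to reproduce the update equation; the numbers here are invented and the code is only to show the pieces.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):                      # the other common activation: max(0, z)
    return max(0.0, z)

x, target = [1.0, 2.0], 1.0       # one made-up training example
w, b, lr = [0.1, -0.2], 0.0, 0.5  # made-up initial weights and learning rate

z = sum(wi * xi for wi, xi in zip(w, x)) + b   # weighted feature sum
out = sigmoid(z)                               # activation function
loss = (out - target) ** 2                     # L2 loss: what we are minimizing

# One gradient-descent step (chain rule through the loss and the sigmoid):
grad_z = 2 * (out - target) * out * (1 - out)
w = [wi - lr * grad_z * xi for wi, xi in zip(w, x)]
b -= lr * grad_z
print(loss, w, b)
```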