CS 440/ECE 448
Fall 2021
Margaret Fleck
Quiz 5 skills list
The quiz will be on Wednesday, November 10th, covering material through Vector Semantics.
It will be available on Moodle from 7am to noon Central time, and you will have 30 minutes to do it.
Historical figures
Briefly relate the following people to classification:
- William Labov
- Eve Clark
- J. R. Firth
No Calculus details
You will not need to remember or calculate derivatives for specific functions,
or for composed sets of functions. I want you to know the high-level picture of
what derivative computations you're asking a tool (e.g., PyTorch) to do for you.
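For example, here is a minimal PyTorch sketch of the kind of derivative computation you hand off to the tool; the specific function is made up for illustration:

```python
import torch

# A made-up composed function: f(w) = (sin(w) + w^2)^3.
w = torch.tensor(0.5, requires_grad=True)
f = (torch.sin(w) + w**2) ** 3

# backward() applies the chain rule through the composed operations;
# you never write the derivative of f yourself.
f.backward()
print(w.grad)   # df/dw evaluated at w = 0.5
```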
Neural Nets
- Design (i.e., a connected set of linear classifiers)
- Hidden Layers
- When we have multiple layers, why should there be a (non-linear) activation
function between them?
- What are the advantages of deep networks?
- What kinds of functions can a neural net approximate?
- Training
- Top-level (aka simple) update equation for a single weight in the network (see the sketch after this sub-list)
- What does backpropagation compute? High-level picture of how it works.
E.g. where do we use the chain rule? Why do we need the forward values?
- Symmetry breaking
- Data augmentation
- Dropout
- Overfitting
- Vanishing/exploding gradients
- Weight regularization
- Leaky ReLU
- Why must we initialize weights to random values rather than zero? (You don't need to
give names or details for specific initialization methods.)
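To anchor the update equation and the role of forward values, here is a minimal numpy sketch of one training step for a tiny two-layer network (not the course's code; the sizes, data, and learning rate are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = np.array([1.0, 2.0]), 1.0          # one training example
W1 = rng.normal(size=(3, 2)) * 0.1        # random init: breaks symmetry
W2 = rng.normal(size=(1, 3)) * 0.1
lr = 0.1                                  # learning rate (step size)

# Forward pass: save the intermediate values.
h_pre = W1 @ x                            # pre-activation
h = np.maximum(h_pre, 0.0)                # ReLU activation
y_hat = (W2 @ h).item()
loss = 0.5 * (y_hat - y) ** 2

# Backward pass: chain rule, reusing the saved forward values.
d_yhat = y_hat - y                        # dL/dy_hat
dW2 = d_yhat * h[np.newaxis, :]           # needs the forward value h
dh = d_yhat * W2.ravel()
dh_pre = dh * (h_pre > 0)                 # needs the forward value h_pre
dW1 = np.outer(dh_pre, x)

# Top-level update: w_new = w_old - lr * dL/dw, for every weight.
W1 -= lr * dW1
W2 -= lr * dW2
```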
- Convolutional neural networks
- What is convolution?
- How does a convolutional layer work?
- In what situations would we want a convolutional layer vs. a fully-connected layer?
- Depth and stride (see the shape sketch after this sub-list)
- What is a pooling layer? Why would max pooling be useful?
- Overall architecture, e.g.
what kinds of features are detected in early vs. late layers?
- Weight/parameter sharing
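To make depth, stride, and pooling concrete, here is a minimal PyTorch shape sketch; the channel counts, kernel size, and stride are illustrative choices, not prescribed values:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)    # batch=1, 3 channels (depth), 32x32 image

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3,
                 stride=2, padding=1)
pool = nn.MaxPool2d(kernel_size=2)   # keeps the max in each 2x2 window

h = conv(x)                      # stride 2 halves height/width: (1, 8, 16, 16)
p = pool(h)                      # pooling halves them again:    (1, 8, 8, 8)
print(h.shape, p.shape)

# One 3x3x3 kernel has just 27 weights, shared across all image positions,
# whereas a fully-connected layer would need separate weights per position.
```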
- Generative adversarial networks (GANs)
- Adversarial examples
- in computer vision
- in natural language
- Recurrent neural networks
- High-level view of how they work (see the sketch after this list)
- When would we compute loss from last unit vs. summed over all units?
- Bidirectional RNN
- How does a "Gated RNN" differ from a standard one?
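Here is a minimal numpy sketch of the recurrence, with made-up sizes, showing where the two loss options attach:

```python
import numpy as np

rng = np.random.default_rng(0)
Wx = rng.normal(size=(4, 3)) * 0.1   # input -> hidden (shared across time)
Wh = rng.normal(size=(4, 4)) * 0.1   # hidden -> hidden (shared across time)
b = np.zeros(4)

xs = [rng.normal(size=3) for _ in range(5)]   # a length-5 input sequence
h = np.zeros(4)                               # initial hidden state

hidden_states = []
for x in xs:                                  # unroll over time
    h = np.tanh(Wx @ x + Wh @ h + b)          # same weights at every step
    hidden_states.append(h)

# Loss from the last unit (e.g., classify the whole sequence):
#   use hidden_states[-1] only.
# Loss summed over all units (e.g., predict an output at every time step):
#   compute a per-step loss from each h in hidden_states and add them up.
```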
Vector Semantics (Word Embeddings)
- Logic-based vs. context-based representations of meaning
- Principle of contrast
- Vector representations for words
- one-hot representations for words
- word embeddings/feature vectors
- cosine/dot product similarity (see the sketch after this sub-list)
- how do they model analogies and compositional semantics?
- how to evaluate them
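A minimal numpy sketch of cosine similarity and the vector-arithmetic view of analogies; the embeddings here are made-up toy numbers, not real trained vectors:

```python
import numpy as np

def cosine(u, v):
    # Dot product of the normalized vectors: 1 = same direction, 0 = orthogonal.
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Tiny made-up embeddings, just to illustrate the operations.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
}

print(cosine(emb["king"], emb["queen"]))

# Analogy "man is to king as woman is to ?": do vector arithmetic,
# then find the word whose vector is nearest to the result.
target = emb["king"] - emb["man"] + emb["woman"]
best = max(emb, key=lambda w: cosine(emb[w], target))
print(best)   # "queen" with these toy numbers
```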
- Building feature vectors
- relating words to documents
- relating words to words
- Normalization and smoothing
- What's wrong with the raw vectors?
- TF-IDF (don't memorize the exact equation!)
- PMI and PPMI (worked sketch after this sub-list)
- When would a PMI value be negative? When would negative values be reliable vs. just noise?
- Singular value decomposition (Principal Components Analysis)
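A minimal numpy sketch of PPMI, plus an SVD reduction, on a made-up co-occurrence table:

```python
import numpy as np

# Made-up word-by-context co-occurrence counts (rows: words, cols: contexts).
counts = np.array([[10.0, 0.0, 2.0],
                   [ 1.0, 8.0, 1.0],
                   [ 0.0, 2.0, 6.0]])

total = counts.sum()
p_wc = counts / total                  # joint P(w, c)
p_w = p_wc.sum(axis=1, keepdims=True)  # marginal P(w)
p_c = p_wc.sum(axis=0, keepdims=True)  # marginal P(c)

with np.errstate(divide="ignore"):
    pmi = np.log2(p_wc / (p_w * p_c))  # negative when a pair co-occurs less than chance

ppmi = np.maximum(pmi, 0.0)            # clip: negative PMIs from small counts are noisy
print(np.round(ppmi, 2))

# Truncated SVD gives dense low-dimensional word vectors:
U, S, Vt = np.linalg.svd(ppmi)
word_vecs = U[:, :2] * S[:2]           # keep the top 2 singular dimensions
```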
- Word2vec
- Main outline of algorithm
- Why are words and contexts embedded separately?
- Negative sampling (see the sketch after this list)
- Sigmoid function (definition; how it relates the probabilities of good and bad pairs)
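A minimal numpy sketch of one negative-sampling update, following the skip-gram-with-negative-sampling formulation; the dimensions, learning rate, and vectors are made up for illustration:

```python
import numpy as np

def sigmoid(x):
    # Squashes a dot-product score into a "probability this pair is real".
    # Note sigmoid(-x) = 1 - sigmoid(x): bad-pair prob is 1 minus good-pair prob.
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
dim, lr = 8, 0.05
w = rng.normal(size=dim) * 0.1             # focus-word embedding
c_pos = rng.normal(size=dim) * 0.1         # observed (real) context embedding
c_negs = rng.normal(size=(3, dim)) * 0.1   # sampled fake contexts (more negatives than positives)

# Objective for one focus word: maximize
#   log sigmoid(w . c_pos) + sum over negatives of log sigmoid(-w . c_neg)
# Words and contexts get separate embeddings; both are updated.
g_pos = 1.0 - sigmoid(w @ c_pos)           # push the real pair's score up
w_update = g_pos * c_pos
c_pos += lr * g_pos * w
for c_neg in c_negs:
    g_neg = -sigmoid(w @ c_neg)            # push the fake pairs' scores down
    w_update += g_neg * c_neg
    c_neg += lr * g_neg * w
w += lr * w_update
```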
- Word2vec details (see the sketch after this list)
- Uses more negative examples than positive ones
- Raising context counts to a power
- Weighting context words by distance from focus word
- Deleting rare words, subsampling frequent ones
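A minimal numpy sketch of two of these details, with made-up counts; the 3/4 power and the roughly sqrt(t/f) keep probability follow the word2vec papers:

```python
import numpy as np

counts = np.array([1000.0, 100.0, 10.0])   # made-up unigram counts

# Negative samples are drawn from counts raised to the 3/4 power, which
# flattens the distribution: rare words get sampled a bit more often.
p_raw = counts / counts.sum()
p_neg = counts**0.75 / (counts**0.75).sum()
print(np.round(p_raw, 3), np.round(p_neg, 3))

# Subsampling: frequent words are randomly deleted from the training text.
# With threshold t, a word of relative frequency f is kept with probability
# roughly sqrt(t / f), so the most frequent words are dropped most often.
t = 1e-5
p_keep = np.minimum(1.0, np.sqrt(t / p_raw))
print(np.round(p_keep, 4))
```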