CS 440/ECE 448
Fall 2021
Margaret Fleck
Quiz 5 skills list
The quiz will be on Wednesday, November 10th, covering material through Vector Semantics.
It will be available on Moodle from 7am to noon Central time, and you will have 30 minutes to do it.
Historical figures
Briefly relate the following people to classification:
- William Labov
- Eve Clark
- J. R. Firth
No Calculus details
You will not need to remember or calculate derivatives for specific functions,
or for composed sets of functions. I want you to know the high-level picture of
what derivative computations you're asking a tool (e.g., PyTorch) to do for you.
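For example, here is a minimal PyTorch sketch of the kind of derivative computation you hand off to the tool; the specific function is made up for illustration:

```python
import torch

# A made-up composed function: f(w) = (sin(w) + w^2)^3.
w = torch.tensor(0.5, requires_grad=True)
f = (torch.sin(w) + w**2) ** 3

# backward() applies the chain rule through the composed operations;
# you never write the derivative of f yourself.
f.backward()
print(w.grad)   # df/dw evaluated at w = 0.5
```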
Neural Nets
- Design (i.e., a connected set of linear classifiers)
- Hidden Layers
- When we have multiple layers, why should there be a (non-linear) activation
function between them?
- What are the advantages of deep networks?
- What kinds of functions can a neural net approximate?
- Training
- Top-level (aka simple) update equation for a single weight in the network (see the sketch after this sub-list)
- What does backpropagation compute? High-level picture of how it works.
E.g. where do we use the chain rule? Why do we need the forward values?
- Symmetry breaking
- Data augmentation
- Dropout
- Overfitting
- Vanishing/exploding gradients
- Weight regularization
- Leaky ReLU
- Why must we initialize weights to random values rather than zero? (You don't need to
give names or details for specific initialization methods.)
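To anchor the update equation and the role of forward values, here is a minimal numpy sketch of one training step for a tiny two-layer network (not the course's code; the sizes, data, and learning rate are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = np.array([1.0, 2.0]), 1.0          # one training example
W1 = rng.normal(size=(3, 2)) * 0.1        # random init: breaks symmetry
W2 = rng.normal(size=(1, 3)) * 0.1
lr = 0.1                                  # learning rate (step size)

# Forward pass: save the intermediate values.
h_pre = W1 @ x                            # pre-activation
h = np.maximum(h_pre, 0.0)                # ReLU activation
y_hat = (W2 @ h).item()
loss = 0.5 * (y_hat - y) ** 2

# Backward pass: chain rule, reusing the saved forward values.
d_yhat = y_hat - y                        # dL/dy_hat
dW2 = d_yhat * h[np.newaxis, :]           # needs the forward value h
dh = d_yhat * W2.ravel()
dh_pre = dh * (h_pre > 0)                 # needs the forward value h_pre
dW1 = np.outer(dh_pre, x)

# Top-level update: w_new = w_old - lr * dL/dw, for every weight.
W1 -= lr * dW1
W2 -= lr * dW2
```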
- Convolutional neural networks
- What is convolution?
- How does a convolutional layer work?
- In what situations would we want a convolutional layer vs. a fully-connected layer?
- Depth and stride (see the shape sketch after this sub-list)
- What is a pooling layer? Why would max pooling be useful?
- Overall architecture, e.g.
what kinds of features are detected in early vs. late layers?
- Weight/parameter sharing
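To make depth, stride, and pooling concrete, here is a minimal PyTorch shape sketch; the channel counts, kernel size, and stride are illustrative choices, not prescribed values:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)    # batch=1, 3 channels (depth), 32x32 image

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3,
                 stride=2, padding=1)
pool = nn.MaxPool2d(kernel_size=2)   # keeps the max in each 2x2 window

h = conv(x)                      # stride 2 halves height/width: (1, 8, 16, 16)
p = pool(h)                      # pooling halves them again:    (1, 8, 8, 8)
print(h.shape, p.shape)

# One 3x3x3 kernel has just 27 weights, shared across all image positions,
# whereas a fully-connected layer would need separate weights per position.
```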
- Generative adversarial networks (GANs)
- Adversarial examples
- in computer vision
- in natural language
- Recurrent neural networks
- High-level view of how they work (see the sketch after this list)
- When would we compute loss from last unit vs. summed over all units?
- Bidirectional RNN
- How does a "Gated RNN" differ from a standard one?
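Here is a minimal numpy sketch of the recurrence, with made-up sizes, showing where the two loss options attach:

```python
import numpy as np

rng = np.random.default_rng(0)
Wx = rng.normal(size=(4, 3)) * 0.1   # input -> hidden (shared across time)
Wh = rng.normal(size=(4, 4)) * 0.1   # hidden -> hidden (shared across time)
b = np.zeros(4)

xs = [rng.normal(size=3) for _ in range(5)]   # a length-5 input sequence
h = np.zeros(4)                               # initial hidden state

hidden_states = []
for x in xs:                                  # unroll over time
    h = np.tanh(Wx @ x + Wh @ h + b)          # same weights at every step
    hidden_states.append(h)

# Loss from the last unit (e.g., classify the whole sequence):
#   use hidden_states[-1] only.
# Loss summed over all units (e.g., predict an output at every time step):
#   compute a per-step loss from each h in hidden_states and add them up.
```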
Vector Semantics (Word Embeddings)
- Logic-based vs. context-based representations of meaning
- Principle of contrast
- Vector representations for words
- one-hot representations for words
- word embeddings/feature vectors
- cosine/dot product similarity (see the sketch after this sub-list)
- how do they model analogies and compositional semantics?
- how to evaluate them
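A minimal numpy sketch of cosine similarity and the vector-arithmetic view of analogies; the embeddings here are made-up toy numbers, not real trained vectors:

```python
import numpy as np

def cosine(u, v):
    # Dot product of the normalized vectors: 1 = same direction, 0 = orthogonal.
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Tiny made-up embeddings, just to illustrate the operations.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
}

print(cosine(emb["king"], emb["queen"]))

# Analogy "man is to king as woman is to ?": do vector arithmetic,
# then find the word whose vector is nearest to the result.
target = emb["king"] - emb["man"] + emb["woman"]
best = max(emb, key=lambda w: cosine(emb[w], target))
print(best)   # "queen" with these toy numbers
```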
- Building feature vectors
- relating words to documents
- relating words to words
- Normalization and smoothing
- What's wrong with the raw vectors?
- TF-IDF (don't memorize the exact equation!)
- PMI and PPMI (worked sketch after this sub-list)
- When would a PMI value be negative? When would negative values be reliable vs. just noise?
- Singular value decomposition (Principal Components Analysis)
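A minimal numpy sketch of PPMI, plus an SVD reduction, on a made-up co-occurrence table:

```python
import numpy as np

# Made-up word-by-context co-occurrence counts (rows: words, cols: contexts).
counts = np.array([[10.0, 0.0, 2.0],
                   [ 1.0, 8.0, 1.0],
                   [ 0.0, 2.0, 6.0]])

total = counts.sum()
p_wc = counts / total                  # joint P(w, c)
p_w = p_wc.sum(axis=1, keepdims=True)  # marginal P(w)
p_c = p_wc.sum(axis=0, keepdims=True)  # marginal P(c)

with np.errstate(divide="ignore"):
    pmi = np.log2(p_wc / (p_w * p_c))  # negative when a pair co-occurs less than chance

ppmi = np.maximum(pmi, 0.0)            # clip: negative PMIs from small counts are noisy
print(np.round(ppmi, 2))

# Truncated SVD gives dense low-dimensional word vectors:
U, S, Vt = np.linalg.svd(ppmi)
word_vecs = U[:, :2] * S[:2]           # keep the top 2 singular dimensions
```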
- Word2vec
- Main outline of algorithm
- Why are words and contexts embedded separately?
- Negative sampling (see the sketch after this list)
- Sigmoid function (definition; how it relates the probabilities of good and bad pairs)
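A minimal numpy sketch of one negative-sampling update, following the skip-gram-with-negative-sampling formulation; the dimensions, learning rate, and vectors are made up for illustration:

```python
import numpy as np

def sigmoid(x):
    # Squashes a dot-product score into a "probability this pair is real".
    # Note sigmoid(-x) = 1 - sigmoid(x): bad-pair prob is 1 minus good-pair prob.
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
dim, lr = 8, 0.05
w = rng.normal(size=dim) * 0.1             # focus-word embedding
c_pos = rng.normal(size=dim) * 0.1         # observed (real) context embedding
c_negs = rng.normal(size=(3, dim)) * 0.1   # sampled fake contexts (more negatives than positives)

# Objective for one focus word: maximize
#   log sigmoid(w . c_pos) + sum over negatives of log sigmoid(-w . c_neg)
# Words and contexts get separate embeddings; both are updated.
g_pos = 1.0 - sigmoid(w @ c_pos)           # push the real pair's score up
w_update = g_pos * c_pos
c_pos += lr * g_pos * w
for c_neg in c_negs:
    g_neg = -sigmoid(w @ c_neg)            # push the fake pairs' scores down
    w_update += g_neg * c_neg
    c_neg += lr * g_neg * w
w += lr * w_update
```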
- Word2vec details (see the sketch after this list)
- Uses more negative examples than positive ones
- Raising context counts to a power
- Weighting context words by distance from focus word
- Deleting rare words, subsampling frequent ones
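A minimal numpy sketch of two of these details, with made-up counts; the 3/4 power and the roughly sqrt(t/f) keep probability follow the word2vec papers:

```python
import numpy as np

counts = np.array([1000.0, 100.0, 10.0])   # made-up unigram counts

# Negative samples are drawn from counts raised to the 3/4 power, which
# flattens the distribution: rare words get sampled a bit more often.
p_raw = counts / counts.sum()
p_neg = counts**0.75 / (counts**0.75).sum()
print(np.round(p_raw, 3), np.round(p_neg, 3))

# Subsampling: frequent words are randomly deleted from the training text.
# With threshold t, a word of relative frequency f is kept with probability
# roughly sqrt(t / f), so the most frequent words are dropped most often.
t = 1e-5
p_keep = np.minimum(1.0, np.sqrt(t / p_raw))
print(np.round(p_keep, 4))
```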