CS 440/ECE 448
Fall 2023
Margaret Fleck
Quiz 5 skills list
The quiz will cover material through Vector Semantics
Historical figures
Briefly explain who/what these people or algorithms are and relate them to AI.
- Eve Clark
- J. R. Firth
- Yann Le Cun
- Tomas Mikolov
No Calculus details
You will not need to remember or calculate derivatives for specific functions,
or for compositions of functions. I want you to know the high-level picture of
what derivative computations you're asking a tool (e.g. PyTorch) to do for you.
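For concreteness, here is a minimal sketch (assuming PyTorch is installed) of the kind of derivative computation you hand off to autograd; the numbers are made up:

    # Autograd records the forward computation, then applies the chain rule
    # backwards to fill in .grad for each parameter.
    import torch

    w = torch.tensor(2.0, requires_grad=True)
    x = torch.tensor(3.0)
    loss = (w * x - 1.0) ** 2   # forward pass: loss = (w*x - 1)^2
    loss.backward()             # backward pass: dloss/dw = 2*(w*x - 1)*x
    print(w.grad)               # tensor(30.)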
Neural Nets
- Design (i.e. connected set of linear classifiers)
- Hidden Layers
- When we have multiple layers, why should there be a (non-linear) activation
function between them? (see the sketch after this list)
- What are the advantages of deep networks?
- What kinds of functions can a neural net approximate?
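As a minimal sketch (assuming PyTorch; the layer sizes are made up), here is a tiny network built from linear layers with a non-linear activation between them:

    # Without the ReLU, the two linear layers would collapse into a single
    # linear map, so the hidden layer would add no expressive power.
    import torch.nn as nn

    net = nn.Sequential(
        nn.Linear(4, 8),   # input features -> hidden layer
        nn.ReLU(),         # non-linearity between the layers
        nn.Linear(8, 2),   # hidden layer -> output scores
    )
    print(net)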
- Training
- Top-level (aka simple) update equation for a single weight in the network (see the sketch after this list)
- What does backpropagation compute? High-level picture of how it works.
E.g. where do we use the chain rule? Why do we need the forward values?
- Symmetry breaking
- Data augmentation
- Regularization
- Dropout
- Overfitting
- Vanishing/exploding gradients
- Weight regularization
- Leaky ReLU
- Why must we initialize weights to random values rather than zero? (You don't need to
give names or details for specific initialization methods.)
- Epochs, (mini-)batches
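Here is a minimal sketch of the top-level update rule for a single weight, w <- w - learning_rate * dL/dw, on a made-up one-weight toy loss (the names lr, x, y are illustrative, not the lecture's notation):

    # Toy loss L(w) = (w*x - y)^2, so dL/dw = 2*(w*x - y)*x by the chain rule.
    x, y = 3.0, 1.0
    w = 0.5        # starting guess for the weight
    lr = 0.01      # learning rate
    for step in range(100):
        grad = 2 * (w * x - y) * x   # derivative of the loss w.r.t. w
        w = w - lr * grad            # gradient descent update
    print(w)       # approaches y/x = 1/3, where the loss is zero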
- Convolutional neural networks
- What is convolution?
- How does a convolutional layer work? (see the sketch after this list)
- In what situations would we want a convolutional layer vs. a fully-connected layer?
- Depth and stride
- What is a pooling layer? Why would max pooling be useful?
- Overall architecture, e.g.
what kinds of features are detected in early vs. late layers?
- Weight/parameter sharing
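A minimal sketch (assuming PyTorch; the channel counts and image size are made up) of a convolutional layer followed by max pooling:

    # The convolution shares one small set of weights across all image
    # positions; max pooling keeps the strongest response in each 2x2 block.
    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 32, 32)   # one 3-channel 32x32 image
    conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
    pool = nn.MaxPool2d(kernel_size=2)
    out = pool(conv(x))
    print(out.shape)                                   # torch.Size([1, 8, 16, 16])
    print(sum(p.numel() for p in conv.parameters()))   # 8*3*3*3 + 8 = 224 shared weights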
- Generative adversarial neural network
- Adversarial examples
- in computer vision
- in natural language
- Recurrent neural networks
- High-level view of how they work (see the sketch after this list)
- When would we compute the loss from the last unit vs. summed over all units?
- Bidirectional RNN
- How does a "Gated RNN" differ from a standard one?
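A minimal sketch (assuming PyTorch; the sizes and inputs are made up) of the recurrence an RNN applies at each position in the sequence:

    # The same cell (one shared set of weights) is applied at every position;
    # each hidden state depends on the current input and the previous state.
    import torch
    import torch.nn as nn

    input_size, hidden_size = 3, 4
    cell = nn.RNNCell(input_size, hidden_size)
    h = torch.zeros(1, hidden_size)              # initial hidden state
    for x_t in torch.randn(5, 1, input_size):    # a sequence of 5 input vectors
        h = cell(x_t, h)                         # h_t = tanh(W x_t + U h_{t-1} + b)
    # Loss from the last unit: use only the final h (e.g. classify the sequence).
    # Loss summed over all units: keep every h_t (e.g. tag every word).
    print(h.shape)                               # torch.Size([1, 4])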
Vector Semantics (Word Embeddings)
- Logic-based vs. context-based representations of meaning
- Principle of contrast
- Vector representations for words
- one-hot representations for words
- word embeddings/feature vectors
- cosine/dot product similarity (see the sketch after this list)
- how do they model analogies and compositional semantics?
- how to evaluate them
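A minimal sketch of cosine similarity between word vectors (the vectors here are invented, not taken from a trained model):

    import numpy as np

    cat = np.array([0.2, 0.8, 0.1])
    dog = np.array([0.25, 0.75, 0.0])
    car = np.array([0.9, 0.05, 0.4])

    def cosine(u, v):
        # dot product normalized by vector lengths, so only direction matters
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    print(cosine(cat, dog))   # close to 1: similar directions
    print(cosine(cat, car))   # much smaller: dissimilar directions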
- Building feature vectors
- relating words to documents
- relating words to words
- Normalization and smoothing
- What's wrong with the raw vectors?
- TF-IDF (don't memorize the exact equation!)
- PMI and PPMI (see the sketch after this list)
- When would a PMI value be negative? When would negative values be reliable vs. just noise?
- Singular value decomposition (Principal Components Analysis)
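A minimal sketch of computing PMI and PPMI from a tiny, made-up word-context count matrix:

    import numpy as np

    counts = np.array([[4.0, 0.0, 1.0],     # rows = words, columns = context words
                       [2.0, 3.0, 0.0]])
    total = counts.sum()
    p_wc = counts / total                    # joint probabilities P(w, c)
    p_w = p_wc.sum(axis=1, keepdims=True)    # P(w)
    p_c = p_wc.sum(axis=0, keepdims=True)    # P(c)

    with np.errstate(divide="ignore"):       # log(0) -> -inf for unseen pairs
        pmi = np.log2(p_wc / (p_w * p_c))
    ppmi = np.maximum(pmi, 0)                # clamp negative (unreliable) values to 0
    print(np.round(ppmi, 2))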
- Word2vec
- Main outline of algorithm
- Why are words and contexts embedded separately?
- Negative sampling (see the sketch at the end of this section)
- Sigmoid function (definition; how it relates the probabilities of a pair being good vs. bad)
- Word2vec details
- Uses more negative examples than positive ones
- Raising context counts to a power
- Weighting context words by distance from focus word
- Deleting rare words, subsampling frequent ones
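A minimal sketch of the skip-gram objective with negative sampling (the embeddings, dimensions, and number of negatives are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    dim = 5
    w = rng.normal(size=dim)             # focus-word embedding
    c_pos = rng.normal(size=dim)         # observed context embedding (separate table)
    c_negs = rng.normal(size=(3, dim))   # sampled "noise" contexts: more negatives than positives

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    p_good = sigmoid(w @ c_pos)          # probability the observed pair is "good"
    p_bad = sigmoid(-(c_negs @ w))       # sigmoid(-x) = 1 - sigmoid(x): probability negatives are "bad"
    loss = -np.log(p_good) - np.log(p_bad).sum()   # push good pairs up, sampled pairs down
    print(loss)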