CS 440/ECE 448
Fall 2024
Margaret Fleck
Quiz 5 skills list
The quiz will cover material through Vector Semantics.
MP practicalities
Questions related to what you built in MP 8 and 9.
Historical figures
Briefly explain who these people are and relate them to AI.
- Eve Clark
- J. R. Firth
- Yann LeCun
- Tomas Mikolov
No Calculus details
You will not need to remember or calculate derivatives for specific functions,
or for compositions of functions. I do want you to know the high-level picture
of what derivative computations you're asking a tool (e.g. PyTorch) to do for you.
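For instance, here is a minimal sketch (assuming PyTorch; the numbers are arbitrary) of the kind of derivative computation you hand off to the tool:

    import torch

    # One weight and one input; we only write down the forward computation.
    w = torch.tensor(2.0, requires_grad=True)
    x = torch.tensor(3.0)
    loss = (w * x - 1.0) ** 2      # forward pass records the computation graph

    loss.backward()                # the tool applies the chain rule backward through the graph
    print(w.grad)                  # d(loss)/dw = 2 * (w*x - 1) * x = 30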
Neural Nets
- Design (i.e. connected set of linear classifiers)
- Hidden Layers
- When we have multiple layers, why should there be a (non-linear) activation
function between them?
- What are the advantages of deep networks?
- What kinds of functions can a neural net approximate?
- Training
- Top-level (aka simple) update equation for a single weight in the network (example sketched after this list)
- What does backpropagation compute? High-level picture of how it works.
E.g. where do we use the chain rule? Why do we need the forward values?
- Symmetry breaking
- Data augmentation
- Regularization
- Dropout
- Overfitting
- Vanishing/exploding gradients
- Weight regularization
- Leaky ReLU
- Why must we initialize weights to random values rather than zero? (You don't need to
give names or details for specific initialization methods.)
- epochs, (mini-)batches
- Convolutional neural networks
- What is convolution?
- How does a convolutional layer work? (Example sketched after this list.)
- In what situations would we want a convolutional layer vs. a fully-connected layer?
- Depth and stride
- What is a pooling layer? Why would max pooling be useful?
- Overall architecture, e.g.
what kinds of features are detected in early vs. late layers?
- Weight/parameter sharing
- Generative adversarial network (GAN)
- Adversarial examples
- in computer vision
- in natural language
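For reference, here is a minimal sketch (plain NumPy, made-up numbers) of the simple update for one weight: run the forward pass, use the chain rule to get the derivative of the loss with respect to that weight (reusing the forward values), then step the weight downhill.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # One unit: output = sigmoid(w * x), squared-error loss against target y.
    w, x, y = 0.5, 2.0, 1.0
    learning_rate = 0.1

    # Forward pass (we need these values again during backprop).
    z = w * x
    out = sigmoid(z)
    loss = (out - y) ** 2

    # Backward pass: chain rule, one factor per step of the forward pass.
    dloss_dout = 2 * (out - y)
    dout_dz = out * (1 - out)        # derivative of the sigmoid, uses the forward value
    dz_dw = x
    dloss_dw = dloss_dout * dout_dz * dz_dw

    # Top-level update for this one weight.
    w = w - learning_rate * dloss_dw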
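And a sketch (again NumPy, illustrative only, not an efficient implementation) of how a convolutional layer slides one shared set of filter weights across its input, followed by 2x2 max pooling:

    import numpy as np

    def conv2d(image, kernel, stride=1):
        """Dot the (shared) kernel weights against each patch of the image."""
        kh, kw = kernel.shape
        out_h = (image.shape[0] - kh) // stride + 1
        out_w = (image.shape[1] - kw) // stride + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
                out[i, j] = np.sum(patch * kernel)   # same weights at every position
        return out

    def max_pool2x2(feature_map):
        """Keep the largest response in each non-overlapping 2x2 block."""
        h, w = feature_map.shape
        trimmed = feature_map[:h - h % 2, :w - w % 2]
        return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    image = np.arange(36.0).reshape(6, 6)            # toy 6x6 "image"
    edge_filter = np.array([[1.0, -1.0],
                            [1.0, -1.0]])            # one shared 2x2 filter
    feature_map = conv2d(image, edge_filter)         # shape (5, 5)
    pooled = max_pool2x2(feature_map)                # shape (2, 2)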
Vector Semantics (Word Embeddings)
- Logic-based vs. context-based representations of meaning
- Principle of contrast
- Vector representations for words
- one-hot representations for words
- word embeddings/feature vectors
- cosine/dot product similarity (example sketched at the end of this section)
- how do they model analogies and compositional semantics?
- how to evaluate them
- Building feature vectors
- relating words to documents
- relating words to words
- Normalization and smoothing
- What's wrong with the raw vectors?
- PMI and PPMI (example sketched at the end of this section)
- When would a PMI value be negative? When would negative values be reliable vs. just noise?
- Singular value decomposition (Principal Components Analysis)
- Word2vec
- Main outline of algorithm
- Why are words and contexts embedded separately?
- Negative sampling
- Sigmoid function (definition, relating probabilities of good and bad; example sketched at the end of this section)
- Word2vec details
- Uses more negative examples than positive ones
- Raising context counts to a power
- Weighting context words by distance from focus word
- Deleting rare words, subsampling frequent ones
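For reference, a minimal sketch (NumPy, made-up vectors) of dot-product and cosine similarity between two word vectors:

    import numpy as np

    v = np.array([1.0, 2.0, 0.0])
    w = np.array([2.0, 1.0, 1.0])

    dot = np.dot(v, w)
    cosine = dot / (np.linalg.norm(v) * np.linalg.norm(w))   # in [-1, 1]; 1 means same direction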
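A similar sketch of PMI and PPMI from a tiny word-by-context count matrix (made-up counts); a negative PMI means the pair co-occurs less often than independence would predict, and PPMI clips such values to zero because small negative estimates are mostly noise:

    import numpy as np

    # counts[i, j] = number of times word i appeared with context j
    counts = np.array([[8.0, 2.0],
                       [1.0, 9.0]])

    total = counts.sum()
    p_wc = counts / total                       # joint probabilities
    p_w = p_wc.sum(axis=1, keepdims=True)       # word marginals
    p_c = p_wc.sum(axis=0, keepdims=True)       # context marginals

    pmi = np.log2(p_wc / (p_w * p_c))           # observed vs. expected-if-independent
    ppmi = np.maximum(pmi, 0.0)                 # clip negative values to zero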
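And a sketch of the sigmoid as word2vec's negative sampling uses it: squash the dot product of a word embedding and a context embedding into the probability that the pair is a real (word, context) pair, pushing it toward 1 for observed pairs and toward 0 for sampled negatives (the vectors below are made up):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Words and contexts come from two separate embedding tables.
    word_vec = np.array([0.2, -0.1, 0.4])      # embedding of the focus word
    good_ctx = np.array([0.3, -0.2, 0.5])      # embedding of an observed context word
    bad_ctx = np.array([-0.4, 0.1, -0.3])      # embedding of a randomly sampled context word

    p_good = sigmoid(np.dot(word_vec, good_ctx))       # want this near 1
    p_bad = 1.0 - sigmoid(np.dot(word_vec, bad_ctx))   # want sigmoid(dot) near 0, so this near 1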