CS440 Quiz 6

CS 440/ECE 448
Fall 2025
Margaret Fleck

Quiz 6 skills list

Questions related to what you built in MP 8 and 9.

Historical names
- Vaswani et al.
- BERT
- Example autoregressive LLMs: GPT, Llama, DeepSeek, Claude
- N-gram language model folks: Markov, Shannon, Jelinek, Baker
Input and output
- tokenization (e.g. Byte-pair encoding)
- convert tokens to vectors (e.g. word2vec)
- positional encoding
Recurrent neural networks
- High-level view of how they work
- When would we compute loss from last unit vs. summed over all units?
- Bidirectional RNN
- How does a "Gated RNN" differ from a standard one?
Encoder-Decoder architecture
- Basic idea
- What is the input to each step in the decoder?
- Teacher forcing
Attention
- Weighted sum of vectors in context window
- Assessing similarity (learned weights plus dot product)
- "Attention head" (and you can have more than one)
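As a rough illustration of the attention items above, here is a minimal sketch of a single attention head in plain Python. The names `attention_head`, `W_q`, `W_k`, and `W_v` are hypothetical, the identity weight matrices stand in for learned weights, and the scaling by the square root of the dimension is an assumption borrowed from the usual scaled dot-product formulation.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [dot(row, x) for row in W]

def attention_head(query_vec, context, W_q, W_k, W_v):
    # Learned weights map the raw vectors to query, key, and value vectors.
    q = matvec(W_q, query_vec)
    keys = [matvec(W_k, x) for x in context]
    values = [matvec(W_v, x) for x in context]
    # Similarity is a dot product, scaled (assumption) by sqrt(dimension).
    scale = math.sqrt(len(q))
    weights = softmax([dot(q, k) / scale for k in keys])
    # Output is the weighted sum of the value vectors in the context window.
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights

# Toy example: identity matrices stand in for learned weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = attention_head([1.0, 0.0], context, identity, identity, identity)
```

With more than one head, each head would have its own weight matrices and the per-head outputs would be combined afterwards.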
Transformer blocks
- what's in them (high-level)
- residual connections
LLMs
- Masked vs. autoregressive
- Pre-training, fine-tuning, task head
- Training BERT
- What is BERT good for?
- Self-training autoregressive model
- Using autoregressive model
- Prompt engineering
- Some very approximate sense of the number of parameters, amount of training data, etc.
Current limitations of LLMs and testing LLMs
Model collapse

Model and terminology for an MDP
Quantized representation of continuous state variables
Randomized actions
Bellman equation
Methods of solving the Bellman equation
- Value iteration
- Policy iteration
- Asynchronous dynamic programming
How to choose a policy?
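A minimal sketch of value iteration followed by policy extraction, assuming the Bellman equation is written as U(s) = R(s) + gamma * max over a of sum over s' of P(s'|s,a) U(s'), with reward depending only on the state. The two-state MDP and the names `value_iteration` and `greedy_policy` are made up for illustration.

```python
def expected_utility(P, U, s, a):
    # Expected utility of the next state when taking action a in state s.
    return sum(p * U[s2] for s2, p in P[s][a].items())

def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    U = {s: 0.0 for s in states}
    while True:
        # Synchronous update: every new value is computed from the old U.
        new_U = {
            s: R[s] + gamma * max(expected_utility(P, U, s, a) for a in actions)
            for s in states
        }
        delta = max(abs(new_U[s] - U[s]) for s in states)
        U = new_U
        if delta < tol:
            return U

def greedy_policy(states, actions, P, U):
    # Choose, in each state, the action with the best expected utility.
    return {s: max(actions, key=lambda a: expected_utility(P, U, s, a))
            for s in states}

# Hypothetical MDP: only state B gives reward; "go" succeeds 80% of the time.
states = ["A", "B"]
actions = ["stay", "go"]
P = {
    "A": {"stay": {"A": 1.0}, "go": {"B": 0.8, "A": 0.2}},
    "B": {"stay": {"B": 1.0}, "go": {"A": 0.8, "B": 0.2}},
}
R = {"A": 0.0, "B": 1.0}

U = value_iteration(states, actions, P, R)
policy = greedy_policy(states, actions, P, U)
```

Asynchronous dynamic programming would differ mainly in updating `U` in place, one state at a time, rather than building `new_U` from a frozen copy.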

Basic setup for reinforcement learning (e.g. main loop)
Model-based reinforcement learning
Model-free reinforcement learning
- Q-learning version of Bellman equation (expressing Q in terms of itself, without reference to the utility or transition probability functions)
- TD update algorithm
- SARSA update algorithm
- How do TD and SARSA differ?
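The difference between the two updates can be sketched as follows, assuming "TD update" refers to the Q-learning style update that bootstraps from the best next action, while SARSA bootstraps from the action actually taken next. The function names, the table layout (a dict keyed by state-action pairs), and the numbers are illustrative only.

```python
def td_q_update(Q, s, a, reward, s_next, actions, alpha=0.5, gamma=0.9):
    # Q-learning style TD update: target uses the max over next actions.
    target = reward + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, reward, s_next, a_next, alpha=0.5, gamma=0.9):
    # SARSA: target uses the action the agent actually chose in s_next.
    target = reward + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

actions = ["left", "right"]
Q_init = {(s, a): 0.0 for s in ["s0", "s1"] for a in actions}
Q_init[("s1", "left")] = 1.0
Q_init[("s1", "right")] = 5.0

# Same experience (s0, right, reward 0, s1), but the agent's next action
# happens to be the exploratory "left".
q_td = dict(Q_init)
td_q_update(q_td, "s0", "right", 0.0, "s1", actions)         # uses max = 5.0

q_sarsa = dict(Q_init)
sarsa_update(q_sarsa, "s0", "right", 0.0, "s1", "left")      # uses Q(s1, left) = 1.0
```

Neither update refers to the transition probabilities; both rely on the sampled next state instead.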
Selecting an action
- Deriving a policy from utility values or from Q values.
- Incorporating exploration
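One common way to derive a policy from Q values while still exploring is epsilon-greedy selection. This is a sketch of that one strategy, not necessarily the method used in the course; `epsilon_greedy` and its parameters are hypothetical names.

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1, rng=random):
    # With probability epsilon, explore by picking a uniformly random action.
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Otherwise exploit: pick the action with the highest Q value.
    return max(actions, key=lambda a: Q[(state, a)])

actions = ["left", "right"]
Q = {("s0", "left"): 0.2, ("s0", "right"): 0.7}

greedy_choice = epsilon_greedy(Q, "s0", actions, epsilon=0.0)
random_choice = epsilon_greedy(Q, "s0", actions, epsilon=1.0,
                               rng=random.Random(0))
```

Setting `epsilon` to 0 recovers the purely greedy policy; decreasing it over time is a typical way to shift from exploration to exploitation.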
Online learning, offline learning, experience replay

Historical trivia and key examples

Hill-climbing

Backtracking search (DFS)

Variable assignments can be done in any order, search is to a known depth
Why does DFS work well? Why isn't looping a worry?
Heuristics for variable and value selection
- most constrained/most constraining variable
- least constraining value
- exploit any symmetries in the problem
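A minimal sketch of backtracking search with the most-constrained-variable heuristic, assuming a map-coloring style CSP in which neighboring variables must take different values. The function names and the small map are illustrative assumptions.

```python
def legal_values(var, assignment, domains, neighbors):
    # Values of var that do not clash with already-assigned neighbors.
    used = {assignment[n] for n in neighbors[var] if n in assignment}
    return [v for v in domains[var] if v not in used]

def backtrack(assignment, domains, neighbors):
    # Every variable assigned: the search depth is known in advance.
    if len(assignment) == len(domains):
        return dict(assignment)
    unassigned = [v for v in domains if v not in assignment]
    # Most constrained variable: fewest remaining legal values.
    var = min(unassigned,
              key=lambda v: len(legal_values(v, assignment, domains, neighbors)))
    for value in legal_values(var, assignment, domains, neighbors):
        assignment[var] = value
        result = backtrack(assignment, domains, neighbors)
        if result is not None:
            return result
        del assignment[var]  # undo the choice and try the next value
    return None

colors = ["red", "green", "blue"]
neighbors = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q"],
    "Q": ["NT", "SA"],
}
domains = {region: list(colors) for region in neighbors}
solution = backtrack({}, domains, neighbors)
```

Because each recursive call assigns exactly one new variable, the search cannot loop, which is one reason depth-first search suits this kind of problem.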
Forward checking, constraint propagation
AC-3 algorithm
How to incorporate constraint propagation into backtracking search
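A minimal sketch of AC-3, assuming binary constraints supplied through a hypothetical `allowed(x, vx, y, vy)` predicate and domains stored as sets. In a backtracking search, one option is to run this on a copy of the domains after each assignment and backtrack immediately if it returns False.

```python
from collections import deque

def revise(domains, x, y, allowed):
    # Remove values of x that have no supporting value in the domain of y.
    removed = False
    for vx in list(domains[x]):
        if not any(allowed(x, vx, y, vy) for vy in domains[y]):
            domains[x].remove(vx)
            removed = True
    return removed

def ac3(domains, neighbors, allowed):
    # Start with every arc (x, y) in the queue.
    queue = deque((x, y) for x in domains for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        if revise(domains, x, y, allowed):
            if not domains[x]:
                return False  # a domain was wiped out: no solution this way
            # The domain of x shrank, so arcs pointing at x need rechecking.
            for z in neighbors[x]:
                if z != y:
                    queue.append((z, x))
    return True

def different(x, vx, y, vy):
    # Map-coloring constraint: neighbors must differ.
    return vx != vy

neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
domains = {"A": {"red"}, "B": {"red", "green"}, "C": {"red", "green", "blue"}}
consistent = ac3(domains, neighbors, different)
```

In this toy problem propagation alone reduces every domain to a single value, so no further search is needed; in general, AC-3 only prunes and backtracking search still has to make the remaining choices.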