CS 440/ECE 448
Fall 2021
Margaret Fleck
Quiz 6 skills list
The quiz will be on Wednesday December 1st, covering material through the lectures on constraint satisfaction
problems.
It will be available on Moodle from 7am to noon Central time, and you will have 30 minutes to complete it.
Markov Decision Processes
- Model and terminology for an MDP
- Quantized representation of continuous state variables, and why quantization makes action outcomes look randomized
- Bellman equation
- Methods of solving the Bellman equation
- Value iteration
- Policy iteration
- Asynchronous dynamic programming
- How to choose a policy?
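The items above can be tied together with a small sketch: value iteration applies the Bellman update until the utilities converge, and a policy is then read off greedily. The two-state MDP below (states, actions, transition probabilities, and rewards) is a made-up illustration, not an example from the lectures.

```python
GAMMA = 0.9

# T[s][a] = list of (probability, next_state); hypothetical toy MDP.
T = {
    "A": {"stay": [(1.0, "A")], "go": [(0.8, "B"), (0.2, "A")]},
    "B": {"stay": [(1.0, "B")], "go": [(1.0, "B")]},  # B is absorbing
}
R = {"A": 0.0, "B": 1.0}

def value_iteration(T, R, gamma=GAMMA, tol=1e-6):
    U = {s: 0.0 for s in T}
    while True:
        # Bellman update: U(s) = R(s) + gamma * max_a sum_{s'} P(s'|s,a) U(s')
        new_U = {s: R[s] + gamma * max(sum(p * U[s2] for p, s2 in T[s][a])
                                       for a in T[s])
                 for s in T}
        if max(abs(new_U[s] - U[s]) for s in T) < tol:
            return new_U
        U = new_U

def greedy_policy(U, T):
    # Choosing a policy: in each state, pick the action with the
    # highest expected utility of the successor states.
    return {s: max(T[s], key=lambda a: sum(p * U[s2] for p, s2 in T[s][a]))
            for s in T}

U = value_iteration(T, R)
pi = greedy_policy(U, T)
```

Policy iteration instead alternates evaluating a fixed policy with greedy improvement; the update inside the loop is the same Bellman backup restricted to the policy's action.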
Reinforcement Learning
- Basic setup for reinforcement learning (e.g. main loop)
- Model-based reinforcement learning
- Model-free reinforcement learning
- Q-learning version of Bellman equation (expressing Q in terms of itself, without reference to the utility or
transition probability functions)
- TD update algorithm
- SARSA update algorithm
- How do TD and SARSA differ?
- Selecting an action
- Deriving a policy from utility values or from Q values.
- Incorporating exploration
- Online learning, offline learning, experience replay
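A compact way to see how TD-style updates and action selection fit together is to write the update rules out. The sketch below uses illustrative constants (ALPHA, GAMMA, EPSILON are assumptions, not values from the course); the key contrast is that Q-learning backs up from the best next action (off-policy) while SARSA backs up from the action the policy actually took (on-policy).

```python
import random

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative constants

def epsilon_greedy(Q, state, actions, eps=EPSILON):
    # Incorporating exploration: with probability eps pick a random
    # action, otherwise exploit the current Q estimates.
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def q_learning_update(Q, s, a, r, s2, actions):
    # Off-policy: back up from the BEST action in s2, regardless of
    # which action the agent will actually take next.
    best_next = max(Q.get((s2, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * (
        r + GAMMA * best_next - Q.get((s, a), 0.0))

def sarsa_update(Q, s, a, r, s2, a2):
    # On-policy: back up from the action a2 the policy actually chose
    # in s2 (e.g. via epsilon_greedy), exploration included.
    Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * (
        r + GAMMA * Q.get((s2, a2), 0.0) - Q.get((s, a), 0.0))
```

Note that neither update mentions the transition probabilities or a separate utility function: Q is expressed in terms of itself, which is what makes these methods model-free.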
Constraint satisfaction problems
Historical trivia:
- Waltz line labelling
- 4-color theorem
Key examples (be familiar with basic rules)
- n-queens, map/graph coloring, Sudoku, cryptarithmetic, Waltz line labelling, task scheduling
- Graph coloring is NP-complete.
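For orientation, here is one hypothetical way to pose map coloring as a CSP: variables are regions, domains are colors, and binary constraints say adjacent regions differ. The region names and colors below are made up for illustration.

```python
variables = ["WA", "NT", "SA", "Q"]
domains = {v: ["red", "green", "blue"] for v in variables}
# Each pair is a binary "not equal" constraint between neighbors.
neighbors = {("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"), ("SA", "Q")}

def consistent(assignment):
    # A (possibly partial) assignment is consistent when no two
    # adjacent regions that are both assigned share a color.
    return all(assignment[x] != assignment[y]
               for (x, y) in neighbors
               if x in assignment and y in assignment)
```

The other key examples fit the same mold: n-queens constrains rows and diagonals, Sudoku constrains rows, columns, and boxes, and cryptarithmetic constrains digit assignments.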
Hill-climbing
- how it works (high-level idea only)
- how it differs from backtracking search
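The high-level idea can be sketched on n-queens using a min-conflicts move rule (one common hill-climbing variant; the step limit and restart advice are assumptions of this sketch). Unlike backtracking, it starts from a complete assignment and repairs it, so it can get stuck at a local optimum.

```python
import random

def conflicts(cols, row, col):
    # Number of queens attacking square (row, col); cols[c] is the
    # row of the queen in column c (one queen per column).
    return sum(1 for c, r in enumerate(cols)
               if c != col and (r == row or abs(r - row) == abs(c - col)))

def hill_climb_queens(n, max_steps=10000):
    # Start from a random complete assignment; repeatedly move the
    # queen in some conflicted column to its least-conflicted row.
    cols = [random.randrange(n) for _ in range(n)]
    for _ in range(max_steps):
        bad = [c for c in range(n) if conflicts(cols, cols[c], c) > 0]
        if not bad:
            return cols  # no conflicts left: a solution
        c = random.choice(bad)
        cols[c] = min(range(n), key=lambda r: conflicts(cols, r, c))
    return None  # stuck at a local optimum; restart in practice
```

The contrast with backtracking: backtracking extends a consistent partial assignment and undoes choices on failure, while hill-climbing never backtracks, it just keeps modifying a complete assignment.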
Backtracking search (DFS)
- Variable assignments can be done in any order; the search runs to a known, fixed depth (one variable per level)
- Why does DFS work well? Why isn't looping a worry?
- Heuristics for variable and value selection
- most constrained/most constraining variable
- least constraining value
- Forward checking, constraint propagation
- AC-3 algorithm
- How to incorporate constraint propagation into backtracking search
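A minimal AC-3 sketch, assuming binary constraints given as a function `constraint(x_val, y_val)` that returns True when the pair is allowed (the data representation here is an assumption, not the one from lecture). Run before or interleaved with backtracking search, it prunes domain values with no support, and an emptied domain signals failure early.

```python
from collections import deque

def ac3(domains, neighbors, constraint):
    # AC-3: revise arcs (X, Y) until every value left in X's domain
    # has some supporting value in Y's domain.
    queue = deque((x, y) for x in domains for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        # Values of x with no supporting value in y's domain.
        revised = [vx for vx in domains[x]
                   if not any(constraint(vx, vy) for vy in domains[y])]
        if revised:
            for vx in revised:
                domains[x].remove(vx)
            if not domains[x]:
                return False  # domain wiped out: no solution possible
            # x's domain shrank, so arcs pointing into x need rechecking.
            queue.extend((z, x) for z in neighbors[x] if z != y)
    return True
```

Forward checking is the cheaper special case that only prunes the neighbors of the variable just assigned, without re-queueing arcs.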