CS 440/ECE 448
Fall 2021
Margaret Fleck
Quiz 6 skills list
The quiz will be on Wednesday December 1st, covering material through the lectures on constraint satisfaction
problems.
It will be available on Moodle from 7am to noon Central time, and you will have 30 minutes to complete it.
Markov Decision Processes
- Model and terminology for an MDP
- Quantized representation of continuous state variables, and why quantization makes action outcomes look randomized
- Bellman equation
- Methods of solving the Bellman equation
- Value iteration
- Policy iteration
- Asynchronous dynamic programming
- How to choose a policy?
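The items above can be tied together with a small sketch: value iteration applies the Bellman update until the utilities converge, and a policy is then read off greedily. The two-state MDP below (states, actions, transition probabilities, and rewards) is a made-up illustration, not an example from the lectures.

```python
GAMMA = 0.9

# T[s][a] = list of (probability, next_state); hypothetical toy MDP.
T = {
    "A": {"stay": [(1.0, "A")], "go": [(0.8, "B"), (0.2, "A")]},
    "B": {"stay": [(1.0, "B")], "go": [(1.0, "B")]},  # B is absorbing
}
R = {"A": 0.0, "B": 1.0}

def value_iteration(T, R, gamma=GAMMA, tol=1e-6):
    U = {s: 0.0 for s in T}
    while True:
        # Bellman update: U(s) = R(s) + gamma * max_a sum_{s'} P(s'|s,a) U(s')
        new_U = {s: R[s] + gamma * max(sum(p * U[s2] for p, s2 in T[s][a])
                                       for a in T[s])
                 for s in T}
        if max(abs(new_U[s] - U[s]) for s in T) < tol:
            return new_U
        U = new_U

def greedy_policy(U, T):
    # Choosing a policy: in each state, pick the action with the
    # highest expected utility of the successor states.
    return {s: max(T[s], key=lambda a: sum(p * U[s2] for p, s2 in T[s][a]))
            for s in T}

U = value_iteration(T, R)
pi = greedy_policy(U, T)
```

Policy iteration instead alternates evaluating a fixed policy with greedy improvement; the update inside the loop is the same Bellman backup restricted to the policy's action.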
Reinforcement Learning
- Basic setup for reinforcement learning (e.g. main loop)
- Model-based reinforcement learning
- Model-free reinforcement learning
- Q-learning version of Bellman equation (expressing Q in terms of itself, without reference to the utility or
transition probability functions)
- TD update algorithm
- SARSA update algorithm
- How do TD and SARSA differ?
- Selecting an action
- Deriving a policy from utility values or from Q values.
- Incorporating exploration
- Online learning, offline learning, experience replay
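A compact way to see how TD-style updates and action selection fit together is to write the update rules out. The sketch below uses illustrative constants (ALPHA, GAMMA, EPSILON are assumptions, not values from the course); the key contrast is that Q-learning backs up from the best next action (off-policy) while SARSA backs up from the action the policy actually took (on-policy).

```python
import random

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative constants

def epsilon_greedy(Q, state, actions, eps=EPSILON):
    # Incorporating exploration: with probability eps pick a random
    # action, otherwise exploit the current Q estimates.
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def q_learning_update(Q, s, a, r, s2, actions):
    # Off-policy: back up from the BEST action in s2, regardless of
    # which action the agent will actually take next.
    best_next = max(Q.get((s2, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * (
        r + GAMMA * best_next - Q.get((s, a), 0.0))

def sarsa_update(Q, s, a, r, s2, a2):
    # On-policy: back up from the action a2 the policy actually chose
    # in s2 (e.g. via epsilon_greedy), exploration included.
    Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * (
        r + GAMMA * Q.get((s2, a2), 0.0) - Q.get((s, a), 0.0))
```

Note that neither update mentions the transition probabilities or a separate utility function: Q is expressed in terms of itself, which is what makes these methods model-free.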
Constraint satisfaction problems
Historical trivia:
- Waltz line labelling
- 4-color theorem
Key examples (be familiar with basic rules)
- n-queens, map/graph coloring, Sudoku, cryptarithmetic, Waltz line labelling, task scheduling
- Graph coloring is NP-complete.
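For orientation, here is one hypothetical way to pose map coloring as a CSP: variables are regions, domains are colors, and binary constraints say adjacent regions differ. The region names and colors below are made up for illustration.

```python
variables = ["WA", "NT", "SA", "Q"]
domains = {v: ["red", "green", "blue"] for v in variables}
# Each pair is a binary "not equal" constraint between neighbors.
neighbors = {("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"), ("SA", "Q")}

def consistent(assignment):
    # A (possibly partial) assignment is consistent when no two
    # adjacent regions that are both assigned share a color.
    return all(assignment[x] != assignment[y]
               for (x, y) in neighbors
               if x in assignment and y in assignment)
```

The other key examples fit the same mold: n-queens constrains rows and diagonals, Sudoku constrains rows, columns, and boxes, and cryptarithmetic constrains digit assignments.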
Hill-climbing
- how it works (high-level idea only)
- how it differs from backtracking search
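The high-level idea can be sketched on n-queens using a min-conflicts move rule (one common hill-climbing variant; the step limit and restart advice are assumptions of this sketch). Unlike backtracking, it starts from a complete assignment and repairs it, so it can get stuck at a local optimum.

```python
import random

def conflicts(cols, row, col):
    # Number of queens attacking square (row, col); cols[c] is the
    # row of the queen in column c (one queen per column).
    return sum(1 for c, r in enumerate(cols)
               if c != col and (r == row or abs(r - row) == abs(c - col)))

def hill_climb_queens(n, max_steps=10000):
    # Start from a random complete assignment; repeatedly move the
    # queen in some conflicted column to its least-conflicted row.
    cols = [random.randrange(n) for _ in range(n)]
    for _ in range(max_steps):
        bad = [c for c in range(n) if conflicts(cols, cols[c], c) > 0]
        if not bad:
            return cols  # no conflicts left: a solution
        c = random.choice(bad)
        cols[c] = min(range(n), key=lambda r: conflicts(cols, r, c))
    return None  # stuck at a local optimum; restart in practice
```

The contrast with backtracking: backtracking extends a consistent partial assignment and undoes choices on failure, while hill-climbing never backtracks, it just keeps modifying a complete assignment.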
Backtracking search (DFS)
- Variable assignments can be done in any order; the search runs to a known, fixed depth (one variable per level)
- Why does DFS work well? Why isn't looping a worry?
- Heuristics for variable and value selection
- most constrained/most constraining variable
- least constraining value
- Forward checking, constraint propagation
- AC-3 algorithm
- How to incorporate constraint propagation into backtracking search
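A minimal AC-3 sketch, assuming binary constraints given as a function `constraint(x_val, y_val)` that returns True when the pair is allowed (the data representation here is an assumption, not the one from lecture). Run before or interleaved with backtracking search, it prunes domain values with no support, and an emptied domain signals failure early.

```python
from collections import deque

def ac3(domains, neighbors, constraint):
    # AC-3: revise arcs (X, Y) until every value left in X's domain
    # has some supporting value in Y's domain.
    queue = deque((x, y) for x in domains for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        # Values of x with no supporting value in y's domain.
        revised = [vx for vx in domains[x]
                   if not any(constraint(vx, vy) for vy in domains[y])]
        if revised:
            for vx in revised:
                domains[x].remove(vx)
            if not domains[x]:
                return False  # domain wiped out: no solution possible
            # x's domain shrank, so arcs pointing into x need rechecking.
            queue.extend((z, x) for z in neighbors[x] if z != y)
    return True
```

Forward checking is the cheaper special case that only prunes the neighbors of the variable just assigned, without re-queueing arcs.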