ECE586RS: MDPs and Reinforcement Learning

Term: Fall 2022

Prerequisites: ECE 534 (Random Processes)

Instructor: Prof. R. Srikant, rsrikant@illinois.edu

TAs: Yashaswini Murthy (ymurthy2@illinois.edu) and Anna Winnicki (annaw5@illinois.edu)

Prof. Srikant’s Office Hours: 2:20-3:00 MW, 107 CSL

TAs' Office Hours: Tue 3-5 pm, 3032 ECEB (3-4 Anna, 4-5 Yashaswini)

Lectures: 1-2:20 MW in Room 2015 ECEB

Fall Break: Nov. 19-Nov. 27

Last Day of Instruction: Dec. 7

Outline (Time Permitting):

  • MDPs: Finite-horizon problems, infinite-horizon discounted-cost problems, Bellman equation, contraction and monotonicity properties, value and policy iteration

  • Optimization Background: gradient descent, mirror descent and stochastic gradient descent

  • Approximate Dynamic Programming: Approximate value iteration; policy evaluation using least-squares and gradient descent

  • TD Learning: Algorithms for tabular and function approximation settings, finite-time performance bounds and convergence

  • RL Methods Motivated by Value Iteration: Q-learning based on a single trajectory, with and without function approximation, offline and online versions, finite-time bounds and convergence

  • RL Methods Motivated by Policy Iteration: Policy gradient, natural policy gradient, finite-time bounds and convergence

  • Episodic RL: Q-learning over a finite-time horizon, connection to multi-armed bandits, regret bounds

Grading:

  • Homework: 80% (Homework will be posted on Canvas)

  • Final Exam: 20% (7-10 pm, Dec. 14)

References:

MDPs:

  • D. P. Bertsekas. Dynamic Programming and Optimal Control, vol. I and II, Athena Scientific, 1995. (Later editions: vol. I, 2017 and vol. II, 2012)

  • M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014.

  • S. M. Ross. Applied Probability Models with Optimization Applications. Courier Corporation, 2013.

  • A concise introduction to MDPs can be found in Chapter 17 of M. Mohri, A. Rostamizadeh, and A. Talwalkar. Foundations of Machine Learning, MIT Press, 2018.

Optimization:

  • A. Beck. Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB. SIAM, 2014.

  • A. Beck. First-Order Methods in Optimization. SIAM, 2017.

RL: