Course Description:
This course rigorously covers foundational concepts in learning theory, emphasizing theoretical analysis
  related to modern deep learning frameworks. Key topics include generalization analysis, VC-dimension,
  covering numbers, Rademacher complexity, stochastic gradient descent, common techniques for lower
  bound analysis, universal approximation results, Neural Tangent Kernel (NTK) regime optimization,
  benign overfitting, and mean-field analysis. Evaluation consists of four homework assignments, a take-home
  midterm exam, a group final project, and class attendance. The course aims to equip students with the theoretical foundations
  necessary to engage with current research literature.
  
Prerequisites:
Advanced linear algebra
Probability and statistics
Machine learning at the level of CS446
Strong mathematical skills at the level of mathematical statistics and real analysis
Class time: 
Mon, Wed, 11:00am–12:15pm, SC 0216. 
Lectures will be conducted in person and will not be recorded unless otherwise announced. 
  
Instructor: Prof. Tong Zhang (tozhang@illinois.edu)
Office: SC 2118
Office hours: Mon, 10:00am–10:50am
Course resources:
Grading:
Four theoretical homework assignments (60%)
One take-home midterm (20%), with a 3-day turnaround time
One group final project (15%), with 3–4 students per group
For the project, each group reads a recent paper related to learning theory, gives a 20–25 minute presentation (5%), and writes a report of approximately five pages (10%).
Class attendance via online sign-in (5%).
Course Material:
Lecture slides (distributed before each lecture)
Reference book: Mathematical Analysis of Machine Learning Algorithms
Paper readings
Lectures (tentative):
Introduction (1 lecture)
Probability inequalities (2 lectures)
Uniform convergence (2 lectures)
Covering numbers (2 lectures)
VC dimension (2 lectures)
Rademacher complexity (2 lectures)
Concentration inequalities (1 lecture)
Model selection (1 lecture)
Lower bounds (2 lectures)
SGD analysis (2 lectures)
Universal approximation (2 lectures)
Neural tangent kernel (2 lectures)
Benign overfitting (2 lectures)
Mean-field analysis (2 lectures)
Presentations (4 lectures)