CS 540: Deep Learning Theory

Fall 2025

Course Description:
This course rigorously covers foundational concepts in learning theory, emphasizing theoretical analysis related to modern deep learning frameworks. Key topics include generalization analysis, VC-dimension, covering numbers, Rademacher complexity, stochastic gradient descent, common techniques for lower bound analysis, universal approximation results, Neural Tangent Kernel (NTK) regime optimization, benign overfitting, and mean-field analysis. Evaluation consists of four homework assignments, a take-home midterm exam, a group final project, and class attendance. The course aims to equip students with the theoretical foundations necessary to engage with current research literature.
Prerequisites:
- Advanced linear algebra
- Probability and statistics
- Machine learning at the level of CS446
- Strong mathematical skills at the level of mathematical statistics and real analysis
Class time:
Mon, Wed, 11:00am–12:15pm, SC 0216.
Lectures will be conducted in person and will not be recorded unless otherwise announced.
Instructor: Prof. Tong Zhang (tozhang@illinois.edu)
- Office: SC 2118
- Office Hours: Mon, 10:00am–10:50am
Course resources:
- Website: https://courses.grainger.illinois.edu/cs540/fa2025/
- Canvas: https://canvas.illinois.edu/courses/60407
Grading:
- Four theoretical homework assignments (60%)
- One take-home midterm (20%), with a 3-day turnaround time
- One group final project (15%), with 3–4 students per group
  - Read and understand a recent paper related to learning theory. Give a 20–25 minute presentation (5%) and write an approximately five-page report (10%).
- Class attendance via online sign-in (5%)
Course Material:
- Lecture slides (distributed before each lecture)
- Reference book: Mathematical Analysis of Machine Learning Algorithms
- Paper readings
Lectures (tentative):
- Introduction (1 lecture)
- Probability inequalities (2 lectures)
- Uniform convergence (2 lectures)
- Covering numbers (2 lectures)
- VC dimension (2 lectures)
- Rademacher complexity (2 lectures)
- Concentration inequalities (1 lecture)
- Model selection (1 lecture)
- Lower bounds (2 lectures)
- SGD analysis (2 lectures)
- Universal approximation (2 lectures)
- Neural tangent kernel (2 lectures)
- Benign overfitting (2 lectures)
- Mean-field analysis (2 lectures)
- Presentations (4 lectures)