ECE 598IS - Fundamental Limits in Data Science (Spring 2022)
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
Lectures: Tuesday and Thursday, 9:30AM - 10:50AM, location: 4070 ECEB
Instructor: Ilan Shomorony, ilans@illinois.edu. (Office hour upon request)
Announcements
- On 01/27 (second in-person lecture), we will meet in 4070 ECEB.
- On 01/25 (first in-person lecture), we will meet in 3026 ECEB.
- Information about scribing can be found here
- The lectures in the first week (01/18 and 01/20) will be held via Zoom (Zoom info)
Summary:
This course will expose students to the probabilistic modeling of several data science problems and to the characterization of fundamental performance limits (e.g., scaling laws and information-theoretic feasibility). Students will learn techniques from high-dimensional statistics, information theory, graph theory, and random matrix theory through a combination of standard lectures and reading research papers. The first part of the course will explore the use of information-theoretic tools in high-dimensional statistics. We will then focus on applications in three broad classes of data science problems: clustering, sequence reconstruction and dimensionality reduction. The mathematical models studied will be motivated by specific practical data science problems, particularly from the field of genomics.
Prerequisite: Main prerequisite is ECE 534 (Random Processes) or equivalent. ECE 563 (Information Theory) or some prior exposure to information theory thinking is recommended.
Textbook: No textbook will be required. Lectures will be based on the following sources (and on research papers):
- Martin Wainwright, "High-Dimensional Statistics: A Non-Asymptotic Viewpoint"
- Yihong Wu, "Information-theoretic Methods for High-dimensional Statistics" (lecture notes)
- Yihong Wu and Jiaming Xu, "Statistical inference on graphs" (lecture notes)
- John Duchi, “Information Theory and Statistics” (lecture notes)
- Emanuel Abbe, “Community Detection and Stochastic Block Models”
- Rodrigues and Eldar, “Information-Theoretic Methods in Data Science”
Topics:
Part 1: Statistics and Information Theory
- Basics of Statistical Decision Theory
- Bayes risk and minimax risk
- Divergence measures and inequalities
- Minimax lower bounds (Le Cam and Fano methods)
- Recovery of sparse vectors
Part 2: Fundamental Limits in Data Science Problems
- Inference on graphs
- Spectral methods and random matrix perturbations
- The planted clique problem
- Stochastic Block Model
- Sequence reconstruction
- Information-theoretic limits of sequence assembly
- Trace reconstruction problem
- Dimensionality reduction
- Sparse PCA
- Min-hashing and sketching
Grading
- Scribing - 15%
- Homework - 40%
- Paper presentation - 20%
- Class participation - 5%
- Project - 20%
|