ECE 598IS  Fundamental Limits in Data Science (Spring 2022)
Department of Electrical and Computer Engineering
University of Illinois at UrbanaChampaign
Lectures: Tuesday and Thursday, 9:30AM  10:50AM, location: 4070 ECEB
Instructor: Ilan Shomorony, ilans@illinois.edu. (Office hour upon request)
Announcements
 On 01/27 (second inperson lecture), we will meet in 4070 ECEB.
 On 01/25 (first inperson lecture), we will meet in 3026 ECEB.
 Information about scribing can be found here
 The lectures in the first week (01/18 and 01/20) will be held via Zoom (Zoom info)
Summary:
This course will expose students to the probabilistic modeling of several data science problems and to the characterization of fundamental performance limits (e.g., scaling laws and informationtheoretic feasibility). Students will learn techniques from highdimensional statistics, information theory, graph theory, and random matrix theory through a combination of standard lectures and reading research papers. The first part of the course will explore the use of informationtheoretic tools in highdimensional statistics. We will then focus on applications in three broad classes of data science problems: clustering, sequence reconstruction and dimensionality reduction. The mathematical models studied will be motivated by specific practical data science problems, particularly from the field of genomics.
Prerequisite: Main prerequisite is ECE 534 (Random Processes) or equivalent. ECE 563 (Information Theory) or some prior exposure to information theory thinking is recommended.
Textbook: No textbook will be required. Lectures will be based on the following sources (and on research papers):
 Martin Wainwright, "HighDimensional Statistics: A NonAsymptotic Viewpoint"
 Yihong Wu, "Informationtheoretic Methods for Highdimensional Statistics" (lecture notes)
 Yihong Wu and Jiaming Xu, "Statistical inference on graphs" (lecture notes)
 John Duchi, “Information Theory and Statistics” (lecture notes)
 Emanuel Abbe, “Community Detection and Stochastic Block Models”
 Rodrigues and Eldar, “InformationTheoretic Methods in Data Science”
Topics:
Part 1: Statistics and Information Theory
 Basics of Statistical Decision Theory
 Bayes risk and minimax risk
 Divergence measures and inequalities
 Minimax lower bounds (Le Cam and Fano methods)
 Recovery of sparse vectors
Part 2: Fundamental Limits in Data Science Problems
 Inference on graphs
 Spectral methods and random matrix perturbations
 The planted clique problem
 Stochastic Block Model
 Sequence reconstruction
 Informationtheoretic limits of sequence assembly
 Trace reconstruction problem
 Dimensionality reduction
 Sparse PCA
 Minhashing and sketching
Grading
 Scribing  15%
 Homework  40%
 Paper presentation  20%
 Class participation  5%
 Project  20%
