ECE 598IS - Fundamental Limits in Data Science (Spring 2022)

Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
Lectures: Tuesday and Thursday, 9:30AM - 10:50AM, location: 4070 ECEB
Instructor: Ilan Shomorony, (Office hour upon request)


  • On 01/27 (second in-person lecture), we will meet in 4070 ECEB.
  • On 01/25 (first in-person lecture), we will meet in 3026 ECEB.
  • Information about scribing can be found here
  • The lectures in the first week (01/18 and 01/20) will be held via Zoom (Zoom info)

This course will expose students to the probabilistic modeling of several data science problems and to the characterization of fundamental performance limits (e.g., scaling laws and information-theoretic feasibility). Students will learn techniques from high-dimensional statistics, information theory, graph theory, and random matrix theory through a combination of standard lectures and reading research papers. The first part of the course will explore the use of information-theoretic tools in high-dimensional statistics. We will then focus on applications in three broad classes of data science problems: clustering, sequence reconstruction and dimensionality reduction. The mathematical models studied will be motivated by specific practical data science problems, particularly from the field of genomics.

Prerequisite: Main prerequisite is ECE 534 (Random Processes) or equivalent. ECE 563 (Information Theory) or some prior exposure to information theory thinking is recommended.

Textbook: No textbook will be required. Lectures will be based on the following sources (and on research papers):
  • Martin Wainwright, "High-Dimensional Statistics: A Non-Asymptotic Viewpoint"
  • Yihong Wu, "Information-theoretic Methods for High-dimensional Statistics" (lecture notes)
  • Yihong Wu and Jiaming Xu, "Statistical inference on graphs" (lecture notes)
  • John Duchi, “Information Theory and Statistics” (lecture notes)
  • Emanuel Abbe, “Community Detection and Stochastic Block Models”
  • Rodrigues and Eldar, “Information-Theoretic Methods in Data Science”


Part 1: Statistics and Information Theory
  • Basics of Statistical Decision Theory
  • Bayes risk and minimax risk
  • Divergence measures and inequalities
  • Minimax lower bounds (Le Cam and Fano methods)
  • Recovery of sparse vectors
Part 2: Fundamental Limits in Data Science Problems
  • Inference on graphs
    • Spectral methods and random matrix perturbations
    • The planted clique problem
    • Stochastic Block Model
  • Sequence reconstruction
    • Information-theoretic limits of sequence assembly
    • Trace reconstruction problem
  • Dimensionality reduction
    • Sparse PCA
    • Min-hashing and sketching


  • Scribing - 15%
  • Homework - 40%
  • Paper presentation - 20%
  • Class participation - 5%
  • Project - 20%