UIUC

CS 598: Algorithms for Big DataChandra Chekuri


This course will describe some algorithmic techniques developed for
handling large amounts of data that is often available in limited
ways. Topics that will be covered include data stream algorithms,
sampling and sketching techniques, and sparsification, with
applications to signals, matrices, and graphs. Emphasis will be on the
theoretical aspects of the design and analysis of such algorithms.
Prerequisites: CS 573, good background in (discrete) probability
Instructor: Chandra Chekuri 3228
Office Hours: Wed 1011am, and by appointment
Grading Policy: There will be 45 homeworks, roughly once every two weeks and final course project. Course projects could involve research on a specific problem or topic, a survey of several papers on a topic (summarized in a report and/or talk), or an experimental evaluation. I also expect students to scribe one lecture in latex.
Prerequisites: This is a graduate level class and a reasonable background in algorithms and discrete mathematics would be needed. Knowledge and exposure to probability and linear algebra is necessary.
Reference/Study material:
Potential Topics:
Note: The above list is suggestive/tentative and we will cover only a subset of the topics.
Homework:
Homework 1 given on Thursday 09/2/14, due in class on Thursday, 9/11/14.
Homework 2 given on Friday 09/12/14, due on Thursday, 9/23/14.
Homework 3 given on Friday 10/10/14, due on Thursday, 10/23/14.
Homework 4 given on Friday 11/7/14, due on Thursday, 11/20/14.
Lectures:
Sample LaTeX file and algo.sty
Warning: Notes may contain errors. Please bring those to the attention of the instructor.
Lecture 1: 8/26/14, Introduction, basics of probability, probabilistic counting (Morris's algorithm), reservoir sampling
Lecture 2: 8/28/14, Estimating Number of Distinct Elements in a Stream
Lecture 3: 9/2/14, Estimating F_k norms via AMS sampling
Lecture 4: 9/4/14, Estimating F_2 norm, Sketching, JohnsonLindenstrauss Lemma
Lecture 5: 9/9/14, Estimating F_p norm for 0 < p < 2, MisraGreis algorithm for frequent items
Lecture 6: 9/11/14, Count and CountMin Sketches
Lecture 7: 9/16/14, Sparse recovery via CountSketch (see notes from previous lecture)
Lecture 8: 9/18/14, \ell_2 sampling and application to nearoptimal F_k estimation for k > 2
Lecture 9: 9/23/14, \ell_0 sampling, and priority sampling
Lecture 10: 9/25/14, Quantiles and selection in multiple passes
Lecture 11: 9/30/14, Continue previous lecture.
No lecture on 10/2/14, discuss home work problems.
Lecture 12: 10/07/14, Graph Streams: Connectivity, Cut/Spectral sparsifiers
Lecture 13: 10/09/14, Graph Streams: Spanners, Matchings, Sketching for graphs
Lecture 14: 10/14/14, Finish graph streams. (2+\eps) for kcenter clustering in streaming setting
Lecture 15: 10/16/14, Lower bounds for streaming via communication complexity
Lecture 16: 10/21/14, Lower bound on communication complexity of INDEX, lower bounds for graph streaming problems
Lecture 17: 10/23/14, Similarity estimation and Locality sensitive hashing
Lecture 18: 10/28/14, Approximate Nearest Neighbor Search via Locality Sensitive Hashing
Lecture 19: 10/30/14, Approximate matrix multiplication.
Lecture 20: 11/4/14, Singular Value Decomposition
Lecture 21: 11/6/14, Fast Deterministic LowRank Approximation
Lecture 22: 11/18/14, Subspace embeddings and Fast Least Squares Regression
Lecture 23: 11/20/14, Compressed Sensing
Course Project Information