**Lectures:** Tue/Thu 9.30-10.45am CST, 0216 Siebel Center

**Tuesday, August 23.**Introduction, logistics, streaming and sampling**Thursday, August 25.**Randomized algorithms: quick sort**Tuesday, August 30.**Probabilistic inequalities and median estimation- Slides
- Sariel Har-Peled's notes on randomized algorithms, in particular Chapter 7 on Chernoff inequality and a cheat sheet
- Randomized algorithms books and references below

**Thursday, Sept 1.**Probabilistic counting (Morris counter)- Slides
- Section 2.1 in Cormode and Ye's book.
- Chapter 2 in Jelani Nelson's lecture notes .
- Chapter 4 in Amit Chakrabarti's notes.
- Lecture 1 from Fall 2014
- Counting large numbers of events in small registers by Morris, CACM 1978
- Approximate counting: a detailed analysis by Flajolet.

**Tuesday, Sept 6.**Intro to frequency moments in streaming, Distinct element estimation- Slides
- Chapter 2 in Jelani Nelson's lecture notes.
- Section 2.5 and 2.6 in Cormode and Ye's book.
- Chapters 2 and 3 in Amit Chakrabarti's notes.
- Lecture 2 from Fall 2014
- Original Flajolet-Martin paper based on hashing
- Hyperlolog in practice
- A theoretically optimal algorithm in terms of space

**Thursday, Sept 8.**Limited independence and Hashing, Distinct elements via pairwise independent hashing- Slides for hashing, slides for distinct elements
- Avrim Blum's notes
- Salil Vadhan's book on pseudorandomness (Section 3.5 on pairwise independence)
- Mikkel Thorup's article on hashing
- A nice introduction to finite fields, and notes
- Fast hashing with Strong Concentration Bounds, STOC 2020 paper motivated by streaming applications.

**Tuesday, Sept 13.**AMS estimator for frequency moments, F2 estimation**Thursday, Sept 15.**Heavy hitters and deterministic Misra-Greis algorithm**Tuesday, Sept 20.**Count-Min and Count Sketches**Thursday, Sept 22.**Sketching for range queries, Sparse recovery- Slides
- Lecture 6 from Fall 2014
- Chapter 9 in Amit Chakrabarti's notes.
- Chapter 3 in Jelani Nelson's lecture notes.
- Chapter 3 in Cormode and Ye's book.

**Tuesday, Sept 27.**No lecture**Thursday, Sept 29.**JL Lemma and Dimensionality Reduction, Subspace Embeddings**Tuesday, Oct 4.**Finish Subspace Embeddings, Application to Least Squares Regression- Slides
- Chapter 6 in Jelani Nelson's lecture notes
- Chapter 6 in Cormode and Ye's book.

**Thursday, Oct 6.**No lecture in lieu of midterm exam**Tuesday, Oct 11.**Similarity estimation- Slides
- Chapter 3 "Finding Similar Items" in MMDS book.
- Moses Charikar's paper Similarity estimation techniques from rounding algorithms
- Min-Wise Independent Permutations by Broder, Charikar, Frieze, Mitzenmacher.
- Piotr Indyk's paper on small min-wise hash families

**Thursday, Oct 13.**Locality Sensitive Hashing- Slides
- LSH webpage including downloadable code and papers and slides
- Sasho Nikolov's notes on LSH
- Survey Approximate Nearest Neighbor Search in High Dimensions by Andoni, Indyk and Razenshteyn
- Sariel Har-Peled's book on Geometric Approximation Algorithms covers ANN in both low and high-dimensions

**Tuesday, Oct 18.**LSH for Euclidean distances**Thursday, Oct 20.**Quantiles and Selection- Slides
- Kent Quanrud's notes from Spring 2019
- Lecture 10 from 2014
- Survey by Greenwald and Khanna on Quantiles
- Application of weighted quantiles to an ML system. See paper of Chen, Guestrin.

**Tuesday, Oct 25.**Finish previous lecture. Selection in Random Order Streams**Thursday, Oct 27.**Approximate Matrix Multiplication- Slides
- Lecture 11 from Indyk-Nelson course
- Mahoney's lecture notes

**Tuesday, Nov 1.**SVD and low-rank approximation- Slides
- Chapter 3 in Foundations of Data Science (online draft)
- Lecture notes on understanding PCA and SVD from Stanford course

**Thursday, Nov 3.**Finish last lecture, Fast/approximate NLA- Slides
- Chapter 6 in Jelani Nelson's lecture notes
- Low Rank Approximation and Regression in Input Sparsity Time by Clarkson and Woodruff.
- Woodruff's manuscript on sketching for NLA.
- Low-distortion Subspace Embeddings in Input-sparsity Time and Applications to Robust Linear Regression by Meng and Mahoney
- OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings by Nelson and Ngyuen

**Tuesday, Nov 8.**Election day, no lecture**Thursday, Nov 10.**Using sketching to speed up iterative solver for regression- Slides
- Chapter 6 in Jelani Nelson's lecture notes (specifically Sec 6.3.3)
- Compact notes on gradient descent for convex functions. Standard references are books by Nesterov and Boyd & Vandenberghe.

**Tuesday, Nov 15.**Graph streaming for connectivity and cut sparsifiers**Tuesday, Nov 15.**Graph sketching for dynamic connectivity, and matchings in streaming setting**Thanksgiving break****Tuesday, Nov 29.**Page Rank and Random Walks/Markov Chains- Handwritten scribbled notes
- Chapter 4 in book of Blum-Hopcroft-Kannan
- Chapter 16 in Kent Quanrud's notes on randomized algorithms
- Link analysis chapter/slides/videos in Mining of Massive Data Sets
- Bob Gallagher's chapter on Finite State Markov Chains from his book. Available at MIT OCW site for his course.
- Wikipedia article on PageRank with several references. Wikipedia article on Perron-Frobenius theorem and a recent proof
- Article by Biswas on various proofs of the fundamental theorem of Markov chains. Markov Chain Tree theorem

**Thursday, Dec 1.**Core sets and Compressed Sensing and wrap up- Slides: core sets and compressed sensing
- Chapter 6 in Jelani Nelson's lecture notes for compressed sensing
- Chapter 5 in Cormode-Yi book
- Survey on core sets by Feldman
- A new coreset framework for clustering
- Dimensionality reduction ala JL for clustering

**Tuesday, Dec 6.**Review for exam.

- Small Summaries for Big Data by Graham Cormode and Ke Yi. Cambridge University Press, Dec 2020.
- Course notes of Amit Chakrabarti focusing mostly on streaming algorithms.
- Lecture notes of Jelani Nelson on sketching algorithms.
- Foundations of Data Science, by Avrim Blum, John Hopcroft and Ravi Kannan. Broad technical introduction covering foundations of several topics. UIUC students should have access to the digital version but there is also a free draft version available here.
- Mathematical Foundations of Data Analysis, Jeff Phillips.
- Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman and Jeff Ullman. Emphasis on the practical aspects
- Sketching as a Tool for Numerical Linear Algebra, a survey by David Woodruff. Copy for personal use from author's website
- Survey on sketching by Graham Cormode
- Survey on Graph Streams, Andrew MacGregor.
- Lecture Notes on Randomized Linear Algebra by Michael Mahoney. See also an older survey by same author.
- A book in preparation on data stream algorithms by Andrew McGregor and Muthu Muthukrishnan
- Surveys on core-sets. Core-sets: An updated survey by Dan Feldman. Coresets and Sketches by Jeff Phillips. Another one by Muneanu and Schwiegelshohn. Original one by Agarwal, Har-Peled and Varadarajan (see also Sariel's book).
- Lecture notes on Massively Parallel Computing model and algorithms by Mohsen Ghaffari
- Sublinear Algorithms is a topic that we will not cover in this course, however it is a related area and there are many resources collected at this website.

- Books on probability and computing: Probability and Computing (Mitzenmacher-Upfal), Randomized Algorithms (Motwani-Raghavan), The Probabilistic Method (Alon-Spencer), Concentration of Measure (Dubhashi-Panconesi)
- A survey on concentration inequalities by Fan Chung and Linyuan Li
- Probabilistic Tools for the Analysis of Randomized Optimization Heuristics by Benjamin Doerr
- Pseudorandomness by Salil Vadhan
- High Speed Hashing for Integers and Strings by Mikkel Thorup

- Algorithms for Big Data: Chandra (UIUC), Fall 2020
- Sketching Algorithms: Jelani Nelson (Berkeley), Fall 2020
- Data Stream Algorithms: Amit Chakrabarti (Dartmouth), Spring 2020
- The Modern Algorithmic Toolbox : Greg Valiant (Stanford), Fall 2022
- Algorithms for Big Data: Jelani Nelson (Harvard), Fall 2015.
**Includes Video Lectures** - Algorithms for Data Science: Andrew MacGregor (UMass Amherst), Fall 2022
- Sublinear Algorithms: Eric Price (UT Austin), Fall 2020
- Algorithms for Big Data: David Woodruff (CMU), Fall 2019
- Algorithmic Techniques for Big Data: Moses Charikar (Stanford), Winter 2020. (content seems inaccessible)
- Algorithms for Massive Data: Alex Andoni (Columbia), Spring 2019.
- Data Stream Algorithms: Andrew MacGregor (UMass Amherst), Spring 2018.
- Models of Computation for Massive Data: Jeff Philips (Utah)