CS 598 TMC, Fall 2023
Advanced Data Structures
Instructor
Timothy Chan
(tmc "at" illinois.edu, office hours: Thu 3:30-4:30, Siebel 3230)
TA: Yuancheng Yu (yyu51 "at" illinois.edu, office hr: Wed 6:00pm-7:00pm
on zoom)
Meeting Time
Wed & Fri 11:00am-12:15pm, Loomis Lab 136
Course Overview
This is a CS theory/algorithms course, covering selected topics in data structures, which go beyond what are typically taught in 2nd and 3rd-year undergraduate classes. Potential topics include: balanced search trees, priority queues (e.g., Fibonacci heaps), amortized analysis, the union-find problem, hashing, geometric data structures (e.g., range searching), approximate nearest neighbor search (e.g., locality-sensitive hashing), bit-packing techniques (e.g., fusion trees and succinct data structures), persistent data structures, dynamic graph algorithms (e.g., dynamic connectivity and shortest paths), distance oracles, strings and text indexing (e.g., suffix trees), I/O-efficient data structures, and (conditional) lower bounds.
Prerequisites: Solid background in algorithm
design and analysis, at the level of CS 374. (CS 473 is not required, though some previous exposure to randomized algorithms might be helpful.) Undergraduate students who did well in 374 or 473 are welcomed, and may request for an UG petition.
Course Work
- 4 homeworks (problem sets), worth 45%
- HW1 (due Sep 22 Fri 10am) (9/17: minor typo in Q4 fixed; 9/20: typo in Q3, see footnote)
- HW2 (due Oct 13 Fri 10am) (10/2: add some footnotes)
- HW3 (due Nov 3 Fri 10am)
- HW4 (due Dec 4 Mon 10am)
- submit homework on Gradescope
(entry code PWXV3D)
- presentation, worth 15%
- project (reading some papers and writing a report, or implementing and experimentally
evaluating data structures from papers, or doing original theoretical research), worth 40%
(Homeworks, presentations, and projects may be done in groups of at most 3.)
See guidelines for presentation and project.
Tentative Outline
- Basics: RMQ, balanced search trees, heaps with decrease-key, union-find, ...
- Integers: hashing, vEB trees, fusion trees, ...
- Geometry: orthogonal range search, point location, approximate nearest neighbors/LSH, ...
- Graphs: dynamic connectivity, dynamic reachability, approximate distance oracles, ...
- Strings: suffix trees/arrays, ...
- Other models: succinct data structures, external memory, streaming/sketching, ...
Lectures
(There is no textbook; I'll provide links to relevant references below, and also scribbles from class.)
- Aug 23: First example: Range min queries (RMQ).
- Aug 25: RMQ cont'd (and LCA).
[Ref: Bender and Farach-Colton (2005),
or Fischer and Heun (2006)]
- Aug 30: Dynamic predecessor search and balanced search trees (2-3 trees etc.).
[Ref: wikipedia on 2-3 trees,
red-black trees,
AVL trees,
treaps,
splay trees,
etc.;
Andersson's AA trees (1993)]
- Sep 1: More balanced search trees ((a,b) trees, weight-balanced trees) and amortized analysis.
[Ref: Rolf Fagerberg's notes on
amortized analysis of (a,b) trees; Michiel Smid's notes on
amortized analysis of weight-balanced trees]
- Sep 6: Priority queues and heaps.
- Sep 8: Quake heaps (alternative to Fibonacci heaps).
[Ref: my paper (2013)]
- Sep 13: Union-find.
- Sep 15: Union-find cont'd (with inverse Ackermann).
[Ref: Ch2 of Tarjan's book,
Jeff's notes,
papers by la Poutre (1990) and Seidel and Sharir (2005)]
- Sep 20: Hashing.
- Sep 22: Hashing cont'd (universal, perfect, Fredman-Komlos-Szemeredi).
[Ref: Sec 8.4-8.5 of Motwani and Raghavan's book, or
Jeff's notes]
- Sep 27: More hashing (cuckoo). Integer predecessor search...
[Ref: Pagh and Rodler's paper (2004) (or the
"undergrad version")]
- Sep 29: Van Emde Boas trees.
[Ref: wikipedia]
- Oct 4: Fusion trees.
[Ref: Fredman and Willard's paper (1990) (or
see Lemma 3.3 in Grossi et al.'s paper for the simpler
solution with nonstandard word operations)]
- Oct 6: Orthogonal range searching (k-d trees, range trees).
[Ref: Ch 5 of de Berg et al.'s book]
- Oct 11: More on range trees and improvements.
[Ref: Sec 3.1 of Chazelle'88;
and for those curious, my paper with Larsen and Patrascu'10]
- Oct 13: Point location (slab method, segment tree).
[Ref: Sec 10.3 of de Berg et al.'s book,
Ch 2 of Preparata and Shamos's book]
- Oct 18: Point location (fractional cascading, persistent search tree).
[Ref: Chazelle and Guibas's two
papers; Sarnak and Tarjan's paper (1986)]
- Oct 20: Point location (Kirkpatrick's hierarchy). Approximate nearest neighbor search...
[Ref: Kirkpatrick's paper (1981) or
Preparata and Shamos's book or
Iacono's video]
- Oct 25: Approximate nearest neighbor search (grids, dimension reduction, LSH...)
[Ref: Indyk and Motwani (1998), Sec 2.1 of
Matousek's notes on dimension reduction]
- Oct 27: LSH (cont'd).
[Ref: Indyk and Motwani (1998), Sec 2.9 of
Matousek's notes]
- Nov 1: Dynamic graph connectivity.
[Ref: Holm, de Lichtenberg, and Thorup (2001)]
- Nov 3: Dynamic subgraph connectivity.
[Ref: C., Patrascu, and Roditty (2008)]
- Nov 8: Conditional lower bounds. Approximate distance oracles...
[Ref: Abboud and Vassilevska (2014);
Thorup and Zwick (2001)]
- Nov 10: Approximate distance oracles (Thorup-Zwick). Strings...
- Nov 15: Suffix trees and suffix arrays.
[Ref: Karkkainen and Sanders (2003);
and for applications with range searching, see also Lewenstein (2013)]
- Nov 17: External-memory and cache-oblivious data structures.
[Ref: Arge (2003) on buffer trees, Vitter's survey on external memory and Demaine's survey
on cache-oblivious algorithms]
- Nov 29, Dec 1, Dec 6: student presentations
Other Resources