CS 6104/5914: Algorithms for Big Data (Fall 2025)

Lectures: Tue/Thu 12:30pm–1:45pm in MCB 240
Instructor: Ali Vakilian (Office Hours: Wed 11am–12pm in TORG 3120)

Description: Modern research in machine learning, operations research and large-scale scientific computing often deals with data too large to store or process directly. This course introduces rigorous, space-efficient algorithms for massive data sets, including streaming and sketching, dimensionality reduction, scalable numerical linear algebra and graph algorithms. Students will explore recent advances and develop the background needed for research in big-data algorithms. Check the syllabus for more details.

Prerequisites: CS 4100 or equivalent, or consent of the instructor. Familiarity with basic probability and linear algebra is expected.
Credits: 3

Schedule

Lecture 1 (8/26) slides lecture notes
  • Class logistics
  • Intro to the course
  • Background topics: probability and linear algebra
  • Notes on basic probability by Chandra Chekuri
  • Notes on basic probability by Krzysztof Onak
Lecture 2 (8/28) slides lecture notes
  • Intro to streaming model and sampling
  • Lecture notes by Chandra Chekuri (Alg. for Big Data, Fall’22, UIUC)
Lecture 3 (9/2) slides lecture notes
  • Median estimation
  • Probabilistic counting
  • Median trick
  • Lecture notes by Chandra Chekuri (Alg. for Big Data, Fall’22, UIUC)
Lecture 4 (9/4) slides lecture notes
  • Intro to streaming via frequency estimation
  • Distinct element problem
  • Flajolet-Martin algorithm
  • Lecture notes by Chandra Chekuri (Alg. for Big Data, Fall’22, UIUC)
  • Lecture notes by Deeparnab Chakrabarty (Randomized Alg., Spring’24, Dartmouth College)
  • Lecture notes by Cameron Musco (Alg. for Data Science, Spring’20, UMass Amherst)
Lecture 5 (9/9) slides lecture notes
  • Frequency Moments
  • AMS sampler for frequency moment estimation
Lecture 6 (9/11) slides lecture notes
  • AMS sketch for F₂ estimation
  • Linear sketches
  • Intro to heavy hitters (Boyer-Moore majority vote algorithm)
Lecture 7 (9/16) slides lecture notes
  • Heavy hitters (Misra-Gries algorithm)
  • CountMin
Lecture 8 (9/18) slides lecture notes
  • CountSketch
  • Sketching Applications
Lecture 9 (9/23) slides lecture notes
  • Sparse Recovery
  • Johnson-Lindenstrauss Lemma
Lecture 10 (9/25) slides lecture notes
  • Subspace Embedding
Lecture 11 (9/30) slides lecture notes
  • Final notes on JL and Subspace Embedding
Lecture 12 (10/2) slides lecture notes
  • Applications of JL and Subspace Embedding
  • Regression
  • Approximate Matrix Multiplication
Lecture 13 (10/7) slides
  • Nearest Neighbor Search
  • Locality-Sensitive Hashing
Lecture 14 (10/9) slides
  • Nearest Neighbor Search (contd.)
  • Graph-Based NNS

Assignments

Suggested Final Project Topics

Reading Materials