Algorithms for Massive Data (TTIC 41000 - Special Topics)

The recent availability of large datasets has had a significant impact on the design of algorithms. When working with big data, classical algorithms are often too inefficient: they are too slow or require too much space. This course focuses on algorithms that are specifically designed for large datasets and covers the following topics.

  • Some of the new computational models that capture various aspects of massive data computation, such as streaming algorithms and sub-linear time algorithms.
  • Some of the algorithmic techniques and tools for solving problems over massive data, such as sampling, sketching, dimensionality reduction, and computing efficient summaries of the data (e.g., core-sets).
This is a theoretical course and targets both graduate students and advanced undergraduate students with a strong background in algorithms and discrete mathematics.
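
As a rough illustration of the sketching idea mentioned above, the following is a minimal Python sketch of a Count-Min style frequency summary for a stream. It is not taken from the course materials. The class name `CountMinSketch`, its parameters `width`, `depth`, and `seed`, and the use of Python's built-in `hash` with random salts are all assumptions made for this example; the standard analysis assumes pairwise-independent hash functions, which the salted built-in hash only stands in for.

```python
import random


class CountMinSketch:
    """Approximate item frequencies in a stream using width * depth counters.

    Hypothetical illustration: salted built-in hashing replaces the
    pairwise-independent hash family assumed by the usual analysis.
    """

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        rng = random.Random(seed)
        # One random salt per row plays the role of a separate hash function.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        # Map the item to one counter in the given row.
        return hash((self.salts[row], item)) % self.width

    def add(self, item, count=1):
        # Increment one counter per row for this item.
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += count

    def estimate(self, item):
        # With non-negative updates, collisions can only inflate a counter,
        # so the minimum over rows is the least-inflated estimate.
        return min(
            self.table[row][self._bucket(row, item)]
            for row in range(self.depth)
        )


if __name__ == "__main__":
    sketch = CountMinSketch(width=64, depth=4)
    for x in ["a"] * 50 + ["b"] * 20 + ["c"] * 5:
        sketch.add(x)
    print(sketch.estimate("a"), sketch.estimate("b"), sketch.estimate("c"))
```

With non-negative updates, the estimate never falls below the true count, and the memory used depends only on `width` and `depth`, not on the number of distinct items in the stream.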

Lecture Time: Monday and Wednesday - 4:10-5:30pm
Office Hours: Monday 6-7pm
Location: Remote Course

Instructor: Sepideh Mahabadi (TTIC)
Instructor's email: mahabadi@ttic.edu
Course Plan
Monday, March 29th Course Logistics, Introduction to the course, Distinct Elements, Morris Counter
(Slides, Notes on Morris Counter from Jelani's Lecture)
Wednesday, March 31st Norm and Frequency Estimation Streaming Algorithms (AMS, CountMin, CountSketch)
(Slides, See also Piotr's slides: Lec3, Lec5, Lec6)
Monday, April 5th Streaming Graph Algorithms (Connectivity using L_0 samplers)
(Slides, See Andrew's course for more streaming graph algorithms.)
Wednesday, April 7th Streaming Algorithms for Coverage Problems
(Slides)
Monday, April 12th Streaming Geometric Algorithms
(Slides, See Piotr's lecture)
Wednesday, April 14th Streaming Lower Bounds
(Slides)
Monday, April 19th Core-sets (definition, and core-set for diversity maximization)
(Slides)
Wednesday, April 21st Core-sets (for k-median)
(Slides, See Dan Feldman's Videos on Core-sets)
Monday, April 26th Dimension Reduction
(Slides, Lecture Notes from Piotr and Jelani's course: Lec3, Lec5, Lec9)
Wednesday, April 28th Nearest Neighbor Search
(Slides, See the ANN paper by Har-Peled, Indyk, and Motwani)
Monday, May 3rd Sub-linear Time Algorithms
(Slides, See also Ronitt's slides: Lec1, Lec12, Lec13)
Wednesday, May 5th Sub-linear Time Algorithms
(Slides, See Ronitt's slides: Lec13)
Monday, May 10th Property Testing (Testing on Distributions)
(Slides, See Ronitt's slides: Lec2, Lec2 Notes, Lec4)
Wednesday, May 12th Randomized Linear Algebra (Matrix Product Approximation)
(Slides, See Jelani's Lecture)
Monday, May 17th Randomized Linear Algebra
Wednesday, May 19th Randomized Linear Algebra (Applications)
Monday, May 24th Project Presentation
Wednesday, May 26th Project Presentation
Assignments
Resources

The class uses materials from the following courses
Other resources
Logistics

Grading:
  • Two PSets (Each 25%)
  • Final Project (40%)
  • Class Participation (10%)
Homeworks:
  • Students are free to discuss the problems among themselves, but each person should completely understand and write up their own solutions, and should list the names of their collaborators.
Project:
  • Can be done individually, or in groups of size two.
  • Each project includes a 10-20 minute presentation in class and a 5-10 page report.
  • One can choose between two types of projects:
    • Summary Project: Reading one or several papers on a related topic and writing a summary report on them. There will be a list of suggested papers, but you are also encouraged to look for other papers.
    • Research Project: This consists of working on a research project related to the class material.
  • Important Dates
    • Proposal Deadline: April 30th
    • First Draft Deadline: May 24th
    • Final Draft Deadline: June 4th