Algorithms for Massive Data (COMS E6998-9)

Class Description

Modern Data presents both a big promise but also a big challenge --- how are we to extract that promise? The classic algorithms for processing data are often insufficient to deal with the datasets of modern sizes. For example, a quadratic-time algorithm means that 10x increase in data size requires a 100x increase in resources!

This class will focus on algorithmic techniques to tackle massive datasets efficiently. We will see computational models for how to think of massive data processing, and explore core algorithmic ideas, such as how to summarize complex data via sampling and small sketches techniques.


Mathematical maturity is a must: the class is based on theoretical ideas and is proof-heavy. You are expected to be able to read and write formal mathematical proofs. Some familiarity with algorithms and randomness will be assumed as well. COMS 4231 (Analysis of Algorithms) or equivalent is useful, but not required if you have solid math background.

Undergradute students and students from other departments are welcome.

(Tentative) List of Topics

You can also check out a previous version of the class from Fall'15 (although some topics may change).