## Algorithms for Massive Data (COMS E6998-9)

• Instructor: Alexandr Andoni
• Time: Tue, Thu 2:40-3:55pm
• Room: TBD
• Office hours: TBD.

#### Class Description

Modern Data presents both a big promise but also a big challenge --- how are we to extract that promise? The classic algorithms for processing data are often insufficient to deal with the datasets of modern sizes. For example, a quadratic-time algorithm means that 10x increase in data size requires a 100x increase in resources!

This class will focus on algorithmic techniques to tackle massive datasets efficiently. We will see computational models for how to think of massive data processing, and explore core algorithmic ideas, such as how to summarize complex data via sampling and small sketches techniques.

#### Prerequisites

Mathematical maturity is a must: the class is based on theoretical ideas and is proof-heavy. You are expected to be able to read and write formal mathematical proofs. Some familiarity with algorithms and randomness will be assumed as well. COMS 4231 (Analysis of Algorithms) or equivalent is useful, but not required if you have solid math background.

Undergradute students and students from other departments are welcome.

#### (Tentative) List of Topics

• sketching: algorithms for representing objects approximately with a small "sketch", and how to use such sketches to speed-up algorithms such as linear regression or approximate SVD (numerical linear algebra). We will see the concept of dimension reduction, among others.
• streaming: algorithms that see data as a single stream and use space smaller than the data itself to compute its properties. Some particular problems include distinct elements, approximate counting, heavy hitters, frequency moments.
• sampling (property testing): how to learn something about an object (e.g., distribution) just by sampling it in a few locations.
• parallel algorithms: algorithms in modern parallel models (such as MapReduce) where the computation is accomplished by many computational units that each have limited space.
You can also check out a previous version of the class from Fall'15 (although some topics may change).