## Algorithms for Massive Data (COMS E6998-9)

- Instructor: Alexandr Andoni
- Time: Tue, Thu 2:40-3:55pm
- Room: TBD
- Office hours: TBD.

#### Class Description

Modern Data presents both a big promise but also a big challenge ---
how are we to extract that promise? The classic algorithms for
processing data are often insufficient to deal with the datasets of
modern sizes. For example, a quadratic-time algorithm means that 10x
increase in data size requires a 100x increase in resources!

This class will focus on algorithmic techniques to
tackle massive datasets efficiently. We will see computational models
for how to think of massive data processing, and explore core
algorithmic ideas, such as how to
summarize complex data via sampling and small sketches techniques.

#### Prerequisites

Mathematical maturity is a must: the class is based on
theoretical ideas and is proof-heavy. You are expected to be
able to read and write formal mathematical proofs. Some familiarity with algorithms and randomness will be assumed as
well. COMS 4231 (Analysis of Algorithms) or equivalent is
useful, but not required if you have solid math
background.

Undergradute students and students from other departments are welcome.

#### (Tentative) List of Topics

- sketching: algorithms for representing objects approximately with
a small "sketch", and how to use such sketches to speed-up algorithms
such as linear regression or approximate SVD (numerical linear
algebra). We will see the concept of dimension reduction, among
others.
- streaming: algorithms that see data as a single stream and use space
smaller than the data itself to compute its properties. Some
particular problems include distinct elements, approximate counting,
heavy hitters, frequency moments.
- sampling (property testing): how to learn something about an object
(e.g., distribution) just by sampling it in a few locations.
- parallel algorithms: algorithms in modern parallel models (such as
MapReduce) where the computation is accomplished by many computational
units that each have limited space.

You can also check out a

previous version of the class from Fall'15
(although some topics may change).