STOC'20 Workshop on Algorithms with Predictions

This workshop aims to cover recent developments in the emerging area of “learning-based” algorithms (also known as data-driven algorithms, algorithms with predictions, or learning-augmented algorithms). These methods incorporate machine learning “oracles” to adapt their behavior to the properties of the input distribution and thereby improve performance measures such as running time, space usage, or solution quality. The field has blossomed with applications to classical streaming algorithms, online scheduling, clustering, and many other problems. All of these methods guarantee improved performance when the predictions are good, and maintain nearly identical worst-case guarantees when they are not.
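
As a concrete illustration of this consistency/robustness tradeoff (a minimal sketch for this page, not taken from any of the talks), the snippet below implements ski rental with a predicted number of ski days, following the strategy of Purohit, Svitkina, and Kumar (NeurIPS 2018); the parameter lam controls how much the algorithm trusts the prediction.

```python
import math

# Minimal sketch: ski rental with a prediction. Buying skis costs b; renting
# costs 1 per day. The prediction p estimates the number of ski days, and
# lam in (0, 1] sets how much we trust it: (1 + lam)-competitive when the
# prediction is accurate, and never worse than (1 + 1/lam)-competitive.

def ski_rental_with_prediction(b, p, lam, actual_days):
    """Cost paid by the lam-robust strategy: buy early if the prediction
    says the season is long, hedge by buying late if it says short."""
    if p >= b:
        buy_day = math.ceil(lam * b)   # trust the prediction: buy early
    else:
        buy_day = math.ceil(b / lam)   # distrust a "rent" prediction: buy late
    if actual_days < buy_day:
        return actual_days             # rented every day, never bought
    return (buy_day - 1) + b           # rented until buy_day, then bought

# Good prediction: near-optimal. Bad prediction: the loss is still bounded.
print(ski_rental_with_prediction(b=10, p=100, lam=0.5, actual_days=100))  # 14, OPT = 10
print(ski_rental_with_prediction(b=10, p=1,   lam=0.5, actual_days=100))  # 29, OPT = 10
```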

The workshop will cover recent advances on this topic across several domains, including learning theory, online algorithms, streaming algorithms, and data structures.

When: June 26, 2020, 1pm -- 4pm EST
Where (to watch): The recording is available here
Organizers: Piotr Indyk (MIT), Yaron Singer (Harvard), Ali Vakilian (UW-Madison) and Sergei Vassilvitskii (Google NYC)
Invited Speakers:
Edith Cohen (Google Mountain View and Tel Aviv University)
Ravi Kumar (Google Mountain View)
Michael Mitzenmacher (Harvard University)
Tim Roughgarden (Columbia University)
Schedule
Friday, June 26th (all times EST)
01:00 -- 01:10 Sergei Vassilvitskii Opening Remarks
01:10 -- 01:50 Tim Roughgarden Data-Driven Algorithm Design

The best algorithm for a computational problem generally depends on the “relevant inputs”, a notion that varies by application domain and often defies formal articulation. While there is a large literature on empirical approaches to selecting the best algorithm for a given application domain, there has been surprisingly little theoretical analysis of the problem.

We adapt concepts from statistical and online learning theory to reason about application-specific algorithm selection. Our models are straightforward to understand, yet expressive enough to capture several existing approaches in the theoretical computer science and AI communities, ranging from self-improving algorithms to empirical performance models. We present one framework that models algorithm selection as a statistical learning problem, and show that dimension notions from statistical learning theory, historically used to measure the complexity of classes of binary- and real-valued functions, are relevant in a much broader algorithmic context. We also study the online version of the algorithm selection problem, and give possibility and impossibility results for the existence of no-regret learning algorithms.

Joint work with Rishi Gupta.
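
As a toy illustration of the statistical-learning view, the sketch below performs empirical risk minimization over a parameterized family of greedy knapsack heuristics (a running example in this line of work); the parameter grid, instance sizes, and input distribution are hypothetical.

```python
import random

# Toy sketch of algorithm selection as empirical risk minimization: a family
# of greedy knapsack heuristics ranks items by value / size**rho, and rho is
# learned from sample instances drawn from the application's distribution.

def greedy_knapsack(values, sizes, capacity, rho):
    """Pack items greedily by the score value / size**rho; return total value."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / sizes[i] ** rho, reverse=True)
    total_value, used = 0.0, 0.0
    for i in order:
        if used + sizes[i] <= capacity:
            used += sizes[i]
            total_value += values[i]
    return total_value

def select_rho(sample_instances, grid):
    """Pick the rho with the best average value on the sampled instances."""
    def avg_value(rho):
        return sum(greedy_knapsack(v, s, c, rho)
                   for v, s, c in sample_instances) / len(sample_instances)
    return max(grid, key=avg_value)

random.seed(0)
samples = [([random.random() for _ in range(50)],
            [random.uniform(0.1, 1.0) for _ in range(50)],
            5.0)
           for _ in range(100)]
print(select_rho(samples, grid=[0.0, 0.5, 1.0, 1.5, 2.0]))
```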

01:50 -- 02:30 Edith Cohen Sketching Functions of Frequencies: Beyond the Worst Case

Recently there has been increased interest in using machine learning techniques to improve classical algorithms. We study when it is possible to construct compact, composable sketches for weighted sampling and statistics estimation according to functions of data frequencies. Such structures are now central components of large-scale data analytics and machine learning pipelines. However, many common functions, such as thresholds and p-th frequency moments with p > 2, are known to require polynomial-size sketches in the worst case. We explore performance beyond the worst case under two different types of assumptions. The first is access to noisy advice on item frequencies, continuing the line of work of Hsu et al. (ICLR 2019), who assume predictions are provided by a machine learning model. The second is guaranteed performance on a restricted class of input frequency distributions that are better aligned with what is observed in practice, extending the work on heavy hitters under Zipfian distributions in the seminal paper of Charikar et al. (ICALP 2002). Surprisingly, we show analytically and empirically that, “in practice,” small polylogarithmic-size sketches provide accurate estimates even for these “hard” functions.

Joint work with Ofir Geri (Stanford) and Rasmus Pagh (IT University of Copenhagen, BARC, and Google Research).
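
The first type of assumption can be illustrated with a small sketch in the spirit of Hsu et al.: items that an advice oracle flags as likely heavy hitters are counted exactly, while all other items share a compact Count-Min sketch. The oracle `predict_heavy` below stands in for a trained model and is purely hypothetical.

```python
import random

# Frequency estimation with learned advice: predicted-heavy items get exact
# counters; the remaining items share a small Count-Min sketch.

class CountMin:
    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.salts = [rng.getrandbits(32) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        return [(row, hash((salt, item)) % self.width)
                for row, salt in enumerate(self.salts)]

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):   # overestimates only
        return min(self.table[row][col] for row, col in self._cells(item))

class LearnedSketch:
    def __init__(self, predict_heavy, width=256, depth=4):
        self.predict_heavy = predict_heavy  # advice oracle (assumed given)
        self.exact = {}                     # exact counts for predicted-heavy items
        self.cm = CountMin(width, depth)

    def add(self, item):
        if self.predict_heavy(item):
            self.exact[item] = self.exact.get(item, 0) + 1
        else:
            self.cm.add(item)

    def estimate(self, item):
        if item in self.exact:
            return self.exact[item]
        return self.cm.estimate(item)

sketch = LearnedSketch(predict_heavy=lambda w: len(w) <= 3)  # toy "model"
for w in ["the", "the", "algorithm", "the", "sketch"]:
    sketch.add(w)
print(sketch.estimate("the"))  # 3, counted exactly
```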

02:30 -- 02:40 Coffee Break
02:40 -- 03:20 Ravi Kumar Learning-Augmented Online Learning

We consider the problem of online linear optimization with hints and show near-optimal regret bounds in terms of the “goodness” of the hints.

Joint work with Aditya Bhaskara, Ashok Cutkosky, Manish Purohit.
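
The algorithm in the talk is more refined, but the basic mechanism by which a hint enters an online linear optimizer can be illustrated with a generic “optimistic” online gradient descent step over the unit ball, sketched below; the step size and data are hypothetical.

```python
import numpy as np

# Illustrative only: a hint (a guess at the next cost vector) shifts the
# iterate before the true cost arrives; good hints lower the total cost,
# and the standard gradient update keeps the learner sound regardless.

def project_to_unit_ball(x):
    norm = np.linalg.norm(x)
    return x if norm <= 1.0 else x / norm

def optimistic_ogd(costs, hints, eta=0.1):
    """Play x_t = Pi(y_t - eta * h_t), then update y with the true cost c_t.
    Returns the total linear cost sum_t <c_t, x_t> (lower is better)."""
    y = np.zeros(len(costs[0]))
    total = 0.0
    for c, h in zip(costs, hints):
        x = project_to_unit_ball(y - eta * h)  # act on the hint before c_t arrives
        total += float(np.dot(c, x))
        y = y - eta * c                        # standard gradient update
    return total

rng = np.random.default_rng(0)
costs = [rng.standard_normal(5) for _ in range(200)]
noisy_hints = [c + rng.standard_normal(5) for c in costs]
print(optimistic_ogd(costs, costs))        # perfect hints
print(optimistic_ogd(costs, noisy_hints))  # noisy hints: typically worse
```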

03:20 -- 04:00 Michael Mitzenmacher Works in Progress: Queues with Predictions and the Partitioned Learned Bloom Filter

We review some past work on Algorithms with Predictions and provide some updates on work in progress. We first review recent results on queues that use predictions of job sizes, and then discuss work in progress on (1) queues with predictions in the power-of-two-choices setting and (2) queues with very small prediction-based advice. We also review the learned Bloom filter, explain how to make better use of the learned model by maintaining separate backup Bloom filters for different ranges of the model's score, and show that this partitioned learned Bloom filter can yield significantly better results.

The talk includes joint work with Matteo Dell'Amico (predictions with multiple queues), and with Kapil Vaidya, Tim Kraska, and Eric Knorr (partitioned learned Bloom filters).
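
A simplified sketch of the partitioned idea appears below: a learned score in [0, 1] routes each key to a score region, each region keeps its own backup Bloom filter, and the highest-scoring region keeps none. The actual construction optimizes the thresholds and per-region false-positive rates; here they are fixed by hand, and `score` stands in for a learned model.

```python
import math
import random

class Bloom:
    """Textbook Bloom filter sized for n_items at a target false-positive rate."""
    def __init__(self, n_items, fpr, seed=0):
        n_items = max(1, n_items)
        self.m = max(1, int(-n_items * math.log(fpr) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = [False] * self.m
        self.salts = [random.Random(seed + i).getrandbits(32) for i in range(self.k)]

    def _idx(self, item):
        return [hash((salt, item)) % self.m for salt in self.salts]

    def add(self, item):
        for i in self._idx(item):
            self.bits[i] = True

    def __contains__(self, item):
        return all(self.bits[i] for i in self._idx(item))

class PartitionedLBF:
    def __init__(self, keys, score, thresholds, fprs):
        # thresholds (ascending) split [0, 1] into len(thresholds) + 1 regions;
        # region i < len(fprs) gets a backup filter with budget fprs[i], and the
        # top region (highest scores) answers "yes" with no filter at all.
        self.score, self.thresholds = score, thresholds
        buckets = [[] for _ in fprs]
        for key in keys:
            region = self._region(score(key))
            if region < len(buckets):
                buckets[region].append(key)
        self.filters = [Bloom(len(b), f) for b, f in zip(buckets, fprs)]
        for filt, bucket in zip(self.filters, buckets):
            for key in bucket:
                filt.add(key)

    def _region(self, s):
        return sum(s >= t for t in self.thresholds)

    def __contains__(self, item):
        region = self._region(self.score(item))
        if region == len(self.filters):   # top region: model is confident
            return True
        return item in self.filters[region]

score = lambda w: min(len(w) / 10.0, 1.0)  # stand-in for a learned model
keys = ["algorithm", "prediction", "oracle", "sketch"]
plbf = PartitionedLBF(keys, score, thresholds=[0.5, 0.8], fprs=[0.01, 0.05])
print("oracle" in plbf, "cat" in plbf)     # True, False (with high probability)
```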

Resources:
  • Machine Learning for Algorithm Design by Eric Balkanski, Fall 2020 (Columbia University). [course webpage]
  • Learning-Augmented Algorithms (6.890) by Piotr Indyk and Costis Daskalakis, Spring 2019 (MIT). [course webpage]
  • Algorithms with Predictions by Michael Mitzenmacher and Sergei Vassilvitskii, survey paper (a chapter in the book Beyond the Worst-Case Analysis of Algorithms, edited by Tim Roughgarden), 2020. [survey]
  • Technical perspective: Algorithm selection as a learning problem by Avrim Blum, research highlights (CACM), May 2020. [short paper]
  • Data-driven algorithm design by Tim Roughgarden and Rishi Gupta, research highlights (CACM), May 2020. [short paper]
  • Application-Specific Algorithm Selection by Tim Roughgarden, Simons Institute Open Lecture, 2016. [video] [slides]
  • Data Driven Algorithm Design by Nina Balcan, plenary talk at the 24th Annual LIDS Student Conference, MIT, 2019. [video] [slides]
  • Learning in Algorithms by Sergei Vassilvitskii, survey talk at the 4th Highlights of Algorithms conference (HALG 2019). [slides]
  • Learning-Augmented Algorithms: How ML Can Lead to Provably Better Algorithms by Michael Mitzenmacher, keynote talk at ALGO 2019. [slides]
  • Workshop on Data-driven Algorithmics by Andreas Krause, Pavlos Protopapas and Yaron Singer, Harvard, 2015. [workshop webpage]
  • Workshop on Data-driven Algorithmics by Andreas Krause and Yaron Singer, Bertinoro, 2017. [workshop webpage]
  • Workshop on Automated Algorithm Design by Nina Balcan, Bistra Dilkina, Carl Kingsford and Paul Medvedev, TTIC, 2019. [workshop webpage]
  • Workshop on Learning-Based Algorithms by Piotr Indyk, Yaron Singer, Ali Vakilian and Sergei Vassilvitskii, TTIC, 2019. [workshop webpage]