This workshop aims to cover recent developments in the emerging area of “learning-based”
algorithms (a.k.a. data-driven algorithms, algorithms with predictions, learning-augmented algorithms).
These methods incorporate machine learning “oracles” to adapt their behavior to the properties
of the input distribution and consequently improve performance measures such as runtime, space usage, or solution quality.
The field has blossomed with applications to classical streaming algorithms, online scheduling, clustering, and many
others. These methods aim to guarantee improved performance when the predictions are good, while maintaining
nearly the same worst-case guarantees as their prediction-free counterparts when they are not.
The workshop will cover recent advances of this topic in different domains including learning theory, online algorithms,
streaming algorithms and data structures.
Organizers
Edith Cohen (Google Mountain View and Tel Aviv University)
Ravi Kumar (Google Mountain View)
Michael Mitzenmacher (Harvard University)
Tim Roughgarden (Columbia University)
Schedule
Friday, June 26th (times in EST)
01:00 -- 01:10
Sergei Vassilvitskii
Opening Remarks
01:10 -- 01:50
Tim Roughgarden
Data-Driven Algorithm Design
The best algorithm for a computational problem generally depends on the “relevant
inputs”, a concept that depends on the application domain and often defies formal
articulation. While there is a large literature on empirical approaches to selecting the
best algorithm for a given application domain, there has been surprisingly little theoretical
analysis of the problem.
We adapt concepts from statistical and online learning theory to reason about
application-specific algorithm selection. Our models are straightforward to understand,
but also expressive enough to capture several existing approaches in the theoretical computer
science and AI communities, ranging from self-improving algorithms to empirical performance
models. We present one framework that models algorithm selection as a statistical learning
problem, and our work here shows that dimension notions from statistical learning theory,
historically used to measure the complexity of classes of binary- and real-valued functions,
are relevant in a much broader algorithmic context. We also study the online version of the
algorithm selection problem, and give possibility and impossibility results for the existence
of no-regret learning algorithms.
Joint work with Rishi Gupta.
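To make the statistical-learning framing concrete, here is a minimal Python sketch (the function names and the toy workload are hypothetical, not from the paper) of algorithm selection via empirical risk minimization: draw sample inputs from the unknown application distribution, measure each candidate algorithm's cost on the samples, and return the empirically best candidate.

import bisect
import random
import time

def erm_select(candidates, samples, cost):
    # Empirical risk minimization: return the candidate whose average
    # cost over the sampled inputs is smallest.
    return min(candidates,
               key=lambda alg: sum(cost(alg, x) for x in samples) / len(samples))

# Toy candidates: two strategies for membership queries on a list.
def linear_scan(xs, q):
    return q in xs

def sort_then_probe(xs, q):
    s = sorted(xs)  # sorting cost is paid on every query, purely for illustration
    i = bisect.bisect_left(s, q)
    return i < len(s) and s[i] == q

def runtime_cost(alg, inp):
    xs, q = inp
    start = time.perf_counter()
    alg(xs, q)
    return time.perf_counter() - start

samples = [([random.randrange(1000) for _ in range(500)], random.randrange(1000))
           for _ in range(100)]
best = erm_select([linear_scan, sort_then_probe], samples, runtime_cost)
print("empirically best:", best.__name__)

The learning-theoretic question the talk addresses is how many such samples suffice, which hinges on dimension notions for the real-valued class of cost functions induced by the candidate algorithms.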
01:50 -- 02:30
Edith Cohen
Sketching Functions of Frequencies: Beyond the Worst Case
Recently there has been increased interest in using machine learning techniques to improve
classical algorithms. We study when it is possible to construct compact, composable sketches
for weighted sampling and statistics estimation according to functions of data frequencies.
Such structures are now central components of large-scale data analytics and machine learning
pipelines. However, many common functions, such as thresholds and p-th frequency moments with
p>2, are known to require polynomial size sketches in the worst case. We explore performance
beyond the worst case under two different types of assumptions. The first is having access to
noisy advice on item frequencies. This continues the line of work of Hsu et al. (ICLR 2019),
who assume predictions are provided by a machine learning model. The second is providing
guaranteed performance on a restricted class of input frequency distributions that are better
aligned with what is observed in practice. This extends the work on heavy hitters under
Zipfian distributions in a seminal paper of Charikar et al. (ICALP 2002). Surprisingly, we show
analytically and empirically that “in practice” small polylogarithmic-size
sketches provide accurate estimates for these “hard” functions.
Joint work with Ofir Geri (Stanford) and Rasmus Pagh
(IT University of Copenhagen, BARC, and Google Research).
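As a concrete illustration of the first type of assumption, here is a simplified Python sketch in the spirit of the Hsu et al. line of work (this is not the construction from the talk, and all parameter choices are arbitrary): a possibly-noisy oracle flags items predicted to be heavy, which receive exact counters, while the remaining items share a small Count-Min sketch.

class LearnedCountMin:
    # Count-Min sketch with a heavy-hitter oracle: items the (possibly
    # erroneous) oracle flags as heavy are counted exactly; all other
    # items share a small array of hashed counters.
    def __init__(self, width, depth, is_heavy):
        self.is_heavy = is_heavy  # oracle: item -> bool (the "noisy advice")
        self.exact = {}           # exact counts for predicted-heavy items
        self.width, self.depth = width, depth
        self.tables = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        for row in range(self.depth):
            yield row, hash((row, item)) % self.width

    def update(self, item, delta=1):
        if self.is_heavy(item):
            self.exact[item] = self.exact.get(item, 0) + delta
        else:
            for row, col in self._cells(item):
                self.tables[row][col] += delta

    def estimate(self, item):
        if item in self.exact:
            return self.exact[item]
        return min(self.tables[row][col] for row, col in self._cells(item))

Frequency estimates of this form can then be plugged into functions of frequencies such as thresholds or moments; the talk's sketches and their composability guarantees are more refined than this toy.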
02:30 -- 02:40
Coffee Break
02:40 -- 03:20
Ravi Kumar
Learning-Augmented Online Learning
We consider the problem of online linear optimization with hints and show near-optimal regret
bounds in terms of the “goodness” of the hints.
Joint work with Aditya Bhaskara, Ashok Cutkosky, and Manish Purohit.
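As a toy illustration of the setting (not the algorithm from the paper; the hint model and its quality parameter alpha below are made up), the following Python snippet plays the point in the unit ball suggested by each hint and measures regret against the best fixed point in hindsight:

import math
import random

def unit(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

T, d, alpha = 1000, 5, 0.5   # rounds, dimension, hint quality (hypothetical)
alg_cost, cost_sum = 0.0, [0.0] * d
for t in range(T):
    c = unit([random.gauss(0, 1) for _ in range(d)])  # cost vector, revealed after playing
    h = unit([alpha * ci + (1 - alpha) * random.gauss(0, 1) for ci in c])  # noisy hint
    x = [-hi for hi in h]          # "follow the hint" within the unit ball
    alg_cost += dot(c, x)          # incur linear cost <c_t, x_t>
    cost_sum = [a + b for a, b in zip(cost_sum, c)]
best_fixed = -math.sqrt(sum(x * x for x in cost_sum))  # best fixed unit vector in hindsight
print("regret of follow-the-hint:", alg_cost - best_fixed)

Blindly following the hint is not robust when hints can be bad; the talk's algorithms trade off exploiting good hints against retaining worst-case regret guarantees.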
03:20 -- 04:00
Michael Mitzenmacher
Works in Progress: Queues with Predictions and the Partitioned Learned Bloom Filter
We review some past work on Algorithms with Predictions and provide some updates on “work in progress”.
We first review recent results on queues that use predictions of job sizes, and then discuss in-progress work on 1) queues
with predictions in the power-of-two-choices setting, and 2) queues with very small prediction-based advice.
We also review the learned Bloom filter and explain how to make better use of the learned model by maintaining multiple
Bloom filters for different score ranges; we show that this partitioned learned Bloom filter can yield significantly better
results.
The talk includes joint work with Matteo Dell'Amico (predictions with multiple queues), and with Kapil Vaidya, Tim Kraska, and Eric Knorr (partitioned learned Bloom filters).
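To make the partitioning idea concrete, here is a simplified Python sketch (the thresholds, false-positive rates, and score function below are arbitrary illustrative choices, not the optimized ones from the work in progress): keys whose learned score clears the top threshold are accepted outright, and each lower score range gets its own backup Bloom filter.

import hashlib
import math

class BloomFilter:
    def __init__(self, n_items, fpr):
        n_items = max(n_items, 1)
        # Standard sizing: m = -n ln(fpr) / (ln 2)^2 bits, k = (m/n) ln 2 hashes.
        self.m = max(1, math.ceil(-n_items * math.log(fpr) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray(self.m)  # one byte per bit, for clarity

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.blake2b(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

class PartitionedLearnedBloomFilter:
    # Partition keys by learned score: scores above the top threshold need
    # no backup filter, and each lower range gets a filter with its own
    # false-positive rate (chosen arbitrarily here).
    def __init__(self, score, keys, thresholds=(0.3, 0.9), fprs=(0.001, 0.05)):
        self.score, self.thresholds = score, thresholds
        buckets = [[] for _ in thresholds]
        for key in keys:
            region = self._region(score(key))
            if region < len(buckets):
                buckets[region].append(key)
        self.filters = [BloomFilter(len(b), f) for b, f in zip(buckets, fprs)]
        for bf, bucket in zip(self.filters, buckets):
            for key in bucket:
                bf.add(key)

    def _region(self, s):
        for i, t in enumerate(self.thresholds):
            if s < t:
                return i
        return len(self.thresholds)  # top region: accept without a filter

    def __contains__(self, key):
        region = self._region(self.score(key))
        return True if region == len(self.filters) else key in self.filters[region]

# Toy usage with a stand-in "learned model" score:
keys = [f"key{i}" for i in range(1000)]
score = lambda k: (hash(k) % 100) / 100.0
plbf = PartitionedLearnedBloomFilter(score, keys)
print("key5" in plbf, "surely-absent" in plbf)

There are no false negatives by construction; the potential gain over a single backup filter comes from tuning the false-positive rate separately per score range.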
Resources:
Machine Learning for Algorithm Design
by Eric Balkanski, Fall 2020 (Columbia University).
[course webpage]
Learning-Augmented Algorithms (6.890)
by Piotr Indyk and Costis Daskalakis, Spring 2019 (MIT).
[course webpage]