This workshop will cover recent developments in using machine learning to improve the performance
of “classical” algorithms, by adapting their behavior to the properties of the input distribution.
Such adaptation can reduce running time or space usage, or improve accuracy, while (often) retaining worst-case guarantees.

The workshop will cover general approaches to designing such algorithms, as well as specific case studies.
We plan to cover learning-augmented methods for designing data structures,
streaming and sketching algorithms, on-line algorithms, compressive sensing and recovery, error-correcting codes,
scheduling algorithms, and combinatorial optimization.
The attendees span a diverse set of areas, including theoretical computer science, machine learning,
algorithmic game theory, coding theory, databases and systems.

Towards Learned Algorithms, Data Structures, and Systems

All systems and applications are composed from basic data structures and algorithms,
such as index structures, priority queues, and sorting algorithms.
Most of these primitives have been around since the early beginnings of computer science (CS)
and form the basis of every CS intro lecture. Yet, we might soon face an inflection point:
recent results show that machine learning has the potential to alter the way those primitives
or systems at large are implemented in order to provide optimal performance for specific applications.

In this talk, I will highlight several results from my research group on how machine learning is
changing the way we build systems and outline different ways to build learned algorithms and data
structures to achieve “instance-optimality” with a particular focus on data management systems.

10:40 -- 11:20

Mohammad Alizadeh

Learning to Schedule

Efficiently scheduling jobs on distributed compute clusters requires complex algorithms.
Current systems use simple, generalized heuristics and ignore workload characteristics, since
developing and tuning a scheduling policy for each workload is infeasible.
In this talk, I will discuss Decima, a scheduling system for data processing clusters that
learns sophisticated workload-specific scheduling algorithms entirely through experience.
Decima uses reinforcement learning (RL) and neural networks to learn a workload-specific
scheduling algorithm without any human instruction beyond a high-level objective,
such as minimizing average job completion time. I will highlight key aspects of Decima’s
design, including novel RL training techniques for systems like Decima that must operate in
the presence of a stochastic input process.

The best algorithm for a computational problem generally depends on the “relevant inputs”, a concept that
depends on the application domain and often defies formal articulation. While there is a large literature on
empirical approaches to selecting the best algorithm for a given application domain, there has been surprisingly
little theoretical analysis of the problem.

We adapt concepts from statistical and online learning theory to reason about application-specific algorithm selection.
Our models are straightforward to understand, but also expressive enough to capture several existing approaches in the
theoretical computer science and AI communities, ranging from self-improving algorithms to empirical performance models.
We present one framework that models algorithm selection as a statistical learning problem, and our work here shows that
dimension notions from statistical learning theory, historically used to measure the complexity of classes of
binary- and real-valued functions, are relevant in a much broader algorithmic context.
We also study the online version of the algorithm selection problem, and give possibility and impossibility results for
the existence of no-regret learning algorithms.

Optimization problems on graphs occupy a central role in algorithmic design as well as
in machine learning (clustering is a popular example). The success of machine learning
methods in a variety of domains provides a new impetus to ask whether such algorithms can
be “learnt” directly.

A key challenge is that such learnable algorithms need to generalize not only to
(exponentially many) unseen instances but also to graphs of unseen (and much larger) sizes.
We propose a reinforcement-learning-based solution by representing graphs as randomized
nonlinear dynamical systems. We show that such representations are guaranteed to be
information-lossless. We also show how training on hard theoretical instances of small
graph sizes can lead to generalization to much larger graphs (100-1000x larger). We show this on
the optimization problems of minimum vertex cover, maximum cut, maximum independent set and
community detection which vary across the complexity theoretic hardness spectrum.

Inventing Communication Algorithms via Deep Learning

Coding theory is a central discipline underpinning wireline and wireless modems that are
the workhorses of the information age. Progress in coding theory is largely driven by
individual human ingenuity with sporadic breakthroughs over the past century. In this work
we study whether it is possible to automate the discovery of communication algorithms via
deep learning.

We present the first family of codes for channels with noisy feedback designed via learning.
While the study of this problem was initiated by Shannon, and feedback is known to improve performance,
good codes have remained elusive and linear codes are known to be highly sub-optimal. Our learned codes
outperform known codes by several orders of magnitude in reliability. For the channel
without feedback, we show that we can construct near-optimal codes directly via learning. These codes
are more robust and adaptive than the state-of-the-art codes.

2:50 -- 3:10

Break

3:10 -- 3:50

Ali Mousavi

From Supervised to Unsupervised Computational Sensing

Great progress has been made on sensing, perception, and signal processing over the last decades
through the design of algorithms matched to the underlying physics and statistics of the task at hand.
However, a host of difficult problems remain where the physics-based approach comes up short;
for example, unrealistic image models stunt the performance of MRI and other computational imaging systems.
Fortunately, the big data age has enabled the development of new kinds of machine learning algorithms
that augment our understanding of the physics with models learned from large amounts of training data.
In this talk, we will give an overview of integrated physics+data algorithms for solving the kinds of inverse
problems encountered in computational sensing. In particular, we show how data can be used to learn
a more realistic signal model that boosts the performance of an iterative recovery algorithm in a supervised way.
We then show how we can adopt statistical techniques to solve inverse problems in computational sensing
even without having ground truth data and in an unsupervised way.

The goal of compressed sensing is to make use of image structure to
estimate an image from a small number of linear measurements. The
structure is typically represented by sparsity in a well-chosen
basis. We show how to achieve guarantees similar to standard
compressed sensing but without employing sparsity at all -- instead,
we suppose that vectors lie near the range of a generative model G:
R^{k} -> R^{n}. Our main theorem here is that, if
G is L-Lipschitz, then
roughly O(k log L) random Gaussian measurements suffice; this is O(k d
log n) for typical d-layer neural networks.
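As an illustration of this recovery approach, here is a minimal numpy sketch. The "generator" below is a hypothetical random two-layer ReLU network standing in for a trained model, and recovery runs plain gradient descent on z |-> ||A G(z) - y||^2; the actual work uses trained generative models and more elaborate optimizers.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, m = 5, 100, 25              # latent dim, signal dim, number of measurements

# Stand-in for a pretrained generator G: R^k -> R^n
# (a random two-layer ReLU network; a real G would be trained on data)
W1 = rng.normal(size=(40, k)) / np.sqrt(k)
W2 = rng.normal(size=(n, 40)) / np.sqrt(40)

def G(z):
    return W2 @ np.maximum(W1 @ z, 0.0)

# Ground-truth signal in the range of G, seen through m << n Gaussian measurements
z_true = rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)
y = A @ G(z_true)

# Recover a latent code by gradient descent on z |-> ||A G(z) - y||^2,
# with a step size from a crude smoothness bound so the loss keeps decreasing
z = rng.normal(size=k)
lr = 0.9 / (np.linalg.norm(A @ W2, 2) * np.linalg.norm(W1, 2)) ** 2
loss_init = float(np.sum((A @ G(z) - y) ** 2))
for _ in range(2000):
    h = W1 @ z                                     # pre-activations
    r = A @ (W2 @ np.maximum(h, 0.0)) - y          # residual A G(z) - y
    z -= lr * 2.0 * W1.T @ ((W2.T @ (A.T @ r)) * (h > 0.0))
loss_final = float(np.sum((A @ G(z) - y) ** 2))
```

Note that the objective is non-convex in z, so gradient descent only finds a local minimum in general; the theory quoted above concerns the measurement count, not this particular optimizer.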

The above result describes how to use a model to recover a signal from
noisy data. But if the data is noisy, how can we learn the generative
model in the first place? The second part of my talk will describe how
to incorporate the measurement process in generative adversarial
network (GAN) training. Even if the noisy data does not uniquely
identify the non-noisy signal, the _distribution_ of noisy data may
still uniquely identify the distribution of non-noisy signals.

This talk is based on joint works with Ashish Bora, Ajil Jalal, and
Alex Dimakis.

Classical streaming algorithms typically do not leverage data properties or patterns
in their input. We propose to augment such algorithms with a learning model that enables
them to exploit data properties without being specific to a particular pattern or property.
We focus on the problem of estimating the frequency of elements in a data stream, a
fundamental problem with applications in network measurements, natural language processing,
and security. We propose a new framework for designing frequency estimation streaming
algorithms that automatically learn to leverage the properties of the input data. We present
a theoretical analysis of the proposed algorithms and prove that, under natural assumptions,
they have lower space complexity than prior algorithms. We also evaluate our algorithms on two
real-world datasets and demonstrate empirically their performance gains.
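The flavor of this framework can be conveyed with a minimal sketch: a Count-Min sketch paired with a learned oracle that predicts heavy hitters, where predicted-heavy items get exact dedicated counters and everything else shares the sketch. The oracle below is a stand-in callable rather than a trained model, and the parameters are illustrative.

```python
import numpy as np
from collections import Counter

class LearnedCountMin:
    """Count-Min sketch augmented with a (hypothetical) learned heavy-hitter
    oracle: items the oracle flags get exact counters; the rest share the
    sketch, so heavy items no longer inflate the shared buckets."""

    def __init__(self, width, depth, is_heavy, seed=0):
        rng = np.random.default_rng(seed)
        self.width, self.depth = width, depth
        self.table = np.zeros((depth, width), dtype=np.int64)
        self.salts = rng.integers(1, 2**31, size=depth)  # one hash per row
        self.is_heavy = is_heavy      # oracle: item -> bool (learned in practice)
        self.exact = Counter()        # exclusive buckets for predicted heavy items

    def _hash(self, item, row):
        return hash((int(self.salts[row]), item)) % self.width

    def add(self, item, count=1):
        if self.is_heavy(item):
            self.exact[item] += count
        else:
            for r in range(self.depth):
                self.table[r, self._hash(item, r)] += count

    def estimate(self, item):
        if self.is_heavy(item):
            return self.exact[item]
        # Count-Min estimate: minimum over rows, never an underestimate
        return min(self.table[r, self._hash(item, r)] for r in range(self.depth))
```

Predicted-heavy items are estimated exactly, while light items retain the usual Count-Min guarantee of never being underestimated.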

This talk is based on joint works with Anders Aamand, Chen-Yu Hsu, Piotr Indyk and
Dina Katabi.

An online learning approach to data-driven algorithm selection

Data-driven algorithm design for combinatorial problems is an important aspect of modern data science.
Rather than using off-the-shelf algorithms that only have worst-case performance guarantees, practitioners
typically optimize over large families of parametrized algorithms and tune the parameters of these algorithms
based on past problem instances in order to perform well on future instances from that domain.

In this talk, I will discuss general techniques for providing formal guarantees for data-driven algorithm
selection via online learning. We provide upper and lower bounds on regret for algorithm selection from infinite
parametrized families of algorithms in online settings, where problems arrive sequentially and we must choose
parameters online. The major technical challenge is that for many combinatorial problems
(including subset selection problems, partitioning problems, and clustering), small differences between two
algorithms can lead to a cascade of changes in their behavior and dramatically change their performance.
This leads to challenging questions whose resolution requires new techniques that, in turn,
push the boundaries of learning theory itself.

Online algorithms with ML advice: caching & beyond

Classic algorithm design focuses on optimizing worst-case complexity, resulting in algorithms
that perform well for any possible input. In many applications, however, the typical input is
not just far from worst case, but has some predictable characteristics. We first focus on the
caching problem and give results that formally incorporate machine learned predictions into
algorithm design. We then talk about the modeling choices and challenges faced when extending
the work to weighted caching.
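A simplified sketch of prediction-based eviction, not the full marker-based algorithm from this line of work: on a miss with a full cache, evict the page whose predicted next request is furthest in the future. With perfect predictions this coincides with Belady's optimal offline rule; `predict_next` stands in for a hypothetical learned model.

```python
def cache_with_predictions(requests, k, predict_next):
    """Run a size-k cache over a request sequence, evicting the page whose
    predicted next request time is furthest in the future.
    predict_next(page, t) is a (hypothetical) learned predictor.
    Returns the number of cache misses."""
    cache, misses = set(), 0
    for t, page in enumerate(requests):
        if page in cache:
            continue                  # hit: nothing to do
        misses += 1
        if len(cache) >= k:
            # Evict the page predicted to be needed latest
            cache.remove(max(cache, key=lambda p: predict_next(p, t)))
        cache.add(page)
    return misses
```

On a cyclic sequence such as 1, 2, 3, 1, 2, 3, ... with a cache of size 2, this rule avoids the thrashing that LRU exhibits, which is exactly the kind of gap predictions are meant to close.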

11:20 -- 11:40

Break

11:40 -- 12:20

Manish Purohit

Ski-Rental with Predictions: Three Models

I'll give an overview of recent work in augmenting traditional online algorithms with
additional information such as machine-learnt predictions. This talk demonstrates the
different design choices available to the algorithm designer with respect to the type and
structure of the predictions used. Naturally, the flavor of theoretical results obtained
depends greatly on the prediction models. Using the well-studied Ski-Rental problem as a
running example, I'll cover three distinct prediction models and design online algorithms
in these regimes with very different competitive ratios.
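One natural prediction model forecasts the number of ski days. A minimal sketch of a deterministic algorithm in this spirit, following the approach of Purohit et al. (NeurIPS 2018), is below; the trust parameter `lam` trades consistency (doing well when the prediction is accurate) against robustness (bounded loss when it is not), and the exact thresholds here are illustrative.

```python
import math

def ski_rental_with_prediction(b, predicted_days, lam):
    """Return the day on which to buy skis (renting at 1/day before that),
    given buy cost b, a predicted number of ski days, and lam in (0, 1].
    Smaller lam trusts the prediction more."""
    if predicted_days >= b:
        return math.ceil(lam * b)     # prediction says long season: buy early
    return math.ceil(b / lam)         # prediction says short season: delay buying

def cost(b, buy_day, actual_days):
    """Total cost incurred: rent until buy_day, then pay b if the season lasts."""
    if actual_days < buy_day:
        return actual_days            # season ended before we bought
    return (buy_day - 1) + b
```

For example, with b = 10 and lam = 0.5, an accurate long prediction yields cost 14 against an optimal 10, while an accurate short prediction is optimal; inaccurate predictions still cost at most a factor depending on lam.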

Learning Augmented Algorithms: The Case of Multiple Experts

Recent research has focused on incorporating the advice of ML models in improving the performance of online algorithms, without compromising
the intrinsic robustness offered by traditional competitive analysis. This talk will focus on the case of multiple experts in this setting. In particular, we show
how the advice provided by multiple experts can be incorporated into an online algorithm for the classical ski rental problem. The goal is to match the
performance of the best expert without knowing her identity, a task similar in flavor to the classical problem of learning with expert advice. In contrast to online learning, however,
this is not an iterative process, and the algorithm does not have the scope of learning the identity of the best expert over time. We do not make any assumptions
on the quality of the advice provided by the experts, and obtain tight performance bounds in this context.

This is based on joint work with Sreenivas Gollapudi that appeared at ICML 2019.

The use of machine learning and predictive models has produced a revolution in science
and engineering. Online optimization problems are a natural setting in which predictions
can be used to manage uncertainty and improve performance. This paper studies how predictive
techniques can be used to break through worst-case barriers in online scheduling.

The makespan minimization problem on unrelated machines and its special case, restricted
assignment, are classic problems in online scheduling theory. Worst-case analysis of these
problems yields Ω(log m) lower bounds on the competitive ratio in the online
setting. In this paper we construct non-trivial predictions for these problems and design
algorithms that utilize these predictions to compute solutions online. Our predictions are
compact in size, having dimension linear in the number of machines. The performance guarantees
of our algorithms depend on the accuracy of the predictions, and moderately accurate
predictions allow our techniques to beat the worst case lower bounds. More precisely, the
predictions can be used to construct O(log η)-competitive fractional assignments,
where η is the error of the predictions.
We then give an online algorithm that is O((log log m)^{3})-competitive to
round these fractional assignments, with a nearly matching Ω(log log m) lower bound
for online rounding algorithms.

We consider the semi-online model that generalizes the classical
online computational model. The semi-online model postulates that the
unknown future has a predictable part and an adversarial part; these
parts can be arbitrarily interleaved. An algorithm in this model
operates as in the standard online model, i.e., makes an irrevocable
decision at each step. We study bipartite matching in the semi-online
model. Our main contributions are competitive algorithms for this
problem and a near-matching hardness bound. The competitive ratio of
the algorithms nicely interpolates between the truly offline setting
(i.e., no adversarial part) and the truly online setting (i.e., no
predictable part).

Joint work with Ravi Kumar, Manish Purohit, Aaron Schild, Erik Vee.

In the secretary problem, a set of items are revealed to us in random order, and we want to maximize
the probability of picking the best among them. In the (stochastic) multi-armed bandit problem,
we perform pulls from a set of arms, each with a fixed but unknown payoff probability, and want to
minimize the regret. Both these problems are well-studied in the sequential decision-making
literature.
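For concreteness, the classical secretary algorithm referenced above is the 1/e rule: observe the first n/e candidates without committing, then accept the first candidate better than everything seen so far. A standard sketch:

```python
import math

def secretary(values):
    """Classical 1/e rule for the secretary problem: skip the first n/e
    candidates, then take the first one beating everything seen so far.
    Picks the best item with probability approaching 1/e for large n."""
    n = len(values)
    cutoff = int(n / math.e)
    best_seen = max(values[:cutoff], default=float("-inf"))
    for v in values[cutoff:]:
        if v > best_seen:
            return v
    return values[-1]                 # forced to take the last candidate
```

It is this kind of algorithm (along with standard bandit algorithms such as UCB) whose fragility under adversarial corruptions the talk examines.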

We consider the semi-adversarial setting where an adversary is allowed to introduce some
limited amount of corruptions. We show that classical algorithms fail badly in the presence of
corruptions, and then we give algorithms that are more robust to corruptions.

This is based on joint works with Domagoj Bradac, Sahil Singla, and Goran Zuzic, and with
Tomer Koren and Kunal Talwar.

10:40 -- 11:20

Yaron Singer

Algorithms in the Era of Machine Learning: An Inconvenient Truth

The traditional approach in algorithm design assumes that there is an underlying objective
that is known to the algorithm designer and the focus is on efficiently optimizing that
objective. In many cases however, the objectives we aim to optimize are not known but rather
learned from data. So what are the guarantees of the algorithms we develop and teach when
the input is learned from data? In this talk we will address this question and discuss
challenges at the intersection of machine learning and algorithms. We will present some
stark impossibility results and argue for new algorithmic paradigms.

11:20 -- 11:40

Break

11:40 -- 12:20

Le Song

Learning to Plan via Neural Exploration-Exploitation Trees

Sampling-based planning algorithms such as RRT and its variants are powerful tools for path planning problems in
high-dimensional continuous state and action spaces. While these algorithms perform systematic exploration of the
state space, they do not fully exploit past planning experiences from similar environments.
In this paper, we design a meta path planning algorithm, called Neural Exploration-Exploitation Trees (NEXT),
which can utilize prior experience to drastically reduce the sample requirement for solving new path planning problems.
More specifically, NEXT contains a novel neural architecture which can learn from experiences the dependency between
task structures and promising path search directions. Then this learned prior is integrated with a UCB-type algorithm
to achieve an online balance between exploration and exploitation when solving a new problem. Empirically, we show
that NEXT can complete the planning tasks with very small search trees and significantly outperforms
previous state-of-the-art methods on several benchmark problems.

Deep Learning has been applied successfully to many basic human tasks such as object
recognition and speech recognition, and increasingly to the more complex task of language
understanding. In addition, deep learning has been extremely successful in the context
of planning tasks in constrained environments (e.g., game playing).

We ask: Can deep learning be applied to design algorithms?

We formulate this question precisely, and show how to apply deep reinforcement learning
to design online algorithms for a number of optimization problems such as the AdWords problem
(choosing which advertisers to include in a keyword auction), the online knapsack problem,
and optimal stopping problems. Our results indicate that the models have learned behaviors
that are consistent with the traditional optimal algorithms for these problems.

Joint work with William Kong, Chris Liaw, and Aranyak Mehta; appeared in
ICLR 2019.

1:00 -- 2:30

Lunch (provided on site)

2:30 -- 3:10

Ilya Razenshteyn

Learning Space Partitions for Nearest Neighbor Search

I will talk about a new algorithm for finding high-quality space partitions useful for nearest
neighbor data structures. It is based on balanced graph partitioning followed by supervised learning.
When instantiated with KaHIP (Sanders, Schulz SEA 2013) and neural networks, respectively, the new
algorithm consistently outperforms quantization-based and tree-based partitioning methods.

Joint work with Yihe Dong (Microsoft Research), Piotr Indyk (MIT) and Tal Wagner (MIT).

3:10 -- 3:50

Piotr Indyk

Learning-Based Low-Rank Approximations

We introduce a “learning-based” algorithm for the low-rank matrix decomposition
problem: given an n × d matrix A, and a parameter k, compute a rank-k matrix A' that
minimizes the approximation loss |A-A'|_{F}. The algorithm uses a training set of input matrices
in order to optimize its performance. Specifically, some of the most efficient approximate algorithms
for computing low-rank approximations proceed by computing a projection SA, where S is a sparse
random m × n “sketching matrix”, and then performing the singular value
decomposition of SA. We show how to replace the random matrix S with a “learned”
matrix of the same sparsity to reduce the error.
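The random-sketch pipeline that the learned variant starts from can be sketched as follows. The CountSketch-style S (one random ±1 entry per column) is one common instantiation of a sparse sketching matrix; the learned algorithm keeps the sparsity pattern but trains the nonzero values.

```python
import numpy as np

def sketched_low_rank(A, k, m, rng):
    """Rank-k approximation of the n x d matrix A via a sparse random sketch:
    compress A to the small m x d matrix SA, take the top-k right singular
    vectors V of SA, and project the rows of A onto span(V)."""
    n = A.shape[0]
    # Sparse CountSketch-style S: exactly one random +-1 entry per column
    S = np.zeros((m, n))
    S[rng.integers(0, m, size=n), np.arange(n)] = rng.choice([-1.0, 1.0], size=n)
    # SVD of the small sketch; rows of Vt are right singular vectors in R^d
    _, _, Vt = np.linalg.svd(S @ A, full_matrices=False)
    V = Vt[:k].T
    return A @ V @ V.T
```

When A is exactly rank k and m ≥ k, the sketch SA almost surely preserves the row space of A, so the projection recovers A essentially exactly; the interesting regime is noisy A, where training the entries of S reduces the approximation loss.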

Our experiments show that, for multiple types of data sets, a learned sketch matrix can substantially
reduce the approximation loss compared to a random matrix S, sometimes by one order of magnitude.
We also study mixed matrices where only some of the rows are trained and the remaining ones are random,
and show that such mixed matrices still offer improved performance while retaining worst-case guarantees.