Info Register Spotlight Speakers Short Talks Attending Schedule
Workshop on Local Algorithms (WOLA) 2020

Announcement Regarding COVID-19: Due to the current situation with COVID-19, we have decided to hold a virtual and shorter version of WOLA this year. WOLA 2020 will run for two days between 8am - 12pm PT to maximize the attendance. This year, we will only include spotlight talks and short (postdoc/student) talks. Hopefully next year we can go back to a more inclusive format.

Local algorithms --- that is, algorithms that compute and make decisions on parts of the output considering only a portion of the input --- have been studied in a number of areas in theoretical computer science and mathematics. Some of the related areas include sublinear-time algorithms, distributed algorithms, streaming algorithms, (massively) parallel algorithms, inference in large networks, and graphical models. These communities have similar goals but a variety of approaches, techniques, and methods. This workshop is aimed at fostering dialogue and cross-pollination of ideas between the various communities. The workshop will feature a small number of longer spotlight talks that, in part, survey approaches by various communities, as well as short, focused talks on recent, exciting results.

The first WOLA was held in the Fall of 2016 at MSR New England, the second WOLA was held in the summer of 2018 at MIT, and last year the third WOLA was held in the summer of 2019 at ETH Zurich, Switzerland.

When: July 20th - July 21st, 2020 (8am-12pm Pacific Time)
Where: Toyota Technological Institute at Chicago (TTIC)
6045 S Kenwood Ave,
Chicago, IL 60637

The workshop will be held virtually.

Organizing Committee:
Register
All researchers interested in Local Algorithms are invited to attend. Attending the workshop is free, but please register via this form before the extended deadline of July 10th, 2020.
Spotlight Speakers
  • Robert Krauthgamer (The Weizmann Institute of Science)
  • Krzysztof Onak (IBM Research)
  • Merav Parter (The Weizmann Institute of Science)
  • Lisa Sauermann (MIT)
  • Madhu Sudan (Harvard University)
  • Gregory Valiant (Stanford University)
Short Talks
This year we will have short talks including a 15-20 minutes recorded talk, and a 5 minutes live presentation, for postdocs and graduate students. Below is the list of short talk speakers.
  • Maryam Aliakbarpour (MIT)
  • Ainesh Bakshi (Carnegie Mellon University)
  • Soheil Behnezhad (University of Maryland)
  • Sourbh Bhadane (Cornell University)
  • Amartya Shankha Biswas (MIT)
  • Sebastian Brandt (ETH Zurich)
  • Clèment Canonne (IBM Research)
  • Sitan Chen (MIT)
  • Marcel Dall'Agnol (University of Warwick)
  • Talya Eden (MIT)
  • Nimrod Fiat (Tel Aviv University)
  • Yanjun Han (Stanford University)
  • Rajesh Jayaram (Carnegie Mellon University)
  • Reut Levi (IDC)
  • Quanquan Liu (MIT)
  • Raj Kumar Maity (University of Massachusetts, Amherst)
  • Lazar Milenkovic (Tel Aviv Univerity)
  • Yasamin Nazari (Johns Hopkins University)
  • Asaf Rosin (Tel-Aviv University)
  • Ziteng Sun (Cornell Univerity)
  • Jakab Tardos (EPFL)
  • Jonathan Tidor (MIT)
  • Ali Vakilian (University of Wisconsin-Madison)
  • Nithin Varma (University of Haifa)
  • Pavel Vesely (University of Warwick)
  • Erik Waingarten (Columbia Univerity)
Attending*
If you are attending the workshop, you can make use of the following virtual recourses. The corrensponding information will be emailed to you once you register.
YouTube Channel: The videos of the short talks are now uploaded on the YouTube Channel of the workshop.
Slack Channel: The slack channel can be used for asking questions from our short talk speakers as well as all other related discussions.
Junior-Senior Milk&Cookies: We will have a few Junior-Senior socializing events right before and right after the workshop on each day.
Freestyle Meetings: Attendees can themselves organize zoom sessions to discuss any topic, which they can organize via a googledoc.

(*We thank STOC 2020 organizers for the ideas about the above format.)
Tentative Schedule
All times in the schedule below are shown in PST.

Day 1, Monday July 20th

Session 1 (Chair: Rocco Servedio)

8:00 -- 8:45 Madhu Sudan Communication Amid Uncertainty

I will talk about a sequence of works where we investigate basic questions in communication where unreliability can be attributed, not to the channel of communication, but to the lack of perfect synchronization between the communicating players. Topics that will be covered include (some subset of):

1) (One-shot) Compression where sender and receiver of information do not agree on the distribution from which the message is sampled. Can we compress information down to the entropy?

2) Communication complexity when Alice and Bob don't have perfectly shared randomness: Is having shared correlation as good as having perfectly shared randomness?

3) Communication complexity when Alice and Bob don't agree on the function being computed. Can one salvage low-communication protocols with such uncertainty?

4) Communication of proofs (as in our conference/journal papers) where prover and verifier don't agree on what is basic math.

In most cases we have only partial results, and I will describe what we know and the many open questions.

Based on many joint works.

[video]

8:45 -- 8:50 Jonathan Tidor Testing linear-invariant properties

In this talk we briefly survey known results and techniques in graph property testing, building up to Alon and Shapira's classification of graph properties that are testable with constant query complexity and one-sided error by the oblivious tester. We then discuss recent progress on the analogous problem in arithmetic property testing, the new difficulties that appear in this setting, and the new techniques developed to overcome them. Based on joint work with Yufei Zhao.

[video]

8:50 -- 8:55 Marcel Dall'Agnol A structural theorem for local algorithms with applications to coding, testing and delegation

Motivated by the applicability of relaxed sunflower techniques to property testing and local decoding, we define a common framework that captures a wide family of local algorithms, encompassing those as well as proofs of proximity. This enables us to show that the structure of every algorithm that makes a constant number of adaptive queries and satisfies a natural robustness condition admits a sample-based algorithm with sublinear sample complexity; moreover, this transformation is nearly optimal.

The talk will give an overview of the general framework and the transformation from adaptive to sample-based algorithms, which relies heavily on relaxed sunflower lemmas and the Hajnal-Szemerédi theorem. Applying this transformation to particular cases, we obtain: (1) a strengthening of the state-of-the-art lower bound for relaxed locally decodable codes that improves the dependency in query complexity exponentially; (2) a transformation of any constant-query tester into a sample-based tester with sublinear sample complexity, that was previously known to hold only for non-adaptive testers; and (3) a limit on the power of sublinear-time delegation of computation, by showing almost tightness of the known separation between testers and proofs of proximity.

Joint work with Tom Gur and Oded Lachish.

[video]

8:55 -- 9:00 Yanjun Han Optimal Communication Rates and Combinatorial Properties for Distributed Simulation

We study the distributed simulation problem where n players aim to generate same sequences of random coin flips where some subsets of the players share an independent common coin which can be tossed multiple times, and there is a publicly seen blackboard through which the players communicate with each other. We provide a tight representation of the optimal communication rates via linear programming, and more importantly, propose explicit algorithms for the optimal distributed simulation for a wide class of hypergraphs. In particular, the optimal communication rate in complete hypergraphs is still achievable in sparser hypergraphs containing a path-connected cycle-free cluster of topologically connected components.

Some key steps in analyzing the upper bounds rely on two different definitions of connectivity in hypergraphs, which may be of independent interest.

[video]

9:00 -- 9:05 Ziteng Sun Share more and talk less: Optimal Distributed Testing under Randomness and Communication Constraints

We study goodness-of-fit testing of discrete distributions in the distributed setting where each user holds a sample and can only measure or release it under some local information constraints (local measurements, limited communication, local privacy, etc.). Recent works show that having access to a large enough common random seeds (public randomness) leads to a significant reduction in the sample complexity. In this work, we provide a complete understanding of the trade-off when only a limited amount of public randomness is available, by characterizing the optimal sample complexity in terms of the length of the random seed and all other parameters.

Joint work with Jayadev Acharya (Cornell University), Clèment L. Canonne (IBM Research), Yanjun Han (Stanford University), and Himanshu Tyagi (IISc Bangalore).

[video]

9:05 -- 9:25 Break

Session 2 (Chair: Ronitt Rubinfeld)

9:25 -- 10:10 Merav Parter Massively Parallel Derandomization for Local Graph Problems

The Massively Parallel Computation (MPC) model is an emerging model which distills core aspects of distributed and parallel computation. It has been developed as a tool to solve (typically graph) problems in systems where the input is distributed over many machines with limited space.

Over the past decade, there has been a growing interest to understand these settings from a local algorithmic perspective. This has led to a large body of work providing fast algorithms, usually with sublogarithmic-time and even faster, for a number of fundamental local graph problems. A common feature of most of these results is that they were relying on randomized algorithms, while very little research has been done to study deterministic algorithms.

In this talk, I will discuss a recent line of work that investigates the curious gap between randomized and deterministic solutions in massively parallel models. I will present the graph sparsification technique that computes low-degree subgraphs with additional desired properties (that depend on the specific problem). This leads to efficient derandomization of the Luby's MIS and Maximal Matching algorithms. I will also present a simple deterministic algorithm for the $(\Delta+1)$ coloring problem.

If time allows, we will discuss some new derandomization theorems for the hitting set problem that are based on PRGs for fooling read-once DNF and CNF formulas. The latter finds applications in spanner and shortest path algorithms. Finally, I will describe some barriers for perfect derandomization (i.e., which matches the bounds of the randomized counterparts) and highlight some key open problems.

Based on several joint works with Keren Censor-Hillel, Artur Czumaj, Peter Davies, Michal Dory, Gregory Schwartzman, and Eylon Yogev.

[video]

10:10 -- 10:15 Sebastian Brandt Lower Bounds for Ruling Sets in the LOCAL Model

Given a graph $G = (V,E)$, an $(\alpha, \beta)$-ruling set is a subset $S \subseteq V$ such that the distance between any two vertices in $S$ is at least $\alpha$, and the distance between any vertex in $V$ and the closest vertex in $S$ is at most $\beta$. Ruling sets are a generalization of maximal independent sets (which are $(2,1)$-ruling sets) and constitute an important building block of many distributed algorithms. The recent breakthrough on network decompositions by Rozho\v n and Ghaffari [STOC'20] implies that, in the LOCAL model, ruling sets can be computed deterministically in polylogarithmic time, for a wide range of parameters $\alpha, \beta$.

In this talk, we present polylogarithmic lower bounds for the deterministic computation of ruling sets, and $\Omega(\poly(\log \log n))$ lower bounds for the randomized computation of ruling sets, improving on the previously best known lower bounds of $\Omega(\log^* n)$ by Linial [FOCS'87] and Naor [J.Disc.Math.'91].

In the special case of maximal independent sets, such lower bounds were already known; however, our lower bounds are the first (beyond $\Omega(\log^* n)$) that are also applicable on trees.

[video]

10:15 -- 10:20 Yasamin Nazari Non-local Distances via Local Techniques

One important tool widely used for solving local problems is a certain type of clustering called a neighborhood cover. In this talk we describe how neighborhood covers can be used for solving global distance problems. We can apply these tools in computing approximate single-source or multi-source distances in distributed models, and in constructing distance related objects such as hopsets.

[video]

10:20 -- 10:25 Quanquan Liu Parallel Algorithms for Small Subgraph Counting

With the prevalence of large graphs, it is becoming increasingly important to design scalable algorithms. Over the last two decades, frameworks for parallel computation, such as MapReduce, Hadoop, Spark and Dryad, gained significant popularity. The Massively Parallel Computation (MPC) model is a de-facto standard for studying these frameworks theoretically. Subgraph counting is a fundamental problem in analyzing massive graphs, often studied in the context of social and complex networks. There is a rich literature on designing scalable algorithms for this problem, with the main challenge to design methods which are both efficient and accurate. In this work, we tackle this challenge and design several new algorithms for subgraph counting in MPC.

Given a graph G over n vertices, m edges and T triangles, our first main result is an algorithm that, with high probability, outputs a (1 + ε)-approximation to T, with asymptotically optimal round and total space complexity provided any S ≥ max (\sqrt(m), n^2/m) space per machine and assuming T = Ω(\sqrt(m/n)).

Our second main result is an O(log log n)-rounds algorithm for exactly counting the number of triangles, parametrized by the arboricity α of the input graph. The space per machine is O(n^δ) for any constant δ, and the total space is O(mδ), which matches the time complexity of (combinatorial) triangle counting in the sequential model. We also prove that this result can be extended to exactly counting k-cliques for any constant k, with the same round complexity and total space O(mα^(k−2)). Alternatively, allowing O(α^2) space per machine, the total space requirement reduces to O(nα^2).

Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most 5, can be implemented in the MPC model in O(\sqrt{log n}) rounds, O(n^δ) space per machine and O(mα^3) total space. Therefore, this result also exhibits the phenomenon that a time bound in the sequential model translates to a space bound in the MPC model.

[video]

10:25 -- 10:30 Amartya Shankha Biswas Faster parallel algorithms for distance approximation and spanners

Over the past decade, there has been increasing interest in distributed/parallel algorithms for processing large-scale graphs. By now, we have quite fast algorithms---usually sub-logarithimic-time and often poly(log log n)-time, or even faster---for a number of fundamental graph problems in the massively parallel computation (MPC) model. This model is a widely-adopted theoretical abstraction of MapReduce style settings, where a number of machines communicate in an all-to-all manner to process large-scale data. Contributing to this line of work on MPC graph algorithms, we present poly(log k)∈poly(log log n) round MPC algorithms for computing O(k^{1+o(1)})-spanners in the strongly sublinear regime of local memory. One important consequence, by letting k = log n, is a O(log^2 log n)-round algorithm for O(log^{1+o(1)} n) approximation of all pairs shortest path (APSP) in the near-linear regime of local memory. To the best of our knowledge, these are the first sublogarithmic-time MPC algorithms for computing spanners and distance approximation.

[video]

10:30 -- 10:50 Break

Session 3 (Chair: Sepideh Mahabadi)

10:50 -- 11:35 Robert Krauthgamer Coresets for Clustering in Graphs and Beyond

A coreset is a technique to reduce data size while maintaining a (1+epsilon)-approximation to some objective function, like k-median clustering. Prior work focused on Euclidean data, e.g., coresets for k-median clustering of n points in R^d, culminating recently with the impressive feat of coresets of size poly(k/epsilon), independently of the Euclidean dimension d.

I will introduce coresets for k-median in a graph metric, i.e., the shortest-path metric of an edge-weighted graph G, which arises for example in road networks and data visualization. I will then present several bounds that depend on the topology of G, e.g., its treewidth, planarity, or excluded-minor. We will see how to significantly extend techniques known from the Euclidean setting, advancing them in surprising new directions. As a result, we can also match state-of-the-art bounds for the Euclidean setting using a simple construction.

Joint work with Daniel Baker, Vladimir Braverman, Lingxiao Huang, Shaofeng H.-C. Jiang, and Xuan Wu (in two recent papers).

[video]

11:35 -- 11:40 Pavel Vesely Relative Error Streaming Quantiles

Approximating ranks, quantiles, and distributions over streaming data is a central task in data analysis and monitoring. Given a stream of n items, the task is to compute a sketch (data structure) of size poly(log(n), 1/epsilon) that can return an epsilon-approximate rank of any query item y, i.e., the number of stream elements no larger than y.

Most works to date focused on additive error approximation, culminating in the KLL sketch that achieved optimal asymptotic behavior. This paper investigates multiplicative error approximations to the rank. The motivation stems from practical demand to understand the tails of distributions, and hence for sketches to be more accurate near extreme values. We present a randomized sketch for the multiplicative error guarantee, with a close-to-optimal size.

Joint work with G. Cormode, Z. Karnin, E. Liberty, and J. Thaler.

[video]

11:40 -- 11:45 Ainesh Bakshi Sample-Optimal Algorithms for PSD Low-Rank Approximation

Recently, Musco and Woodruff (FOCS, 2017) showed that given an n \times n positive semidefinite (PSD) matrix $\mathbf{A}$, it is possible to compute a $(1+\epsilon)$-approximate relative-error low-rank approximation to $\mathbf{A}$ by querying $\widetilde{O}(nk/\epsilon^{2.5})$ entries of $\mathbf{A}$ in time $\widetilde{O}(nk/\epsilon^{2.5} +n k^{\omega-1}/\epsilon^{2(\omega-1)})$. They also showed that any relative-error low-rank approximation algorithm must query $\widetilde{\Omega}(nk/\epsilon)$ entries of $\mathbb{A}$, this gap has since remained open. Our main result is to resolve this question by obtaining an optimal algorithm that queries $\widetilde{O}(nk/\epsilon)$ entries of $\mathbb{A}$ and outputs a relative-error low-rank approximation in $\widetilde{O}(n\cdot(k/\epsilon)^{\omega-1})$ time. Note, our running time also improves that of Musco and Woodruff, and matches the information-theoretic lower bound if the matrix-multiplication exponent $\omega$ is 2.

Our techniques extend to obtain sample-optimal algorithms for computing a low-rank approximation of Negative-type distance matrices and corsets for ridge regression.

This talk is based on joint work with Nadiia Chepurko and David Woodruff.

[video]

11:45 -- 11:50 Raj Kumar Maity Estimation of Shortest Path Covariance Matrices

We study the sample complexity of estimating the covariance matrix Σ∈R^{d×d} of a distribution D over R^d given independent samples, under the assumption that Σ is graph-structured. In particular, we focus on shortest path covariance matrices, where the covariance between any two measurements is determined by the shortest path distance in an underlying graph with d nodes. Such matrices generalize Toeplitz and circulant covariance matrices and are widely applied in signal processing applications, where the covariance between two measurements depends on the(shortest path) distance between them in time or space.We focus on minimizing both the vector sample complexity: the number of samples drawn from D and the entry sample complexity: the number of entries read in each sample. The entry sample complexity corresponds to measurement equipment costs in signal processing applications. We show how to estimateΣup to spectral norm error ||Σ||_2 using just O(\sqrt{D}) entry sample complexity and Õ(d2/2) vector sample complexity, where D is the diameter of the underlying graph. Our method is based on extending the widely applied idea of sparse rulers for reducing the entry sample complexity of Toeplitz covariance estimation to the graph setting. In particular, we show how to find a subset of just O(\sqrt{D}) nodes of the underlying graph whose shortest path distances include all distances in the graph. Measurements at this small subset of nodes can be used to estimate the full covariance matrix Σ Beyond our main result, we give an improved bound for low-rank Σ which matches the state-of-the-art for Toeplitz covariance estimation, while using a much more general proof. We also give an information theoretic lower bound matching our upper bound up to a factor D. We leave solving the gap between these bounds as an interesting open question.

[video]

11:50 -- 11:55 Rajesh Jayaram Testing Positive Semi-Definiteness via Random Submatrices

We consider the problem of testing whether a matrix $A \in \R^{n \times n}$ with bounded entries ($\|A\|_\infty \leq 1$) is positive semi-definite (PSD), or $\eps$-far from the PSD cone, i.e. $\min_{B \succeq 0} \|A - B\|_F^2 = \sum_{i : \lambda_i(A) < 0} \lambda_i^2(A) > \eps n^2$. Testing whether a given matrix is PSD is a crucial component of many computational tasks, such as checking feasibility of Semidefinite Programs, testing metric embeddability, testing convexity in optimization, and many others. Our main algorithmic contribution is a non-adaptive tester which distinguishes between these cases using only $\wt{O}(1/\eps^4)$ queries to the entries of $A$. For the related ``$\ell_\infty$-gap problem'', where $A$ is either PSD or has an eigenvalue satisfying $\lambda_i(A)\le - \eps n$, our algorithm only requires $\wt{O}(1/\eps^2)$ queries, which is optimal up to $\log(1/\eps)$ factors. Our algorithms randomly sample a collection of principle sub-matrices and check whether these sub-matrices are PSD. Since all principle submatrices of PSD matrices are PSD, our results can be seen as a statement on how well matrices which are far from PSD can be locally tested.

We complement our main upper bound for PSD testing with $\ell_2^2$-gap by giving a $\wt{\Omega}(1/\eps^2)$ lower bound for any non-adaptive algorithm. Our lower bound construction is general, and can be used to derive new lower bounds for a number of spectral testing problems, such as for Schatten and Ky-Fan norms, and may additionally be useful for proving lower bounds for other graph property testing problems.

Joint work with Ainesh Bakshi (CMU) and Nadiia Chepurko (MIT).

[video]

11:55 -- 12:00 Reut Levi Testing Triangle Freeness in the General Model in Graphs with Arboricity $O(\sqrt{n})$

We study the problem of testing triangle freeness in the general model. This problem was first studied in the general model by Alon et al. (SIAM J. Discret. Math. 2008) who provided both lower bounds and upper bounds that depend on the number of vertices and the average degree of the graph. Their bounds are tight only when $d_{\rm max} = O(d)$ and $\bar{d} \leq \sqrt{n}$ or when $d = \Theta(1)$, where $d_{\rm max}$ denotes the maximum degree and $\bar{d}$ denotes the average degree of the graph.

In this paper we provide bounds that depend on the arboricity of the graph and the average degree. As in Alon et al., the parameters of our tester is the number of vertices, $n$, the number of edges, $m$, and the proximity parameter $\epsilon$ (the arboricity of the graph is not a parameter of the algorithm). The query complexity of our tester is $O(\alpha/\bar{d} + \alpha)\cdot \poly(1/\eps)$ where $\alpha$ denotes the arboricity of the graph. We show that for graphs with arboricity $O(\sqrt{n})$ this upper bound is tight in the following sense. For any $\alpha \in [s]$ where $s= \Theta(\sqrt{n})$ there exists a family of graphs with arboricity $\alpha$ and average degree $\bar{d}$ such that $\Omega(\alpha/\bar{d} + \alpha)$ queries are required for testing triangle freeness on this family of graphs. Moreover, this lower bound holds for any such $\alpha$ for a large range of feasible average degrees (recall that if the arboricity of a graph, $G$, is $\alpha$ then the number of edges of $G$ is at most $n \cdot \alpha$ and at least $\alpha^2$).

[video]


Day 2, Tuesday July 21st

Session 4 (Chair: Jacob Fox)

8:00 -- 8:45 Lisa Sauermann Arithmetic cycle removal lemmas and applications in property testing

Ben Green proved an arithmetic k-cycle removal lemma, which when applied to \mathbb{F}_2^n roughly speaking states the following: For any subsets X_1, ..., X_k of \mathbb{F}_2^n, either there are a lot of solutions to the equation x_1+...+x_k = 0 with (x_1,...,x_k) \in X_1\times ... \times X_k or one can delete a small number of elements from each of the sets X_1, ..., X_k to destroy all such solutions. This result has important consequences in property testing of Boolean functions. However, Green's proof gives a tower-type dependence between the parameters in his result, which means that for the resulting property testing algorithms the query complexity grows extremely quickly as a function of the distance parameter epsilon. The problem of improving the bounds between the parameters in Green's arithmetic k-cycle removal lemma for \mathbb{F}_2^n has received a lot of attention. In the case k = 3, which corresponds to testing triangle-freeness of Boolean functions, Jacob Fox and László Miklós Lovász proved that the dependence between the parameters is in fact polynomial. In joint work with Jacob Fox and László Miklós Lovász, we extended their result to all k > 3.

[video]

8:45 -- 8:50 Sitan Chen Entanglement is Necessary for Optimal Quantum Property Testing

There has been a surge of progress in recent years in developing algorithms for testing and learning quantum states that achieve optimal copy complexity. Unfortunately, they require the use of entangled measurements across many copies of the underlying state and thus remain outside the realm of what is currently experimentally feasible. A natural question is whether one can match the copy complexity of such algorithms using only independent---but possibly adaptively chosen---measurements on individual copies.

We answer this in the negative for arguably the most basic quantum testing problem: deciding whether a given d-dimensional quantum state is equal to or ε-far in trace distance from the maximally mixed state. While it is known how to achieve optimal O(d/ε^2) copy complexity using entangled measurements, we show that with independent measurements, Ω(d^{4/3}/ε^2) is necessary, even if the measurements are chosen adaptively. This resolves a question of Wright. To obtain this lower bound, we develop several new techniques, including a chain-rule style proof of Paninski's lower bound for classical uniformity testing, which may be of independent interest.

[video]

8:50 -- 8:55 Asaf Rosin Optimal Distribution-Free Sample-Based Testing of Subsequence-Freeness

In this work, we study the problem of testing subsequence-freeness. For a given subsequence (word) $w = w_1 ... w_k$, a sequence (text) $T = t_1 ... t_n$ is said to contain $w$ if there exist indices $1 \leq i_1 \leq ... \leq i_k \leq n$ such that $t_{i_{j}} = w_j$ for every $1 \leq j \leq k$. Otherwise, $T$ is $w$-free. While a large majority of the research in property testing deals with algorithms that perform queries, here we consider sample-based testing (with one-sided error). In the ``standard'' sample-based model (i.e., under the uniform distribution), the algorithm is given samples $(i,t_i)$ where $i$ is distributed uniformly independently at random. The algorithm should distinguish between the case that $T$ is $w$-free, and the case that $T$ is $\epsilon$-far from being $w$-free (i.e., more than an $\epsilon$-fraction of its symbols should be modified so as to make it $w$-free). Freitag, Price, and Swartworth (Proceedings of RANDOM, 2017) showed that $O(k^2\log k/\epsilon)$ samples suffice for this testing task.

We obtain the following results.

  • The number of samples sufficient for sample-based testing (under the uniform distribution) is $O(k/\epsilon)$. This upper bound builds on a characterization that we present for the distance of a text $T$ from $w$-freeness in terms of the maximum number of copies of $w$ in $T$, where these copies should obey certain restrictions.
  • We prove a matching lower bound which implies that the above upper bound is tight. Moreover, the lower bound holds not only for the worst case choice of $w$, but rather for *every* $w$.
  • The same upper bound holds in the more general distribution-free sample-based model. In this model the algorithm receives samples $(i,t_i)$ where $i$ is distributed according to an arbitrary unknown distribution $p$ (and the distance from $w$-freeness is measured with respect to $p$). We prove this result by a reduction to sample-based testing under the uniform distribution.

[video]

8:55 -- 9:00 Nimrod Fiat Efficient Distance Approximation for Graph Properties

A \emph{distance-approximation\/} algorithm for a graph property $\calP$ in the adjacency-matrix model is given an approximation parameter $\eps \in (0,1)$ and query access to the adjacency matrix of a graph $G=(V,E)$. It is required to output an estimate of the \emph{distance} between $G$ and the closest graph $G'=(V,E')$ that satisfies $\calP$, where the distance between graphs is the size of the symmetric difference between their edge sets, normalized by $|V|^2$.

In this work we introduce \emph{property covers}, as a framework for using distance-approximation algorithms for ``simple'' properties to design distance-approximation algorithms for more ``complex'' properties. Applying this framework we present distance-approximation algorithms with $\poly(1/\eps)$ query complexity for induced $P_3$-freeness, induced $P_4$-freeness, and Chordality. For induced $C_4$-freeness our algorithm has query complexity $\exp(\poly(1/\eps))$. These complexities essentially match the corresponding known results for testing these properties and provide an exponential improvement on previously known results.

[video]

9:00 -- 9:05 Nithin Varma Average sensitivity of graph algorithms

In modern applications of graph algorithms, where the graphs of interest are large and dynamic, it is unrealistic to assume that an input representation contains the full information of a graph being studied. For example, consider a social network, where a vertex corresponds to a user and an edge corresponds to a friendship relation. It is reasonable to assume that users do not always update new friendships on the social network, and that sometimes they do not fully disclose their friendship relations for security or privacy reasons. This motivates the design of graph algorithms that, in spite of being given only a (large) subgraph as input, output solutions that are close to the solutions output when the whole graph is available.

In this talk, I will introduce a notion of sensitivity of graph algorithms that formalizes this desirable feature. After describing the basic properties of our sensitivity definition, I will mention some of the key ideas that we use in the design of algorithms with low sensitivity for problems such as global mincut, s-t mincut, and maximum matching. The highlight of the talk will be our result which states that the existence of a local computation algorithm for a problem implies the existence of an algorithm with low sensitivity for the same problem. I will discuss how we use this result to show a linear lower bound on the query complexity of every local computation algorithm for 2-coloring.

[video]

9:05 -- 9:25 Break

Session 5 (Chair: Christian Sohler)

9:25 -- 10:10 Gregory Valiant Randomly Collected, Worst Case Data

I'll discuss a new framework for statistical estimation that leverages knowledge of how samples are collected but makes no distributional assumptions on the data values. Specifically, consider a population of elements 1,..,n with corresponding data values x1,..,xn. After observing the values indexed by a sample subset of indices, A, the goal will be to estimate some statistic of the entire set x1,..,xn. We make no assumptions on the values x1,...,xn, and instead assume that the sample indices are drawn according to a known distribution, P over subsets of 1,..,n. How can the distribution, P, be leveraged to minimize the worst-case expected error of the estimator, where the expectation is with respect to P and the worst-case is with respect to the data values x1,..,xn? For which distributions, P, is this error small? Within this general framework we give an efficient near-optimal algorithm for mean estimation, leveraging a surprising connection to the Grothendieck problem. We also discuss this framework in several specific settings where membership in a sample may be correlated with data values, such as when probabilities of sampling vary as in ``importance sampling'', when individuals are recruited into a sample through their social networks as in ``snowball/chain sampling'' or when samples have chronological structure as in ``selective prediction''. This talk is based on joint works with Mingda Qiao, and with Justin Chen and Paul Valiant.

[video]

10:10 -- 10:15 Clèment Canonne Distributed Signal Detection under Communication Constraints

Observations from a d-dimensional Gaussian are distributed across n users, each holding one sample. A central server seeks to distinguish between two cases: the Gaussian has zero mean, or the mean has norm at least ε. However, the users can each transmit at most ℓ bits to the server or, equivalently, can only make a partial local measurement on their sample.

This is the problem of detecting whether an observed signal is simply white noise in a distributed setting. We study this distributed testing problem with and without the availability of public randomness, providing protocols, showing lower bounds, establishing connections, and making conjectures.

Joint work with Jayadev Acharya and Himanshu Tyagi.

[video]

10:15 -- 10:20 Maryam Aliakbarpour Testing Properties of Multiple Distributions with Few Samples

In this talk, I present a new setting for testing properties of distributions while receiving samples from several distributions, but few samples per distribution. The primary motivation for considering this setting is that it captures data collection in the real world. I explain a brief description of our testers for the following problems in this setting when given samples from s distributions, p_1, p_2, . . . , p_s: (1) Uniformity Testing: Testing whether all the p_i's are uniform or eps-far from being uniform in \ell_1-distance (2) Identity Testing: Testing whether all the p_i's are equal to an explicitly given distribution q or eps-far from q in \ell_1-distance, and (3) Closeness Testing: Testing whether all the p_i's are equal to a distribution q which we have sample access to, or eps-far from q in \ell_1-distance. By assuming an additional natural condition about the source distributions, we provide sample optimal testers for all of these problems.

Joint work with Sandeep Silwal

[video]

10:20 -- 10:25 Sourbh Bhadane Estimating Entropy of Distributions in Constant Space

Estimating Shannon entropy H(p) of a distribution p over [k] to within an additive constant requires k/log k samples, which is optimal [ValiantV'10, WuY'16, JiaoHVW'16]. The sample-optimal methods, however, require nearly linear ~k/log k space for implementation. We propose an algorithm that requires a constant O(1) words of space and O(k) samples, providing a significant reduction in space and only a logarithmic blow-up in the number of samples.

Joint work with Jayadev Acharya, Piotr Indyk, and Ziteng Sun (NeurIPS 2019).

[video]

10:25 -- 10:30 Erik Waingarten Random Restrictions of High-Dimensional Distributions and Uniformity Testing with Subcube Conditioning

Given a distribution p supported on {-1,1}^n, we want to test whether p is uniform or ε-far from uniform in total variation distance. The fact that p is a high dimensional distribution is inconsequential for the sample complexity, which is well-known to be Θ(2^{n/2} / ε^2). To benefit from the high dimensional structure, we study the subcube conditional sampling model, first considered in Bhattacharyya and Chakraborty (2018), and give a nearly optimal algorithm for testing uniformity on {-1,1}^n making Õ(\sqrt{n} / ε^2) queries. The key ingredient is a natural notion of random restriction for distributions on {-1,1}^n, and an inequality describing the behavior of the mean vector after a random restriction.

Joint work with Clement Canonne, Xi Chen, Gautam Kamath and Amit Levi.

[video]

10:30 -- 10:50 Break

Session 6 (Chair: Mohsen Ghaffari)

10:50 -- 11:35 Krzysztof Onak The Current Landscape of Massively Parallel Algorithms for Graphs

One of the most interesting settings for massively parallel algorithms on graphs is when the local space of each machine is sublinear in the number of vertices. This allows for distributing computation, even for sparse graphs, across a large number of machines that do not have to be very powerful. In recent years, we have seen several developments in this area, including both algorithms and conditional lower bounds. In many cases these results leverage connections to other areas such as distributed algorithms and property testing. I will survey some of them and talk in more detail about generating random walks and estimating PageRank, for which we were able to obtain nearly-exponential improvements in round complexity over direct implementations of known algorithms. Finally, I will share some of my favorite open questions in the area.

[video]

11:35 -- 11:40 Ali Vakilian Improved Local Computation Algorithm for Set Cover via Sparsification

We design a Local Computation Algorithm (LCA) for the set cover problem. Given a set system where each set has size at most s and each element is contained in at most t sets, the algorithm reports whether a given set is in some fixed set cover whose expected size is O(log s) times the minimum fractional set cover value. Our algorithm requires s^{O(log s)} t^{O(log s (log log s + log log t))} queries. This result improves upon the application of the reduction of [Parnas and Ron, TCS'07] on the result of [Kuhn et al., SODA'06], which leads to a query complexity of (st)^O(log s log t).

To obtain this result, we design a parallel set cover algorithm that admits an efficient simulation in the LCA model by using a sparsification technique introduced in [Ghaffari and Uitto, SODA'19] for the maximal independent set problem. The parallel algorithm adds a random subset of the sets to the solution in a style similar to the PRAM algorithm of [Berger et al., FOCS'89]. However, our algorithm differs in the way that it never revokes its decisions, which results in a fewer number of adaptive rounds. This requires a novel approximation analysis which might be of independent interest.

Joint work with Christoph Grunau, Slobodan Mitrovic and Ronitt Rubinfeld.

[video]

11:40 -- 11:45 Talya Eden Faster sublinear approximations of k-cliques for low arboricity graphs

We present a sublinear-time algorithm for approximating the number of $k$-cliques in a graph, when a bound $\alpha$ on the arboricity is given. We consider the standard query model for general graphs allows for degree queries, neighbor queries, and pair queries. Let n be the number of vertices, m be the number of edges, and n_k be the number of k-cliques. Previous work by Eden, Ron and Seshadhri (STOC 2018) gives an O*(n/n_K^{1/k} + m^{k/2}/n_k)-time algorithm for this problem (we use O*(.) to suppress $\poly(\log n, 1/\epsilon, k^k)$ dependencies). Moreover, this bound is nearly optimal when the expression is sublinear in the size of the graph. Our motivation is to circumvent this lower bound, by parameterizing the complexity in terms of \emph{graph arboricity}. The arboricity of G is a measure for the graph density "everywhere". We design an algorithm for the class of graphs with arboricity at most α, whose running time is O*(min{nα^{k-1}/n_k,n/n_k^{1/k}}+mα^{k-2}/n_k). We also prove a nearly matching lower bound. For all graphs, the arboricity is O(\sqrt m), so this bound subsumes all previous results on sublinear clique approximation.

As a special case of interest, consider minor-closed families of graphs, which have constant arboricity. Our result implies that for any minor-closed family of graphs, there is a (1±ε)-approximation algorithm for n_k that has running time O*(n/n_k). Such a bound was not known even for the special (classic) case of triangle counting in planar graphs.

[video]

11:45 -- 11:50 Soheil Behnezhad Locality and the Stochastic Matching Problem

Consider an arbitrary graph G=(V, E) and suppose that each edge e in E is realized independently with some probability p. We consider a well-studied stochastic matching problem where the goal is to pick a sparse subgraph Q of G such that the realized edges in Q, in expectation, include a matching that is approximately as large as the maximum matching among the realized edges of G.

We draw a novel connection between LOCAL algorithms and this stochastic matching problem. Using this connection, we analyze a natural sampling based algorithm for the stochastic matching problem showing that it achieves a (1-\epsilon) approximation, for any desirably small constant \epsilon > 0. Previously, the best known approximation factor for this problem was 2/3.

Based on a joint work with Mahsa Derakhshan and Mohammad Hajiaghayi (STOC'20)

[video]

11:50 -- 11:55 Jakab Tardos Approximate maximum matchings in LCA and matching size in streaming

Given a source of iid samples of edges of an input graph G with n vertices and m edges, how many samples does one need to compute a constant factor approximation to the maximum matching size in G? Moreover, is it possible to obtain such an estimate in a small amount of space? We show that, on the one hand, this problem cannot be solved using a nontrivially sublinear (in m) number of samples: m1 - o(1) samples are needed. On the other hand, a surprisingly space efficient algorithm for processing the samples exists: O(log2n) bits of space suffice to compute an estimate.

Our main technical tool is a new peeling type algorithm for matching that we simulate using a recursive sampling process that crucially ensures that local neighborhood information from `dense' regions of the graph is provided at appropriately higher sampling rates. We show that a delicate balance between exploration depth and sampling rate allows our simulation to not lose precision over a logarithmic number of levels of recursion and achieve a constant factor approximation. The previous best result on matching size estimation from random samples was a logO(1)n approximation [Kapralov et al'14].

Our algorithm also yields a constant factor approximate local computation algorithm (LCA) for matching with O(dlogn) exploration starting from any vertex. Previous approaches were based on local simulations of randomized greedy, which take O(d) time {\em in expectation over the starting vertex or edge} (Yoshida et al'09, Onak et al'12), and could not achieve a better than d2 runtime. Interestingly, we also show that unlike our algorithm, the local simulation of randomized greedy that is the basis of the most efficient prior results does take Ω(d2) time for a worst case edge.

[video]

11:55 -- 12:00 Lazar Milenkovic A Unified Sparsification Approach for Matching Problems in Graphs of Bounded Neighborhood Independence

The neighborhood independence number of a graph G, denoted by beta = beta(G), is the size of the largest independent set in the neighborhood of any vertex. Graphs with bounded neighborhood independence, already for constant beta, constitute a wide family of possibly dense graphs, including line graphs, unit-disk graphs, claw-free graphs and graphs of bounded growth, which has been well-studied in the area of distributed computing. In ICALP'19, Assadi and Solomon [AS19] showed that, for any n-vertex graph G, a maximal matching can be computed in O(beta n log n) time in the classic sequential setting. This result shows that, surprisingly, for almost the entire regime of parameter beta, a maximal matching can be computed much faster than reading the entire input. The algorithm of [AS19], however, is inherently sequential and centralized. Moreover, a maximal matching provides a 2-approximate (maximum) matching, and the question of whether a better-than-2-approximate matching can be computed in sublinear time remained open.

In this talk, I will present a unified and surprisingly simple approach for producing (1+eps)-approximate matchings, for arbitrarily small eps>0. Specifically, set Delta = O((beta/eps) log(1/eps)) and let G_Delta be a random subgraph of G that includes, for each vertex v in G, Delta random edges incident on it. We show that, with high probability, G_Delta is a (1+eps)-matching sparsifier for G, i.e., the maximum matching size of G_Delta is within a factor of 1+eps from that of G. One can then work on the sparsifier G_Delta rather than on the original graph G. Since G_Delta can be implemented efficiently in various settings, this approach is of broad potential applicability.

The first implication of our approach is that a (1+eps)-approximate matching can be computed in the classic sequential setting in O((n beta / eps^2) log(1/eps) time, shaving a log n factor from the runtime of [AS19] and achieving an approximation factor of 1+eps rather than 2. For constant eps, our runtime is tight. If time permits, I will discuss the applications of our approach in other settings.

Joint work with Shay Solomon (SPAA'20).

[video]