# Algorithms & Complexity Seminar, MIT : 2017-18

Organizers: Akshay Degwekar (), Pritish Kamath (), Govind Ramnarayan ()

The Algorithms & Complexity Seminar for the 2017-18 year will usually (unless otherwise stated) meet on Wednesdays 4pm-5pm in 32-G575 (Theory Lab on the 5th floor of the Stata Center). The style and format of these meetings are variable. Please feel free to contact the organizers to find out more details. To receive announcements of upcoming talks, subscribe to the mailing list by either visiting the mailman page or send an empty email to compalgsem-subscribe@lists.csail.mit.edu.

## Upcoming Talks

• Wednesday, January 17, 2018: Keerti Choudhary (Weizmann).
Topic. Fault Tolerant Data Structures
Abstract. In this talk, we look at the problem of single-source-reachability (SSR) in presence of failures of vertices and edges. In the static setting, it can be solved in $O(|V|+|E|)$ time, for any directed graph $G = (V, E)$. To model the scenario of a faulty network, we associate a parameter $k$ with the network such that there are at most $k$ vertices (or edges) that are failed at any stage. The goal is to preprocess the graph $G$ and compute a compact data structure, that, given any set $F$ of at most $k$ vertices (or edges), efficiently solves the SSR problem for the graph $G \setminus F$. We show that for any set $F$ of size $k$, solves SSR problem for $G \setminus F$ in $O(2^k n)$ time. Previously the only known construction was single fault. One major application of this structure is a fault tolerant algorithm for SSC (strongly connected components).

## Past Talks

• Wednesday, September 6, 2017: Dor Minzer (Tel Aviv University).
Topic. An approach for 2-to-1 Games Conjecture via expansion on the Grassmann Graph
Abstract. The PCP theorem characterizes the computational class NP, so as to allow proving approximation problems are NP-hard.

One of the fundamental open questions in PCP theory is whether a special type of PCP, namely 2-to-1 Games, is still NP-hard. This conjecture is a variant of Khot's well-known Unique Games conjecture.

A recent line of study suggests a new attack on the 2-to-1 games conjecture (albeit with imperfect completeness). This approach is based on a mathematical object called the Grassmann Graph, and relies on an unproven combinatorial hypothesis on it. This, in turn, leads to the study of edge expansion in this graph. More specifically, to the study of sets of vertices in the Grassmann Graph, whose edge expansion is not optimal.

It turns out that the study of the combinatiral hypothesis is in fact equivalent to the study of edge expansion (Barak Kothari). Thus extending these results would complete the approach, proving that distinguishing between 2-to-1 games with close to 1 value, and 2-to-1 games with arbitrarily small value, is NP-hard.

This talk discusses this line of study.

Based on joint works with Irit Dinur, Subhash Khot, Guy Kindler and Muli Safra.
• Wednesday, September 27, 2017: Paul Hand (Rice U.). [Room: 32-D463 (Star)]
Topic. Deep Compressed Sensing
Abstract. Combining principles of compressed sensing with deep neural network-based generative image priors has recently been empirically shown to require 10X fewer measurements than traditional compressed sensing in certain scenarios. As deep generative priors (such as those obtained via generative adversarial training) improve, analogous improvements in the performance of compressed sensing and other inverse problems may be realized across the imaging sciences.

In joint work with Vladislav Voroninski, we provide a theoretical framework for studying inverse problems subject to deep generative priors. In particular, we prove that with high probability, the non-convex empirical risk objective for enforcing random deep generative priors subject to compressive random linear observations of the last layer of the generator has no spurious local minima, and that for a fixed network depth, these guarantees hold at order-optimal sample complexity.
• Thursday, September 28, 2017: Yuval Dagan (Technion). [Time: 3-4pm]
Topic. Trading information complexity for error
Abstract. We consider the standard two-party communication model. The central problem studied in this article is how much one can save in information complexity by allowing an error. Our major result is that the $\varepsilon$-error randomized communication complexity of set disjointness is $n[C_{\mathrm{DISJ}} - \Theta(h(\varepsilon))] + o(n)$, where $C_{\mathrm{DISJ}} \approx 0.4827$ is the constant of set disjointness found by Braverman et al.

Joint work with Yuval Filmus, Hamed Hatami and Yaqiao Li.
• Wednesday, October 11, 2017: Lijie Chen (MIT).
Topic. On The Power of Statistical Zero Knowledge
Abstract. We examine the power of statistical zero knowledge proofs (captured by the complexity class $\mathsf{SZK}$) and their variants. First, we give the strongest known relativized evidence that $\mathsf{SZK}$ contains hard problems, by exhibiting an oracle relative to which $\mathsf{SZK}$ (indeed, even $\mathsf{NISZK}$) is not contained in the class $\mathsf{UPP}$, containing those problems solvable by randomized algorithms with unbounded error. This answers an open question of Watrous from 2002 [Aar]. Second, we "lift" this oracle separation to the setting of communication complexity, thereby answering a question of Goos et al. (ICALP 2016). Third, we give relativized evidence that perfect zero knowledge proofs (captured by the class $\mathsf{PZK}$) are weaker than general zero knowledge proofs. Specifically, we exhibit oracles relative to which $\mathsf{SZK}$ does not belong to $\mathsf{PZK}$, $\mathsf{NISZK}$ does not belong to $\mathsf{NIPZK}$, and $\mathsf{PZK}$ is not equal to $\mathsf{coPZK}$. The first of these results answers a question raised in 1991 by Aiello and Hastad (Information and Computation), and the second answers a question of Lovett and Zhang (2016). We also describe additional applications of these results outside of structural complexity.

The technical core of our results is a stronger hardness amplification theorem for approximate degree, which roughly says that composing the gapped-majority function with any function of high approximate degree yields a function with high threshold degree.

Joint work with Adam Bouland, Dhiraj Holden, Justin Thaler, and Prashant Nalini Vasudevan.
• Wednesday, October 25, 2017: Pritish Kamath (MIT). [Time: 4.30-5.30pm]
Topic. Non-Interactive Agreement & Dimension Reduction for Polynomials
Abstract. The "Non-interactive Agreement Distillation" problem, specified by a joint distribution $P(x,y)$ and a target alphabet size $k$, is defined as follows: Two players, Alice and Bob, observe sequences $(X_1, ... , X_n)$ and $(Y_1, ... , Y_n)$ respectively where $\{(X_i, Y_i)\}$ are drawn i.i.d. from $P(x,y)$. Both players look at their share of randomness, and output an element of $[k]$. Their goal is to maximize the probability that their outputs are the same, while ensuring that their outputs are marginally uniform.

Given $P(x,y)$ and $k$, what is the largest correlation (or agreement probability) that the players can achieve? In the special case where $P(x,y)$ is the distribution of $\rho$-correlated Gaussians, the largest achievable correlation is also known as the "Optimal Noise-Stability of a $k$-partition" and has received wide interest in Computer Science. More generally, it is also related to the notions of common information studied in Information Theory.

It turns out that this value is not well understood, even in the special case of correlated Gaussians! Moreover, this value is a priori not even "computable", as there is no immediate upper bound on the number of samples the players need to draw in order to achieve the best possible correlation!

In this work, we obtain explicit (computable) bounds on the number of samples needed to get $\varepsilon$-close to the maximum achievable correlation (or even for the more general problem of "non-interactive simulation"). This implies computability of the maximum achievable correlation. [Our bounds significantly improve upon the results obtained recently by De, Mossel & Neeman, using fundamentally different techniques.]

At the heart of our result is a new technique that we call "Dimension Reduction for Polynomials". It is analogous to the Johnson-Lindenstrauss lemma, which is a dimension reduction for vectors. This technique is mostly elementary, and we believe it to be of independent interest.

This talk will discuss the motivational aspects of the problem, its moral and technical connections to other problems in computer science and will mainly focus on the new technique of "dimension reduction for polynomials".

• Wednesday, November 1, 2017: Andrej Risteski (MIT).
Topic. Beyond Log-concavity: Provable Guarantees for Sampling Multi-modal Distributions using Simulated Tempering Langevin Monte Carlo
Abstract. A key task in Bayesian statistics is sampling from distributions that are only specified up to a partition function (i.e., constant of proportionality). However, without any assumptions, sampling (even approximately) can be $\#\mathsf{P}$-hard, and few works have provided "beyond worst-case" guarantees for such settings.

For log-concave distributions, classical results going back to Bakry and Emery (1985) show that natural continuous-time Markov chains called Langevin diffusions mix in polynomial time. The most salient feature of log-concavity violated in practice is uni-modality: commonly, the distributions we wish to sample from are multi-modal. In the presence of multiple deep and well-separated modes, Langevin diffusion suffers from torpid mixing.

We address this problem by combining Langevin diffusion with simulated tempering. The result is a Markov chain that mixes more rapidly by transitioning between different temperatures of the distribution. We analyze this Markov chain for the canonical multi-modal distribution: a mixture of gaussians (of equal variance). The algorithm based on our Markov chain provably samples from distributions that are close to mixtures of gaussians, given access to the gradient of the log-pdf. For the analysis, we use a spectral decomposition theorem for graphs (Gharan and Trevisan, 2014) and a Markov chain decomposition technique (Madras and Randall, 2002).

Joint work with Rong Ge and Holden Lee.
• Thursday, November 2, 2017: Li-Yang Tan (TTI Chicago). [Room: 32-G882]
Topic. Fooling intersections of low-weight halfspaces
Abstract. A weight-$t$ halfspace is a Boolean function $f(x)=\mathrm{sign}(w_1 x_1 + \cdots + w_n x_n - \theta)$ where each $w_i$ is an integer in $\{-t,\dots,t\}$. We give an explicit pseudorandom generator that $\delta$-fools any intersection of $k$ weight-$t$ halfspaces with seed length $\mathrm{poly}(\log n, \log k,t,1/\delta)$. In particular, our result gives an explicit PRG that fools any intersection of any $\mathrm{quasipoly}(n)$ number of halfspaces of any $\mathrm{polylog}(n)$ weight to any $1/\mathrm{polylog}(n)$ accuracy using seed length $\mathrm{polylog}(n)$. Prior to this work no explicit PRG with non-trivial seed length was known even for fooling intersections of $n$ weight-$1$ halfspaces to constant accuracy.

The analysis of our PRG fuses techniques from two different lines of work on unconditional pseudorandomness for different kinds of Boolean functions. We extend the approach of Harsha, Klivans, and Meka for fooling intersections of regular halfspaces, and combine this approach with results of Bazzi and Razborov on bounded independence fooling CNF formulas. Our analysis introduces new coupling-based ingredients into the standard Lindeberg method for establishing quantitative central limit theorems and associated pseudorandomness results.

Joint work with Rocco Servedio.
• Thursday, November 16, 2017: Jiantao Jiao (Stanford).
Topic. Instance-optimal learning of the total variation distance
Abstract. The total variation distance (statistical distance) plays fundamental roles in statistics, machine learning, and theoretical computer science. We consider the problem of learning the total variation distance between $p$ and $q$ for any fixed $q$ given independent identically distributed samples from $p$. We characterize the optimal sample complexity in terms of $q$ and $\varepsilon$, and construct a near-linear time algorithm that achieves the optimal sample complexity for each $q$. In other words, the learner we constructed is instance optimal, paralleling the work of [VV'14] on instance optimal identity testing. The dependence on $q$ in learning the total variation distance is drastically different from that in testing identity. We show that for any fixed $q$, the performance of the optimal learner with $n$ samples is essentially that of the plug-in approach with $n \log n$ samples, where the plug-in approach evaluates the total variation distance between the empirical distribution of $p$ and the known distribution $q$.

The key methodology behind our achievability and lower bound constructions is the idea of local approximation, which bridges approximation theory and concentration inequalities. In particular, the upper and lower bounds are dual to each other, and proofs of upper bounds can be translated into proofs of lower bounds.

We generalize the local approximation approach to learning the total variation distance when both $p$ and $q$ are unknown. We obtain tight upper and lower bounds of the optimal sample complexity including the dependence on $\epsilon$, where we show it is necessary to utilize multivariate polynomial approximation: any univariate polynomial approximation scheme fails to achieves the optimal sample complexity. We show that the sample complexity for the both $p,q$ unknown case is essentially the same as the $q$ known case with $q$ being uniform, showing that $q$ being uniform is the most difficult case.

If time permits, we discuss the potential applications of the general local approximation methodology, which has proved to produce the optimal learners in a variety of settings such as learning the entropy, power sum functional, KL divergence, Hellinger divergence, $\chi^2$ divergence, support size, etc.

Based on joint work with Yanjun Han and Tsachy Weissman. [arXiv:1705.00807]
• Wednesday, November 29, 2017: Slobodan Mitrovic (EPFL).
Topic. Matchings in MPC frameworks
Abstract. The last decade has witnessed the success of a number of massive parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. These frameworks allow for much more local computation, compared to the classical PRAM models. In this context, we investigate the complexity of one of the most fundamental graph problems: Maximum Matching. We show that if the memory per machine is $\Omega(n)$ (or even only $n/(\log n)^{O(\log \log n)}$), then for any fixed constant $\varepsilon > 0$, one can compute a $(2+\varepsilon)$-approximation to Maximum Matching in $O\left((\log \log n)^2\right)$ MPC rounds. This constitutes an exponential improvement over previous work -- both the one designed specifically for MPC and the one obtained via connections to PRAM or distributed algorithms -- which required $\Theta(\log n)$ rounds to achieve similar approximation guarantees.

To establish our result we need to deviate from the previous work in two important ways. Firstly, we use vertex--based graph partitioning, instead of the edge--based approaches that were utilized so far. Secondly, we develop a technique of round compression. This technique enables one to take a (distributed) algorithm that computes an $O(1)$-approximation of maximum matching in $O(\log n)$ independent PRAM phases and implement a super-constant many of these phases in only a constant number of actual MPC rounds. These techniques turn out to be what one needs to truly take advantage of the MPC model, as compared to the PRAM model, even in the regime of (sub-)linear memory per machine.

Based on joint work with Artur Czumaj, Jakub Łącki, Aleksander Mądry, Krzysztof Onak and Piotr Sankowski. [arXiv:1707.03478]
• Thursday, November 30, 2017: Jerry Li (MIT). [Room: 32-G449 (Kiva/Patil)]
Topic. Mixture Models, Robustness, and Sum of Squares Proofs
Abstract. We use the Sum of Squares (SoS) method to develop a new efficient algorithm for clustering and mean estimation in well-separated high-dimensional mixture models, substantially improving upon the statistical guarantees achieved by previous efficient algorithms. In particular, we study mixtures of $k$ distributions, where every pair of distributions has means separated by separated by at least $k^\varepsilon$ for any $\varepsilon > 0$. In the special case of spherical Gaussian mixtures, we give a $k^{O(1/\varepsilon^2)}$-time algorithm that learns the means of the components in the mixture and accurately clusters samples from the mixture. This is the first algorithm to improve on greedy ("single-linkage") and spectral clustering, breaking a long-standing barrier for efficient algorithms at separation $k^{1/4}$.

Our techniques are based on adapting algorithmic ideas from robust statistics, and are potentially of independent interest. Our main algorithm for learning mixture models provides an entirely SoS interpretation of the convex programming framework of [Diakonikolas et al, FOCS 16]. We show that many of the proofs from that paper can be replaced with much simpler proofs using only basic concentration and Holder's inequality, which allows us to approach this problem via SoS. As a corollary of this, we also obtain improved rates for robust mean estimation in certain regimes.

Joint work with Sam Hopkins (Cornell).

## Theory Calendar

(includes A&C Seminars, TOC Colloquium, CIS Seminars, Theory Lunch, TOC tea and more!) :

Click on the events above to get more description about the same (title/abstract for talks, etc.)