# Algorithms & Complexity Seminar, MIT : 2016-17

Organizers: Akshay Degwekar (), Pritish Kamath ()

The Algorithms & Complexity Seminar for the 2016-2017 year will usually (unless otherwise stated) meet on Wednesdays 4pm-5pm in 32-G575 (Theory Lab on the 5th floor of the Stata Center). The style and format of these meetings are variable. Please feel free to contact the organizers to find out more details. To receive announcements of upcoming talks, subscribe to the mailing list by either visiting the mailman page or send an empty email to compalgsem-subscribe@lists.csail.mit.edu.

## Upcoming Talks

• Wednesday, July 5, 2017: Harry Lang (Johns Hopkins U.).
Topic. Coresets for $k$-Means-Type Problems on Streams
Abstract. Let $f$ be a non-negative symmetric dissimilarity measure. Given sets of points $P$ (the input data) and $C$ (a "query"), we define $F(P,C) = \sum_{p \in P} \min_{c \in C} f(p,c)$. An example that fits into this framework is $k$-means clustering where $f = \text{distance}^2$ and we restrict $|C| = k$.

An $\varepsilon$-coreset for $P$ is a set $Q$ such that $F(Q,C)$ is within a factor of $(1 + \varepsilon)$ of $F(P,C)$ for any query $C$. For data reduction, the goal is to construct a coreset $Q$ of minimal cardinality. We introduce improved coreset constructions for a wide class of functions that obey a weak form of the triangle inequality: this includes $k$-means, $k$-median, Tukey's bi-weight function, and the Huber loss function.

In the streaming model, the input $P$ (a set of $n$ points) arrives sequentially and algorithms attempt to maintain a coreset using a minimal amount of memory. We introduce a new technique for maintaining coresets in the streaming model with $O(1)$ overhead. For example, the previous state-of-the-art for maintaining an $\varepsilon$-coreset for metric $k$-means on a stream required the storage of $O(\varepsilon^{-4} k \log k \log^6 n)$ points whereas our method requires $O(\varepsilon^{-2} k \log k \log n)$ points.

Joint work with Vladimir Braverman and Dan Feldman.

## Past Talks

• Wednesday, May 25, 2016: Tselil Schramm (UC Berkeley).
Topic. Strongly refuting random CSPs below the spectral threshold
Abstract. Random instances of $3$-SAT are known to be unsatisfiable with high probability when there at least $5N$ clauses. However, given a random $3$-SAT instance on $N$ variables, the task of refuting the instance, or of proving that the instance has no satisfying assignments, is hypothesized to be computationally hard if there are $O(N)$ clauses--in fact, the best known efficient algorithms for refutation require instances with at least $\Omega(N^{3/2})$ clauses, a factor of $N^{1/2}$ beyond the unsatisfiability threshold at $\Theta(N)$.

In this talk, I will describe a new spectral algorithm that refutes $3$-SAT instances with fewer clauses, given more time. Specifically, our algorithm refutes instances with $\Theta(N^{3/2 - \delta/2})$ clauses in $\mathrm{exp}(O(N^\delta))$ time, giving the same guarantees as the best known polynomial-time algorithms when $\delta = 0$, and from there smoothly approaching the unsatisfiability threshold at $\delta = 1$. Further, our algorithm strongly refutes the instances, certifying that no assignment satisfies more than a $(1-\varepsilon)$-fraction of constraints for a constant $\varepsilon$.

Our algorithm also implies tight upper bounds on the value of sum-of-squares relaxations for random CSPs, as well as the value of sum-of-squares relaxations for random polynomial maximization over the unit sphere (injective tensor norm).

Based on joint work with Prasad Raghavendra and Satish Rao.
• Wednesday, June 1, 2016: Jerry Li (MIT).
Topic. Robust Estimators in High Dimensions without the Computational Intractability
Abstract. We study high-dimensional distribution learning in an agnostic setting where an adversary is allowed to arbitrarily corrupt an $\varepsilon$-fraction of the samples. Such questions have a rich history spanning statistics, machine learning and theoretical computer science. Even in the most basic settings, the only known approaches are either computationally inefficient or lose dimension-dependent factors in their error guarantees. This raises the following question: is high-dimensional agnostic distribution learning even possible, algorithmically?

In this talk, we present the first computationally efficient algorithms with dimension-independent error guarantees for agnostically learning several fundamental classes of high-dimensional distributions: (1) a single Gaussian, (2) a product distribution on the hypercube, (3) mixtures of two product distributions (under a natural balancedness condition), and (4) mixtures of spherical Gaussians. Our algorithms achieve error that is independent of the dimension, and in many cases scales nearly optimally with the fraction of adversarially corrupted samples. Moreover, we develop a general recipe for detecting and correcting corruptions in high-dimensions, that may be applicable to many other problems.

Based on joint work with Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Ankur Moitra, and Alistair Stewart.
• Wednesday, June 8, 2016: Andrea Lincoln (Stanford).
Topic. Deterministic Time-Space Tradeoffs for $k$-SUM
Abstract. Given a set of numbers, the $k$-SUM problem asks for a subset of $k$ numbers that sums to zero. When the numbers are integers, the time and space complexity of $k$-SUM is generally studied in the word-RAM model; when the numbers are reals, the complexity is studied in the real-RAM model, and space is measured by the number of reals held in memory at any point.

We present a time and space efficient deterministic self-reduction for the $k$-SUM problem which holds for both models, and has many interesting consequences. To illustrate:

• $3$-SUM is in deterministic time $O(n^2 \lg\lg(n)/\lg(n))$ and space $O(\sqrt{n \lg(n)/\lg\lg(n)})$, extending the results of Gold and Sharir.
• $3$-SUM is in deterministic time $O(n^2)$ and space $O(\sqrt n)$, derandomizing an algorithm of Wang.
• A popular conjecture states that $3$-SUM requires $n^{2-o(1)}$ time on the word-RAM. We show that the $3$-SUM Conjecture is in fact equivalent to the (seemingly weaker) conjecture that every $O(n^{.51})$-space algorithm for $3$-SUM requires at least $n^{2-o(1)}$ time on the word-RAM.
Joint work with Virginia Vassilevska Williams, Josh Wang and Ryan Williams.
• Wednesday, June 15, 2016: Maryam Aliakbarpour (MIT).
Topic. Learning and Testing Junta Distributions
Abstract. We consider the problem of learning distributions in the presence of irrelevant features. This problem is formalized by introducing a new notion of $k$-junta distributions. Informally, a distribution $\mathcal{D}$ over the domain $X^n$ is a $k$-junta distribution with respect to another distribution $\mathcal{U}$ over the same domain if there is a set $J \subseteq [n]$ of size $|J| \le k$ that captures the difference between $\mathcal{D}$ and $\mathcal{U}$.

We show that it is possible to learn $k$-junta distributions with respect to the uniform distribution over the Boolean hypercube $\{0,1\}^n$ in time $\mathrm{poly}(n^k, 1/\varepsilon)$. This result is obtained via a new Fourier-based learning algorithm inspired by the Low-Degree Algorithm of Linial, Mansour, and Nisan (1993).

We also consider the problem of testing whether an unknown distribution is a $k$-junta distribution with respect to the uniform distribution. We give a nearly-optimal algorithm for this task. Both the analysis of the algorithm and the lower bound showing its optimality are obtained by establishing connections between the problem of testing junta distributions and testing uniformity of weighted collections of distributions.

This is a joint work with Eric Blais and Ronitt Rubinfeld.
• Friday, June 17, 2016: Ramprasad Saptharishi (Tel-Aviv University).
Topic. Depth Reduction for Arithmetic Circuits
Abstract. In the recent years, there has been a lot of activity in arithmetic circuit lower bounds. A key first step in all these results is 'depth reduction' which is generically the task of simulating an arithmetic circuit by a very shallow (often like depth 4 or 3 even) circuit of not-too-much-larger size. This effectively has reduced the task of proving lower bounds for general circuits to lower bounds for very shallow (and structured) circuits.

This talk shall be largely a survey of the work from the last few years, and also some concrete easy-to-state open problems. Following that, we'll review the classical depth reduction results, and also reduction to depth four via a slightly different proof. We shall then abstract the main ingredient in the proof and apply it to a subclass of formulas to obtain additional structure. Hopefully, this additional structure would be useful in coming up with techniques to prove formula lower bounds.
• Wednesday, July 6, 2016: Christopher Musco (MIT).
Topic. Iterative Sampling Methods for Low-Rank Matrix and Kernel Approximation
Abstract. Low-rank matrix approximation and its close relative, principal component analysis, are ubiquitous in statistics and machine learning. To address computational concerns as data size increases, recent years have a seen a flurry of work on fast approximation algorithms for these problems. I'll present a line of research that introduces a new approach for low-rank approximation which gives strong provable guarantees for any data matrix. Our methods iteratively select a subsample of representative rows or columns from a matrix which are then used to compute approximate principal components.

Our methods match the runtimes guaranteed by the fastest known "random projection" methods for low-rank approximation. However, they can be much faster for sparse or structured data and can be applied to data problems where matrices are not represented explicitly. Specifically, combined with the Nystrom method, our sampling scheme gives the first algorithms for provably approximating kernel matrices in time linear in the number of data points. This leads to the fastest known methods for kernel ridge regression, kernel principal component analysis, kernel $k$-means clustering, and kernel canonical correlation analysis.

This talk is based on work with Michael B. Cohen and Cameron Musco that appears in preprints arXiv 1511.07263 and arXiv 1605.07583.
• Wednesday, July 20, 2016: Ali Vakilian (MIT).
Topic. Streaming Algorithms for Set Cover Problem
Abstract. We consider the classic Set Cover problem in the data stream model. For $n$ elements and $m$ sets we give a $O(1/\delta)$-pass algorithm with a strongly sub-linear $\widetilde{O}(mn^{\delta})$ space and logarithmic approximation factor. We complement this result by showing that the tradeoff between the number of passes and space exhibited by our algorithm is tight, at least when the approximation factor is equal to $1$. Specifically, we show that any algorithm that computes set cover exactly using $({1 \over 2\delta}-1)$ passes must use $\widetilde{\Omega}(m n^{\delta})$ space. Furthermore, we consider the problem in the geometric setting where the elements are points in $\mathbb{R}^2$ and sets are either discs, axis-parallel rectangles, or fat triangles in the plane, and show that our algorithm (with a slight modification) uses the optimal $\widetilde{O}(n)$ space to find a logarithmic approximation in $O(1/\delta)$ passes.

Finally, we show that any randomized one-pass algorithm that distinguishes between covers of size $2$ and $3$ must use a linear (i.e., $\Omega(mn)$) amount of space. This is the first result showing that a randomized, approximate algorithm cannot achieve a sub-linear space bound. This indicates that using multiple passes might be necessary in order to achieve sub-linear space bounds for this problem while guaranteeing small approximation factors.

This is based on joint work with Erik Demaine, Sariel Har-Peled, Piotr Indyk and Sepideh Mahabadi.
• Thursday, August 4, 2016: Ankit Garg (Microsoft Research New England).
Topic. On algorithmic aspects of Brascamp-Lieb inequalities
Abstract. The celebrated Brascamp-Lieb (BL) inequalities (and their extensions) are an important mathematical tool, unifying and generalizing numerous inequalities in analysis, convex geometry and information theory. While their structural theory is very well understood, far less is known about computing their main parameters. We give polynomial time algorithms to compute:

• Feasibility of BL-datum
• The optimal BL-constant
• A weak separation oracle for the BL-polytope
This will be based on joint work with Leonid Gurvits, Rafael Oliveira and Avi Wigderson.

The algorithms are obtained by a simple efficient reduction of a given BL-datum to an instance of the Operator Scaling problem defined by Gurvits, for which the present authors provided a polynomial time algorithm recently.
• Tuesday, August 23, 2016: Jonathan Mosheiff (Hebrew University).
Topic. On the Rigidity of Sparse Random Graphs
Abstract. It is a truth universally acknowledged, that random objects are asymmetric. It was shown by Wright (1971) that for $p$ slightly larger than $\log n / n$ a random $G(n,p)$ graph has, whp, a trivial automorphism group. This bound is tight, since a graph of slightly smaller density is likely to have isolated vertices. Our work concerns sparser graphs, where the average degree still goes to infinity but $p$ is below Wright's threshold. We show that in this range, whp all of $G$'s automorphisms are essentially trivial. More concretely, it holds almost surely that $G$'s $2$-core (i.e., the maximal subgraph of $G$ with minimal degree at least $2$) has only the trivial automorphism. Our theorem yields some interesting insights on the graph reconstruction conjecture, as well as a new algorithm for random graph canonical labeling.

This is a joint work with Nati Linial.
• Wednesday, August 24, 2016: Brendan Juba (Washington Univ. in St. Louis).
Topic. Conditional Sparse Linear Regression
Abstract. We consider the problem of jointly identifying a significant (but perhaps small) segment of a population in which there is a highly sparse linear regression fit, together with the coefficients for the linear fit. This is intended to serve as a principled alternative to the practice of clustering the data under a variety of methods to see if any yield clusters that can be modeled well. We give algorithms for such problems when this unknown segment of the population ("cluster") is described by a $k$-DNF condition and the regression fit is $s$-sparse for constant $k$ and $s$. We note evidence that $k$-DNFs are essentially the most expressive natural representation for which this problem is tractable.

Based on joint work with Ben Moseley (WUSTL).
• Wednesday, September 21, 2016: Ofer Grossman (MIT).
Topic. Bipartite Perfect matching in Pseudo-deterministic $NC$
Abstract. Pseudo-deterministic algorithms are randomized search algorithms that on different executions on the same input, output the same solution with high probability.

We will discuss how pseudo-deterministic algorithms bridge the gap between randomized search and decision problems for problems in $P$ and in $NC$. Next, we will show a pseudo-deterministic $NC$ algorithm for bipartite matching. Finally, we will show how pseudo-determinism can be used to save on random bits used by classical randomized algorithms.

Joint work with Shafi Goldwasser.
• Wednesday, September 28, 2016: Rasmus Kyng (Yale).
Topic. Approximate Gaussian Elimination for Laplacians
Abstract. We show how to perform sparse approximate Gaussian elimination for Laplacian matrices. We present a simple, nearly linear time algorithm that approximates a Laplacian by a matrix with a sparse Cholesky factorization – the version of Gaussian elimination for positive semi-definite matrices. We compute this factorization by subsampling standard Gaussian elimination. This is the first nearly linear time solver for Laplacian systems that is based purely on random sampling, and does not use any graph theoretic constructions such as low-stretch trees, sparsifiers, or expanders. The crux of our proof is the use of matrix martingales to analyze the algorithm.

Joint work with Sushant Sachdeva.
• Wednesday, October 12, 2016: Ilias Diakonikolas (USC).
Topic. A New Approach for Distribution Testing
Abstract. We study problems in distribution property testing: Given sample access to one or more unknown discrete distributions, we want to determine whether they have some global property or are $\varepsilon$-far from having the property in $\ell_1$ distance. We provide a simple and general approach to obtain upper bounds in this setting, by reducing $\ell_1$-testing to $\ell_2$-testing. Our reduction yields optimal $\ell_1$-testers, by using a standard $\ell_2$-tester as a black-box.

Using our framework, we obtain optimal estimators for a wide variety of $\ell_1$ distribution testing problems, including the following: identity testing to a fixed distribution, closeness testing between two unknown distributions (with equal/unequal sample sizes), independence testing (in any number of dimensions), closeness testing for collections of distributions, and testing $k$-flatness. For several of these problems, we give the first optimal tester in the literature. Moreover, our estimators are significantly simpler to state and analyze compared to previous approaches.

As our second main contribution, we provide a direct general approach for proving distribution testing lower bounds, by bounding the mutual information. Our lower bound approach is not restricted to symmetric properties, and we use it to prove tight lower bounds for all the aforementioned problems.

Joint work with Daniel Kane.
• Wednesday, October 19, 2016: Alex Wein (MIT). (Room: 32-D507)
Topic. Optimality and sub-optimality of PCA for spiked random matrix models
Abstract. Classical results from random matrix theory state that if a random (Wigner or Wishart) matrix is perturbed by a planted rank-$1$ signal (or "spike"), this spike affects the top eigenvalue if and only if the size of the spike exceeds a particular threshold. In particular, above this threshold the top eigenvalue is a reliable test to distinguish the spiked and un-spiked models. We consider the question of whether this eigenvalue test is optimal, or whether there is some other statistical test that can succeed below the eigenvalue threshold.

We answer these types of questions using a simple and widely-applicable technique associated with the statistical notion of "contiguity." Namely, by computing a particular second moment, one can show that two models are contiguous and therefore cannot be reliably distinguished.

Our results include:

1. For the Gaussian Wigner ensemble, we show that PCA achieves the optimal detection threshold for a variety of natural priors for the spike.
2. For any non-Gaussian Wigner ensemble, we show that PCA is always sub-optimal for detection (contrary to the notion of universality from random matrix theory). However, a variant of PCA achieves the optimal threshold (for natural priors) by pre-transforming the matrix entrywise.
3. For the Wishart ensemble we show that in some regimes, inefficient algorithms can succeed below the PCA threshold, whereas no known efficient algorithm achieves this.
Based on joint work with Amelia Perry, Afonso Bandeira, and Ankur Moitra. Available at arXiv:1609.05573.
• Wednesday, October 26, 2016: Rati Gelashvili (MIT).
Topic. Time-Space Trade-Offs in Molecular Computation
Abstract. Population protocols are a popular model of distributed computing, in which randomly-interacting agents with little computational power cooperate to jointly perform computational tasks. Inspired by developments in molecular computation, and in particular DNA computing, recent algorithmic work has focused on the complexity of solving simple yet fundamental tasks in the population model, such as leader election (which requires convergence to a single agent in a special "leader" state), and majority (in which agents must converge to a decision as to which of two possible initial states had higher initial count). Known results point towards an inherent trade-off between the time complexity of such algorithms, and the space complexity, i.e. size of the memory (number of states) available to each agent.

I will overview different results with a spectrum of time-space guarantees for these tasks, culminating in algorithms showing that fast, poly-logarithmic convergence time can be achieved using $O(\log^2 n)$ space per node in both cases. I will also present a unified lower bound, which relates the space available per node with the time complexity achievable by a protocol: for instance, any protocol solving either of these tasks for $n$ agents using $O(\log \log n)$ states must take $\Omega(n / \mathrm{poly}\log n)$ expected time. This is the first result to characterize time complexity for protocols which employ super-constant number of states per node, and proves that fast, poly-logarithmic running times require protocols to have relatively large space costs. Overall, these results highlight a time complexity separation between $O(\log \log n)$ and $\Theta(\log^2 n)$ state space size for both majority and leader election, and introduce new techniques, which should be applicable more broadly.

This is based on joint works with Dan Alistarh, James Aspnes, David Eisenstat, Ronald L. Rivest and Milan Vojnovic.
• Wednesday, November 9, 2016: Huy L. Nguyen (NEU).
Topic. Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality
Abstract. We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of the $m$ machines receives $n$ data points from a $d$-dimensional Gaussian distribution with unknown mean $\theta$ which is promised to be $k$-sparse. The machines communicate by message passing and aim to estimate the mean $\theta$. We provide a tight (up to logarithmic factors) tradeoff between the estimation error and the number of bits communicated between the machines. This directly leads to a lower bound for the distributed sparse linear regression problem: to achieve the statistical minimax error, the total communication is at least $\Omega(\min\{n, d\}m)$, where $n$ is the number of observations that each machine receives and $d$ is the ambient dimension. We also give the first optimal simultaneous protocol in the dense case for mean estimation. As our main technique, we prove a distributed data processing inequality, as a generalization of usual data processing inequalities, which might be of independent interest and useful for other problems.

This is joint work with Mark Braverman, Ankit Garg, Tengyu Ma, and David Woodruff.
• Wednesday, November 30, 2016: Tengyu Ma (Princeton).
Topic. Analyzing non-convex optimization: matrix completion and linear residual networks
Abstract. Non-convex optimization is the main algorithmic technique behind many state-of-art results in machine learning. It has been conjectured that the landscape of many training objectives has the nice geometric property that "all local minima are global minima" and thus admits efficient optimization algorithms.

In the first part of the talk, I will show that the optimization landscape of matrix completion — a famous problem in ML with wide applications in recommender system and collaborative filtering — does have this property. This implies that (stochastic) gradient descent from arbitrary initialization can solve matrix completion in polynomial time.

Next, we will discuss linear residual networks, as a simplified model towards the first-cut understanding of residual networks. We will give a strikingly simple proof that arbitrarily deep linear residual networks have no spurious critical points (= points with vanishing gradient that are not global minima). In contrast, the landscape of standard linear neural networks does have spurious critical points. This demonstrates that re-parameterization using the identity shortcut connection can make the optimization easier.

Based on joint works arXiv:1605.07272 with Rong Ge and Jason D. Lee, and arXiv:1611.04231 with Moritz Hardt.
• Wednesday, December 7, 2016: Dhiraj Holden (MIT).
Topic. Solving problems in $P$ given correlated instances
Abstract. Instances of computational problems do not arise in isolation. Often, correlations between the solutions of related instances arise in nature. We study the impact of having access to correlated instances on how fast we can solve a variety of problems. In particular, we look at the longest common subsequence, edit distance, and dynamic time warping distance problems, all of which cannot be solved in time $n^{2 - \varepsilon}$ assuming the Strong Exponential Time Hypothesis. We give results for two correlation models, a model where the solution corresponds exactly to structure in the correlated instances, and one where each part of the solution is used in the correlated instance with high probability. In both cases we are able to achieve strongly subquadratic time algorithms, and when we have an exact correspondence in the solutions we can solve all of these problems in near-linear time. We view our work as pointing out a new avenue for looking for significant improvements for sequence alignment problems and computing similarity measures, by taking advantage of access to sequences which are correlated through natural generating processes. In this first work we show how to take advantage of mathematically inspired simple clean models of correlation -- the intriguing question, looking forward, is to find correlation models which coincide with evolutionary models and other relationships and for which our approach to multiple sequence alignment gives provable guarantees.

Joint work with Shafi Goldwasser. Paper available at ECCC TR16-056.
• Thursday, December 8, 2016: Morteza Monemizadeh (Rutgers). (Room: 32-G882)
Topic. Testable Bounded Degree Graph Properties Are Random Order Streamable
Abstract. We study graph streaming problems. We transform known property testing algorithms into streaming ones. In particular, our main result is that for bounded degree graphs, any property that is constant-query testable in the adjacent list model can be solved with constant space in single-pass random order model. Our results are obtained by estimating the distribution of what are known as $k$-discs used in property testing methods, directly on a random order graph stream with constant space.

We also apply this approach to constant time approximation algorithms in the adjacency list model and get single-pass random order streaming algorithms for all of them. For example, we get constant-space single-pass random order streaming algorithms for $(1,\varepsilon n)$-approximating maximal matching to additive $\varepsilon n$ error.

Our results are among the first known truly sublinear space graph streaming algorithms, while $\Omega(n)$ space is needed for many graph streaming problems ($n$ is the number of nodes) and previous results typically use the semi-streaming model of $O(n \cdot \mathrm{polylog}(n))$ space.

Joint work with S. Muthukrishnan, Pan Peng, Christian Sohler.
• Wednesday, March 1, 2017: Nika Haghtalab (CMU).
Topic. Opting Into Optimal Matchings
Abstract. Kidney transplant is sometimes the best treatment for people who suffer from chronic kidney disease. However, due to medical incompatibility issues, a direct kidney transplant is not always possible even for those patients that are fortunate enough to have a willing donor. This is where Kidney Exchange comes in: It enables two or more donor-patient pairs to swap donors, so that each patient receives a compatible kidney. In recent years, hospitals have enrolled patients into regional or even national kidney exchange programs. However, hospitals may choose not to participate if they can match more of their own patients on their own -- economists would say that the exchange is not individually rational for hospitals. This behavior significantly reduces the social welfare -- the total number of patients that receive a kidney transplant.

In this work, we revisit the problem of designing optimal, individually rational kidney exchange mechanisms. We offer a new perspective on this problem by showing that under mild probabilistic assumptions, any optimal kidney exchange is likely to be individually rational up to lower-order terms. We also show that a simple and practical mechanism is (fully) individually rational, and likely to be optimal up to lower-order terms.

This talk is based on joint work with Avrim Blum, Ioannis Caragiannis, Ariel Procaccia, Eviatar Procaccia, and Rohit Vaish (SODA'17).
• Thursday, March 2, 2017: Yuval Filmus (Technion).
Topic. Twenty (simple) questions
Abstract. The game of 20 questions is a search-theoretic interpretation of entropy. Alice chooses a number $x \in \{1,\ldots,n\}$ according to a known distribution $\mu$, and Bob identifies $x$ using Yes/No questions. Bob's goal is to minimize the expected number of questions until $x$ is identified, and his optimal strategy uses about $H(\mu)$ questions on average. However, it might invoke arbitrary Yes/No questions.

Are there restricted sets of questions that match the optimal strategy, either exactly or approximately?

We explore this question from several different angles, uncovering a connection to binary comparison search trees, a variant of binary search trees.

Joint work with Yuval Dagan, Ariel Gabizon and Shay Moran, to appear in STOC 2017.
• Wednesday, March 15, 2017: Dean Doron (Tel-Aviv University).
Topic. Explicit two-source extractors for near-logarithmic min-entropy
Abstract. In this talk, we show an explicit construction of extractors for two independent sources of near-logarithmic min-entropy. Previous constructions required either $\mathrm{polylog}(n)$ min-entropy or more than two sources.

The result extends the breakthrough result of Chattopadhyay and Zuckerman and also uses non-malleable extractors. The main new ingredient is a somewhere-random condenser with a small entropy gap, used as a sampler.

Our construction can be seen as an efficient reduction to constructing non-malleable extractors, so using very recent constructions of non-malleable extractors (by Cohen and Li) - we will see how to obtain explicit two-source extractor for $O(\log n \cdot \log\log n)$ min-entropy and constant error.
• Wednesday, April 5, 2017: Clément Canonne (Columbia University).
Abstract. Adaptivity is known to play a crucial role in property testing. In particular, there exist properties for which there is an exponential gap between the power of adaptive testing algorithms, wherein each query may be determined by the answers received to prior queries, and their non-adaptive counterparts, in which all queries are independent of answers obtained from previous queries.

In this work, we investigate the role of adaptivity in property testing at a finer level. We first quantify the degree of adaptivity of a testing algorithm by considering the number of "rounds of adaptivity" it uses. More accurately, we say that a tester is $k$-(round) adaptive if it makes queries in $k+1$ rounds, where the queries in the $i$'th round may depend on the answers obtained in the previous $i-1$ rounds. Then, we ask the following question:

Does the power of testing algorithms smoothly grow with the number of rounds of adaptivity?

We provide a positive answer to the foregoing question by proving an adaptivity hierarchy theorem for property testing. Specifically, our main result shows that for every integer $n$ and $0 \leq k \leq n^{0.33}$ there exists a property $P_{n,k}$ of functions for which (1) there exists a $k$-adaptive tester for $P_{n,k}$ with query complexity $\tilde{O}(k)$, yet (2) any $(k-1)$-adaptive tester for $P_{n,k}$ must make $\tilde{\Omega}(n/k^2)$ queries. In addition, we show that such a qualitative adaptivity hierarchy can be witnessed for testing natural properties of graphs.

Joint work with Tom Gur (Weizmann Institute).
• Thursday, April 6, 2017: Richard Baraniuk (Rice University). (3-4pm; Room: 32-D463 [Star])
Topic. A Probabilistic Theory of Deep Learning
Abstract. A grand challenge in machine learning is the development of computational algorithms that match or outperform humans in perceptual inference tasks that are complicated by nuisance variation. For instance, visual object recognition involves the unknown object position, orientation, and scale in object recognition while speech recognition involves the unknown voice pronunciation, pitch, and speed. Recently, a new breed of deep learning algorithms have emerged for high-nuisance inference tasks that routinely yield pattern recognition systems with near- or super-human capabilities. But a fundamental question remains: Why do they work? Intuitions abound, but a coherent framework for understanding, analyzing, and synthesizing deep learning architectures has remained elusive. We answer this question by developing a new probabilistic framework for deep learning based on the Deep Rendering Model: a generative probabilistic model that explicitly captures latent nuisance variation. By relaxing the generative model to a discriminative one, we can recover two of the current leading deep learning systems, deep convolutional neural networks and random decision forests, providing insights into their successes and shortcomings, a principled route to their improvement, and new avenues for exploration.

Speaker Bio. Richard G. Baraniuk is the Victor E. Cameron Professor of Electrical and Computer Engineering at Rice University. His research interests lie in new theory, algorithms, and hardware for sensing, signal processing, and machine learning. He is a Fellow of the IEEE, AAAS, and the National Academy of Inventors and has received the DOD Vannevar Bush Faculty Fellowship (NSSEFF), national young investigator awards from the US NSF and ONR, the Rosenbaum Fellowship from the Isaac Newton Institute of Cambridge University, the ECE Young Alumni Achievement Award from the University of Illinois, the Wavelet Pioneer and Compressive Sampling Pioneer Awards from SPIE, the IEEE Signal Processing Society Best Paper Award, and the IEEE Signal Processing Society Technical Achievement Award. His work on the Rice single-pixel compressive camera has been widely reported in the popular press and was selected by MIT Technology Review as a TR10 Top 10 Emerging Technology. For his teaching and education projects, including Connexions (cnx.org) and OpenStax (openstax.org), he has received the C. Holmes MacDonald National Outstanding Teaching Award from Eta Kappa Nu, the Tech Museum of Innovation Laureate Award, the Internet Pioneer Award from Harvard University, the World Technology Award for Education, the IEEE-SPS Education Award, the WISE Education Award, and the IEEE James H. Mulligan, Jr. Medal for Education.
• Wednesday, April 12, 2017: Benjamin Rossman (U. Toronto).
Topic. New separations of formula vs. circuit size
Abstract. I will present some recent results on the power of formulas vs. circuits in the bounded-depth setting. In joint work with Srikanth Srinivasan, we obtain a nearly optimal separation between the $\mathrm{AC}^0[\oplus]$ formula vs. circuit size of the Approximate Majority function. In other work, we show that the $\mathrm{AC}^0$ circuit (resp. formula) size of the $H$-Subgraph Isomorphism problem is tied to the treewidth (resp. treedepth) of the pattern graph $H$. The latter formula-size lower bound relies on a new "excluded-minor approximation" of treedepth (joint with Kenich Kawarabayashi). The former is based on an improved construction of low-degree approximating polynomials for $\mathrm{AC}^0[\oplus]$ formulas.
• Thursday, April 20, 2017: Inbal Cohen (Hebrew University of Jerusalem).
Topic. Approximate Modularity Revisited
Abstract. Set functions with convenient properties (such as submodularity) often arise in algorithmic game theory, and allow for improved properties of optimization algorithms and mechanisms. It is natural to ask (e.g., in the context of data driven applications) how robust such properties are, and whether small deviations from them can be tolerated. We consider two such questions in the important special case of linear set functions. One question that we address is whether any set function that approximately satisfies the modularity equation (linear functions satisfy the modularity equation exactly) is close to a linear function. The answer to this is positive (in a precise formal sense) as shown by Kalton and Roberts [1983] (and further improved by Bondarenko, Prymak, and Radchenko [2013]). We revisit their proof idea that is based on expander graphs, and provide significantly stronger upper bounds by combining it with new techniques. Furthermore, we provide improved lower bounds for this problem. Another question that we address is that of how to learn a linear function $h$ that is close to an approximately linear function $f$, while querying the value of $f$ on only a small number of sets.

Joint work with Uriel Feige and Michal Feldman.
• Wednesday, May 10, 2017: Govind Ramnarayan (MIT). (Room: 32-D463)
Topic. Relaxed Locally Correctable Codes
Abstract. Locally decodable codes (resp. locally correctable codes), or LDCs (resp. LCCs), are codes for which individual symbols of the message (resp. codeword) an be recovered by reading just a few bits from a noisy codeword. Such codes have been very useful in theory and practice, but suffer from a poor tradeoff between the rate of the code and the query complexity of its local decoder (resp. corrector).

A natural relaxation of locally decodable codes (LDCs) considered by Ben-Sasson et al. (SICOMP, 2006) allows the decoder to "reject" when it sees too many errors to decode the desired symbol of the message. They show that, when the decoder is given this liberty, there exist codes with rate that is subexponentially better than that of the best constant-query locally decodable codes known today.

We extend their relaxation to locally correctable codes, and achieve similar savings in complexity compared to existing LCCs. Specifically, our codes have:

1. Subexponentially lower rate than existing LCCs in the constant-query regime.
2. Nearly subexponentially lower query complexity than existing LCCs in the constant-rate regime.
Joint work with Tom Gur and Ron Rothblum.
• Thursday, May 11, 2017: Arnab Bhattacharyya (Indian Institute of Science). (Time: 2-3pm)
Topic. Hardness of learning noisy halfspaces using polynomial thresholds
Abstract. We prove the hardness of weakly learning halfspaces in the presence of adversarial noise using polynomial threshold functions (PTFs), assuming Khot's Unique Games Conjecture (UGC). In particular, we prove that for any constants $d \in \mathbb{N}$ and $\varepsilon > 0$, assuming the UGC, it is NP-hard to decide: given a set of $\{-1,1\}$-labeled points in $\mathbb{R}^n$ whether
(YES Case) there exists a halfspace that classifies $(1-\varepsilon)$-fraction of the points correctly, or
(NO Case) any degree-$d$ PTF classifies at most $(1/2 + \varepsilon)$-fraction of the points correctly.

This strengthens to all constant degrees the previous NP-hardness of learning using degree-$2$ PTFs shown by Diakonikolas et al. (2011). The latter result had remained the only progress over the works of Feldman et al. (2006) and Guruswami et al. (2006) ruling out weakly proper learning adversarially noisy halfspaces.

Joint work with Suprovat Ghoshal (IISc) and Rishi Saket (IBM).
• Wednesday, May 24, 2017: Josh Alman (MIT).
Topic. Cell-Probe Lower Bounds from Online Communication Complexity
Abstract. In this work, we introduce an online model for communication complexity. Analogous to how online algorithms receive their input piece-by-piece, our model presents one of the players Bob his input piece-by-piece, and has the players Alice and Bob cooperate to compute a result it presents Bob with the next piece. This model has a closer and more natural correspondence to dynamic data structures than the classic communication models do and hence presents a new perspective on data structures.

We first present a lower bound for the online set intersection problem in the online communication model, demonstrating a general approach for proving online communication lower bounds. The online communication model prevents a batching trick that classic communication complexity allows, and yields a stronger lower bound. Then we apply the online communication model to data structure lower bounds by studying the Group Range Problem, a dynamic data structure problem. This problem admits an $O(\log n)$-time worst-case data structure. Using online communication complexity, we prove a tight cell-probe lower bound: spending $o(\log n)$ (even amortized) time per operation results in at best an $\mathrm{exp}(−\delta^2 n)$ probability of correctly answering a $(1/2 + \delta)$-fraction of the $n$ queries.

Joint work with Joshua Wang and Huacheng Yu.
• Wednesday, May 31, 2017: Andrea Lincoln (MIT).
Topic. Conditional Hardness for Sensitivity Problems
Abstract. In recent years it has become popular to study dynamic problems in a sensitivity setting: Instead of allowing for an arbitrary sequence of updates, the sensitivity model only allows to apply batch updates of small size to the original input data. The sensitivity model is particularly appealing since recent strong conditional lower bounds ruled out fast algorithms for many dynamic problems, such as shortest paths, reachability, or subgraph connectivity.

In this paper we prove conditional lower bounds for these and additional problems in a sensitivity setting. For example, we show that under the Boolean Matrix Multiplication (BMM) conjecture combinatorial algorithms cannot compute the diameter of an undirected unweighted dense graph with truly subcubic preprocessing time and truly subquadratic update/query time. This result is surprising since in the static setting it is not clear whether a reduction from BMM to diameter is possible.

We further show under the BMM conjecture that many problems, such as approximate shortest paths, cannot be solved faster than by recomputation from scratch even after only one or two edge insertions. We further give a reduction from All Pairs Shortest Paths to Diameter under 1 deletion in weighted graphs. This is intriguing, as in the static setting it is a big open problem whether Diameter is as hard as APSP. We further get a nearly tight lower bound for shortest paths after two edge deletions based on the APSP conjecture. We give more lower bounds under the Strong Exponential Time Hypothesis. Many of our lower bounds also hold for static oracle data structures where no sensitivity is required.

Joint work with: Monika Henzinger, Stefan Neumann and Virginia Vassilevska Williams.