Algorithms and Complexity Seminar Fall 2011

Wednesday, September 14, 2011: Speaker: Yaniv Erlich (Whitehead Institute for Biomedical Research). Topic: DNA Sudoku

Revealing the genetic factors that impact human disease susceptibility is one of the major challenges of modern biology. Accumulating lines of evidence have highlighted the pivotal role of rare genetic variations in a wide variety diseases, from hypertension through schizophrenia to autism. Rare variations in large cohorts can be thought of as a sparse signal. Thus, their detection is amenable to a Compressed Sensing (CS) protocol.

We have developed an ultra-cost effective strategy called “DNA Sudoku” to profile rare variations in large cohorts using a CS approach. As a test case, we identified rare genetic variations in a batch of 40,000 bacterial colonies by sensing only 1,900 combinatorial pools. We recovered 98.2% of the signal correctly. In the past two years, our CS approach for bacteria has been extensively used by the industry for biotechnological applications.

Adapting DNA Sudoku to human genetics poses new and exciting theoretical challenges to the ‘traditional’ compressed sensing approach. These challenges come in three categories: incorporating a-priori genetic information, technical constraints, and the small experiment problem. In my talk, I will present in detail the challenges and will call for an extended CS theory to address the challenges.

Thursday, September 22, 2011: Speaker: Reut Levi (Tel Aviv University). Topic: Testing Properties of Collections of Distributions: Equivalence and Similar Means.

We propose a framework for studying property testing of collections of distributions, where the number of distributions in the collection is a parameter of the problem. Previous work on property testing of distributions considered single distributions or pairs of distributions. We suggest two models that differ in the way the algorithm is given access to samples from the distributions. In one model the algorithm may ask for a sample from any distribution of its choice, and in the other the distributions from which it gets samples are selected randomly. Our main focus is on the basic problem of distinguishing between the case that all the distributions in the collection are the same (or very similar), and the case that it is necessary to modify the distributions in the collection in a non-negligible manner so as to obtain this property. We give almost tight upper and lower bounds for this testing problem, as well as study an extension to a clusterability property. One of our lower bounds directly implies a lower bound on testing independence of a joint distribution, a result which was left open by previous work.

In a more recent work, we consider another basic problem of testing a basic property of collections of distributions: having similar means. Namely, the algorithm should accept collections of distributions in which all distributions have means that do not differ by more than some given parameter, and should reject collections that are relatively far from having this property. We provide upper and lower bounds in both models. In particular, in the query model, the complexity of the problem is polynomial in $1/\epsilon$ (where $\epsilon$ is the given distance parameter). On the other hand, in the sampling model, the complexity grows roughly as $m^{1-\poly(1/\epsilon)}, where $m$ is the number of distributions.

Joint work with Prof. Dana Ron and Prof. Ronitt Rubinfeld.

Thursday, October 13, 2011: Speaker: Eric Blais (CMU). Topic: Property Testing with Restricted Queries

Property testing considers the following general question: on how many inputs must we query a boolean function to determine whether the function has some property P (e.g., is linear, a low-degree polynomial, a halfspace, computable by a small-size circuit, etc.) or whether it is "far" from having the property P?

Almost all results in property testing have focussed on the membership query model, where a tester has no restrictions on the queries it can make. In many potential applications of testing, however, this assumption is unrealistic. In this talk, we introduce a new model of testing with restricted queries that is analogous to the active learning model. We show that many fundamental properties of boolean functions can still be efficiently tested in this model.

This is joint work with Nina Balcan, Avrim Blum, and Liu Yang.

Thursday, October 20, 2011: Speaker: Hossein Jowhari (SFU). Topic: Near-optimal bounds for Lp samplers with application in finding duplicates in data streams

In this talk, we present near-optimal space bounds for L_p-samplers. Given a stream of updates (additions and subtraction) to the coordinates of an underlying vector x in R^n, a perfect L_p sampler outputs the i-th coordinate with probability |x_i|^p/||x||_p^p. In SODA 2010, Monemizadeh and Woodruff showed polylog space upper bounds for approximate L_p-samplers and demonstrated various applications of them. Following that, Andoni, Krauthgamer and Onak improved the upper bounds and gave a O(eps^{-p}log^3 n) space eps-relative error L_p-sampler for p in [1,2]. In this talk, we present another such algorithm requiring only O(eps^{-p}log^2n) space for p in (1,2). For p in (0,1), our space bound is O(eps^{-1}log^2n), while for the $p=1$ case we have an O(log(1/eps)eps^{-1}log^2n) space algorithm.

As an application of our samplers, we present a tight upper bound for the problem of finding duplicates in data streams. In case the length of the stream is longer than the alphabet size, our L_1 sampler gives us an O(log^2 n) space algorithm, thus improving the previous O(log^3 n) bound due to Gopalan and Radhakrishnan.

We also show an Omega(log^2 n) lower bound for sampling from 0,+-1 vectors (and finding duplicates). This matches the space of our sampling algorithms for constant eps>0. These bounds are obtained using reductions from the communication complexity problem augmented indexing.

Joint work with Mert Saglam and Gabor Tardos.

Thursday, November 14, 2011: Speaker: Siamak Tazari (MIT). Topic: Algorithmic certificates for treewidth and the parameterized intractability of monadic second-order logic

We study the complexity of checking if a graph satisfies a formula of monadic second-order logic (MSO) with edge quantification when parameterized by the length of the formula. A famous theorem of Courcelle (1990) shows that this problem is efficiently solvable, that is, fixed-parameter tractable, on graphs of bounded treewidth. This theorem encompasses and unifies most of the efficient algorithms we know on graphs of bounded treewidth. We provide first lower bounds for this theorem on classes of colored graphs and subgraph-closed classes of graphs by showing that if the treewidth of such a class is not polylogarithmically bounded, the model-checking problem of MSO is not decidable in polynomial time -- even if the exponent of the polynomial may depend on the length of the formula (modulo some complexity assumptions). To achieve this, we develop several algorithmic tools using recent graph structure and graph minor theory, which are interesting in their own right and serve as certificates for the large treewidth of a graph. Most notably, we provide polynomial-time algorithms to construct brambles, grid-like minors, and tree-ordered webs in general graphs of sufficiently high treewidth.

This is joint work with Stephan Kreutzer.

Monday, December 5, 2011 4:30 pm: Speaker: Rachit Agarwal (UIUC). Topic: Distance oracles and compact routing: Old Stories, new characters

Distance oracles are compact representations of the shortest path matrix of a graph, with applications to distance estimation and routing in large networks. Prior research focuses on designing oracles that return approximately shortest paths in constant time. In this talk, we study distance oracles with super-constant query time. We show that for the realistic case of sparse graphs, one can significantly reduce the oracle's size and approximation (for distances returned) by using slightly higher query time. In particular, we present the first distance oracle of subquadratic size that returns distances with approximation less than 2 in sublinear query time for general weighted undirected graphs.

In addition, several of our oracles can be adapted to routing schemes for large networks with guaranteed low latency and scalability. We present such compact routing schemes and evaluate them over a set of Internet topology datasets. For a large-scale map of the Internet, the schemes compute exact shortest paths for more than 99% of the source-destination pairs while reducing the number of forwarding table entries from O(n) to roughly O(n^0.5), where n is the number of nodes in the network.

We also discuss a number of intriguing problems that come up by considering higher query time, graph sparsity, and small-world networks within the framework of distance oracles and compact routing.

No prior knowledge of distance oracles and/or compact routing will be assumed. A part of this talk is based on joint work with Brighten Godfrey and Sariel Har-Peled.

Wednesday, February 8, 2012: Speaker: Andrew V. Goldberg (Microsoft Research Silicon Valley). Topic: The Hub Labeling Algorithm

This is a survey of Hub Labeling results for general and road networks.

Given a weighted graph, a distance oracle takes as an input a pair of vertices and returns the distance between them. The labeling approach to distance oracle design is to precompute a label for every vertex so that distances can be computed from the corresponding labels. This approach has been introduced by [Gavoille et al. '01], who also introduced the Hub Labeling algorithm (HL). HL has been further studied by [Cohen et al. '02].

We study HL in the context of graphs with small highway dimension (e.g., road networks). We show that under this assumption HL labels are small and the queries are sublinear. We also give an approximation algorithm for computing small HL labels that uses the fact that shortest path set systems have small VC-dimension.

Although polynomial-time, precomputation given by theory is too slow for continental-size road networks. However, heuristics guided by the theory are fast, and compute very small labels. This leads to the fastest currently known practical distance oracles for road networks.

The simplicity of HL queries allows their implementation inside of a relational database (e.g., in SQL), and query efficiency assures real-time response. Furthermore, including HL data in the database allows efficient implementation of more sophisticated location-based queries. These queries can be combined with traditional SQL queries. This approach brings the power of location-based services to SQL programmers, and benefits from external memory implementation and query optimization provided by the underlying database.

Joint work with Ittai Abraham, Daniel Delling, Amos Fiat, and Renato Werneck.

Wednesday, February 29, 2012 5:00pm: Speaker: Christian Sohler (TU Dortmund). Topic: Analyzing the structure of large graphs by looking at small subgraphs (Every property of hyperfinite graphs is testable)

The analysis of complex networks like the webgraph, social networks, metabolic networks or transportation networks is a challenging problem. One problem that has drawn a significant amount of attention is the question to classify the domain a given network belongs, i.e. whether it is, say, a social network or a metabolic network. One empirical approach to solve this problem uses the concept of network motifs. A network motif is a subgraph that appears more frequently in a certain class of graphs than in a random graph. This approach raises the theoretical question about the structural properties we can learn about a graph by looking at small subgraphs and how one can analyze graph structure by looking at random samples as these will typically contain frequent subgraphs.

Obviously, it is not possible to to analyze classical structural properties like, for example, connectivity by only looking at small subgraphs. One needs a relaxed and more robust definition of graph properties. Such a definition is given by the concept of property testing. In my talk and within the framework of property testing I will give a partial answer to the question what we can learn about graph properties from the distribution of constant sized subgraphs. I will show that every planar graph with constant maximum degree is defined up to epsilon n edges by its distribution (frequency) of subgraphs of constant size. This result implies that every property of planar graphs is testable in the property testing sense.

Joint work with Ilan Newman.

Algorithms and Complexity Seminar: Fall 2011

Organizer: Eric Price `ecprice@mit.edu`

Partial Schedule:

Archives from previous semesters:

People who attended talks like these also attended:

Algorithms and Complexity Seminar: Fall 2011

Organizer: Eric Price ecprice@mit.edu

Partial Schedule:

Archives from previous semesters:

People who attended talks like these also attended:

Organizer: Eric Price `ecprice@mit.edu`