-
Wednesday, September 14, 2011:
Speaker:
Yaniv Erlich
(Whitehead Institute for Biomedical Research).
Topic:
DNA Sudoku
Revealing the genetic factors that impact human disease
susceptibility is one of the major challenges of modern
biology. Accumulating lines of evidence have highlighted the pivotal
role of rare genetic variations in a wide variety diseases, from
hypertension through schizophrenia to autism. Rare variations in
large cohorts can be thought of as a sparse signal. Thus, their
detection is amenable to a Compressed Sensing (CS) protocol.
We have developed an ultra-cost effective strategy called “DNA
Sudoku” to profile rare variations in large cohorts using a CS
approach. As a test case, we identified rare genetic variations in a
batch of 40,000 bacterial colonies by sensing only 1,900
combinatorial pools. We recovered 98.2% of the signal correctly. In
the past two years, our CS approach for bacteria has been extensively
used by the industry for biotechnological applications.
Adapting DNA Sudoku to human genetics poses new and exciting
theoretical challenges to the ‘traditional’ compressed sensing
approach. These challenges come in three categories: incorporating
a-priori genetic information, technical constraints, and the small
experiment problem. In my talk, I will present in detail the
challenges and will call for an extended CS theory to address the
challenges.
-
Thursday, September 22, 2011:
Speaker:
Reut Levi
(Tel Aviv University).
Topic:
Testing Properties of Collections of Distributions: Equivalence and Similar Means.
We propose a framework for studying property testing of collections
of distributions, where the number of distributions in the collection
is a parameter of the problem. Previous work on property testing of
distributions considered single distributions or pairs of
distributions. We suggest two models that differ in the way the
algorithm is given access to samples from the distributions. In one
model the algorithm may ask for a sample from any distribution of its
choice, and in the other the distributions from which it gets samples
are selected randomly. Our main focus is on the basic problem of
distinguishing between the case that all the distributions in the
collection are the same (or very similar), and the case that it is
necessary to modify the distributions in the collection in a
non-negligible manner so as to obtain this property. We give almost
tight upper and lower bounds for this testing problem, as well as
study an extension to a clusterability property. One of our lower
bounds directly implies a lower bound on testing independence of a
joint distribution, a result which was left open by previous work.
In a more recent work, we consider another basic problem of testing a
basic property of collections of distributions: having similar
means. Namely, the algorithm should accept collections of
distributions in which all distributions have means that do not differ
by more than some given parameter, and should reject collections that
are relatively far from having this property. We provide upper and
lower bounds in both models. In particular, in the query model, the
complexity of the problem is polynomial in $1/\epsilon$ (where
$\epsilon$ is the given distance parameter). On the other hand, in the
sampling model, the complexity grows roughly as
$m^{1-\poly(1/\epsilon)}, where $m$ is the number of distributions.
Joint work with Prof. Dana Ron and Prof. Ronitt Rubinfeld.
-
Thursday, October 13, 2011:
Speaker:
Eric Blais
(CMU).
Topic:
Property Testing with Restricted Queries
Property testing considers the following general question: on how many inputs must we query a boolean function to determine whether the function has some property P (e.g., is linear, a low-degree polynomial, a halfspace, computable by a small-size circuit, etc.) or whether it is "far" from having the property P?
Almost all results in property testing have focussed on the membership query model, where a tester has no restrictions on the queries it can make. In many potential applications of testing, however, this assumption is unrealistic. In this talk, we introduce a new model of testing with restricted queries that is analogous to the active learning model. We show that many fundamental properties of boolean functions can still be efficiently tested in this model.
This is joint work with Nina Balcan, Avrim Blum, and Liu Yang.
-
Thursday, October 20, 2011:
Speaker:
Hossein Jowhari
(SFU).
Topic:
Near-optimal bounds for Lp samplers with application in finding duplicates in data streams
In this talk, we present near-optimal space bounds for L_p-samplers.
Given a stream of updates (additions and subtraction) to the coordinates of
an underlying vector x in R^n, a perfect L_p sampler outputs
the i-th coordinate with probability |x_i|^p/||x||_p^p. In SODA 2010,
Monemizadeh and Woodruff showed polylog space upper bounds for
approximate L_p-samplers and demonstrated various applications of them.
Following that, Andoni, Krauthgamer and Onak improved the upper
bounds and gave a O(eps^{-p}log^3 n) space eps-relative error
L_p-sampler for p in [1,2]. In this talk, we present another such
algorithm requiring only O(eps^{-p}log^2n) space for p in (1,2). For p in (0,1),
our space bound is O(eps^{-1}log^2n), while for the $p=1$ case we have
an O(log(1/eps)eps^{-1}log^2n) space algorithm.
As an application of our samplers, we present a tight upper bound
for the problem of finding duplicates in data streams. In case the length
of the stream is longer than the alphabet size, our L_1 sampler gives
us an O(log^2 n) space algorithm, thus improving the previous
O(log^3 n) bound due to Gopalan and Radhakrishnan.
We also show an Omega(log^2 n) lower bound for sampling
from 0,+-1 vectors (and finding duplicates). This matches
the space of our sampling algorithms for constant eps>0.
These bounds are obtained using reductions from the
communication complexity problem augmented indexing.
Joint work with Mert Saglam and Gabor Tardos.
-
Thursday, November 14, 2011:
Speaker:
Siamak Tazari
(MIT).
Topic:
Algorithmic certificates for treewidth and the parameterized intractability of monadic second-order logic
We study the complexity of checking if a graph satisfies a formula of
monadic second-order logic (MSO) with edge quantification when
parameterized by the length of the formula. A famous theorem of
Courcelle (1990) shows that this problem is efficiently solvable, that
is, fixed-parameter tractable, on graphs of bounded treewidth. This
theorem encompasses and unifies most of the efficient algorithms we
know on graphs of bounded treewidth. We provide first lower bounds
for this theorem on classes of colored graphs and
subgraph-closed classes of graphs by showing that if the treewidth of
such a class is not polylogarithmically bounded, the model-checking
problem of MSO is not decidable in polynomial time -- even if the
exponent of the polynomial may depend on the length of the formula
(modulo some complexity assumptions). To achieve this, we develop
several algorithmic tools using recent graph structure and graph minor
theory, which are interesting in their own right and serve as
certificates for the large treewidth of a graph. Most notably, we
provide polynomial-time algorithms to construct brambles, grid-like
minors, and tree-ordered webs in general graphs of sufficiently high
treewidth.
This is joint work with Stephan Kreutzer.
-
Monday, December 5, 2011 4:30 pm:
Speaker:
Rachit Agarwal
(UIUC).
Topic:
Distance oracles and compact routing: Old Stories, new characters
Distance oracles are compact representations of the shortest path matrix of a graph, with applications to distance estimation and routing in large networks. Prior research focuses on designing oracles that return approximately shortest paths in constant time. In this talk, we study distance oracles with super-constant query time. We show that for the realistic case of sparse graphs, one can significantly reduce the oracle's size and approximation (for distances returned) by using slightly higher query time. In particular, we present the first distance oracle of subquadratic size that returns distances with approximation less than 2 in sublinear query time for general weighted undirected graphs.
In addition, several of our oracles can be adapted to routing schemes for large networks with guaranteed low latency and scalability. We present such compact routing schemes and evaluate them over a set of Internet topology datasets. For a large-scale map of the Internet, the schemes compute exact shortest paths for more than 99% of the source-destination pairs while reducing the number of forwarding table entries from O(n) to roughly O(n^0.5), where n is the number of nodes in the network.
We also discuss a number of intriguing problems that come up by considering higher query time, graph sparsity, and small-world networks within the framework of distance oracles and compact routing.
No prior knowledge of distance oracles and/or compact routing will be assumed. A part of this talk is based on joint work with Brighten Godfrey and Sariel Har-Peled.
-
Wednesday, February 8, 2012:
Speaker:
Andrew V. Goldberg
(Microsoft Research Silicon Valley).
Topic:
The Hub Labeling Algorithm
This is a survey of Hub Labeling results for general and road networks.
Given a weighted graph, a distance oracle takes as an input a pair of vertices
and returns the distance between them. The labeling approach to distance oracle
design is to precompute a label for every vertex so that distances can be
computed from the corresponding labels. This approach has been introduced by
[Gavoille et al. '01], who also introduced the Hub Labeling algorithm (HL). HL
has been further studied by [Cohen et al. '02].
We study HL in the context of graphs with small highway dimension (e.g., road
networks). We show that under this assumption HL labels are small and the
queries are sublinear. We also give an approximation algorithm for computing
small HL labels that uses the fact that shortest path set systems have small
VC-dimension.
Although polynomial-time, precomputation given by theory is too slow for
continental-size road networks. However, heuristics guided by the theory are
fast, and compute very small labels. This leads to the fastest currently known
practical distance oracles for road networks.
The simplicity of HL queries allows their implementation inside of a relational
database (e.g., in SQL), and query efficiency assures real-time response.
Furthermore, including HL data in the database allows efficient implementation
of more sophisticated location-based queries. These queries can be combined
with traditional SQL queries. This approach brings the power of location-based
services to SQL programmers, and benefits from external memory implementation
and query optimization provided by the underlying database.
Joint work with Ittai Abraham, Daniel Delling, Amos Fiat, and Renato Werneck.
-
Wednesday, February 29, 2012 5:00pm:
Speaker:
Christian Sohler
(TU Dortmund).
Topic:
Analyzing the structure of large graphs by looking at small
subgraphs (Every property of hyperfinite graphs is testable)
The analysis of complex networks like the webgraph, social networks,
metabolic networks or transportation networks is a challenging problem.
One problem that has drawn a significant amount of attention is the
question to classify the domain a given network belongs, i.e. whether it
is, say, a social network or a metabolic network.
One empirical approach to solve this problem uses the concept of network
motifs. A network motif is a subgraph that appears more frequently in a
certain class of graphs than in a random graph.
This approach raises the theoretical question about the structural
properties we can learn about a graph by looking at small subgraphs and
how one can analyze graph structure by looking at random samples as these
will typically contain frequent subgraphs.
Obviously, it is not possible to to analyze classical structural
properties like, for example, connectivity by only looking at small
subgraphs. One needs a relaxed and more robust definition of graph
properties. Such a definition is given by the concept of property testing.
In my talk and within the framework of property testing I will give a
partial answer to the question what we can learn about graph properties
from the distribution of constant sized subgraphs. I will show that every
planar graph with constant maximum degree is defined up to epsilon n edges
by its distribution (frequency) of subgraphs of constant size. This result
implies that every property of planar graphs is testable in the property
testing sense.
Joint work with Ilan Newman.