You are currently browsing the category archive for the 'theory' category.
In the Max Cut problem, we are given an undirected graph and we want to find a partition
of the set of vertices such that as many edges as possible have one endpoint in
and one endpoint in
, and are hence cut by the partition.
It is easy, as recognized since the 1970s, to find a partition that cuts half of the edges and that, thus, is at least half as good as an optimal solution. No approximation better than 1/2 was known for this problem, until Goemans and Williamson famously proved that one could achieve a .878… approximation using semidefinite programming (SDP).
No other approximation algorithm achieving an approximation asymptotically better than 1/2 is known, and it seems that a fundamental difficulty is the following. Suppose we prove that a certain algorithm achieves approximation . Then, given a graph in which the optimum is, say,
, the algorithm and its analysis must provide a certificate that the optimum cut in the given graph is
, and there is no general technique to prove upper bounds to the Max Cut optimum of a general graph other than Semidefinite Programming. (And see here and here for negative results showing that large classes of linear programming relaxations are unable to give such certificates.)
Spectral techniques can prove upper bounds to Max Cut in certain cases (and can be seen as special cases of the upper bounds provided by the Goemans-Williamson relaxation).
In the simplified case in which is a
-regular graph, let
be the adjacency matrix of
and
be the eigenvalues of
; then it is easy to show that
where is the fraction of edges cut by an optimal solution. Unfortunately (1) does not quite have an approximate converse: there are graphs where
but
.
The following fact, however, is always true and well known:
-
if and only if
contains a bipartite connected component.
Is there an “approximate” version of the above statement characterizing the cases in which is small? Surprisingly, as far as I know the question had not been considered before.
For comparison, the starting point of the theory of edge expansion is the related fact
-
if and only if
is disconnected.
Which can be rephrased as:
if and only if there is a non-empty
,
such that
.
Cheeger’s inequality characterizes the case in which is small:
- If there is a non-empty
,
such that
, then
;
- If
then there is a non-empty
,
such that
.
For a subset , and a bipartition
of
, we say that an edge
fails [to be cut by] the bipartition if
is incident on
but it is not the case that one endpoint is in
and one endpoint is in
. (This means that either both endpoints are in
, or both endpoints are in
, or one endpoint is in
and one endpoint is not in
.) Then we can express the well-known fact about
as
-
if and only if there is
and a bipartition of
with zero failed edges.
In this new paper I prove the following approximate version
- If there is a non-empty
, and a bipartition of
with at most
failed edges, then
;
- If
, then there is a non-empty
, and a partition of
with at most
failed edges.
The following notation makes the similarity with Cheeger’s inequality clearer. Define the edge expansion of a graph as
Let us define the bipartiteness ratio of as
that is, as the minimum ratio between failed edges of a partition of a set over
.
Then Cheeger’s inequality gives
and our results give
This translates into an efficient algorithm that, given a graph such that
, finds a set
and a bipartition of
such that at least a
fraction of the edges incident on
are cut by the bipartition. Removing the vertices in
and continuing recursively on the residual graph yields a .50769… approximation algorithm for Max Cut. (The algorithm stops making recursive calls, and uses a random partition, when the partition of
found by the algorithm has too many failed edges.)
The paper is entirely a by-product of the ongoing series of posts on edge expansion: the question of relations between spectral techniques to max cut was asked by a commenter and the probabilistic view of the proof of Cheeger’s inequality that I wrote up in this post was very helpful in understanding the gap between and
.
Green, Tao and Ziegler, in their works on patterns in the primes, prove a general result of the following form: if is a set,
is a, possibly very sparse, “pseudorandom” susbset of
, and
is a dense subset of
, then
may be “modeled” by a large set
which has the same density in
as the density of
in
.
They use this result with being the integers in a large interval
,
being the “almost-primes” in
(integers with no small factor), and
being the primes in
. Since the almost-primes can be proved to be “pseudorandom” in a fairly strong sense, and since the density of the primes in the almost-primes is at least an absolute constant, it follows that the primes are “indistinguishable” from a large set
containing a constant fraction of all integers. Since such large sets are known to contain arbitrarily long arithmetic progressions, as proved by Szemeredi, Green and Tao are able to prove that the primes too must contain arbitrarily long arithmetic progressions. Such large sets are also known to contain arbitrarily long “polynomial progressions,” as proved by Bergelson and Leibman, and this allows Tao and Ziegler to argue that the primes too much contain arbitrarily long polynomial progressions.
(The above account is not completely accurate, but it is not lying too much.)
As announced last October here, and here, Omer Reingold, Madhur Tulsiani, Salil Vadhan and I found a new proof of this “dense model” theorem, which uses the min-max theorem of game theory (or, depending on the language that you prefer to use, the duality of linear programming or the Hahn Banach theorem) and was inspired by Nisan’s proof of the Impagliazzo hard-core set theorem. In complexity-theoretic applications of the theorem, our reduction has polynomial complexity, while the previous work incurred an exponential loss.
As discussed here and here, we also show how to use the Green-Tao-Ziegler techniques of “iterative partitioning” to give a different proof of Impagliazzo’s theorem.
After long procrastination, we recently wrote up a paper about these results.
In the Fall, we received some feedback from additive combinatorialists that while our proof of the Green-Tao-Ziegler result was technically simpler than the original one, the language we used was hard to follow. (That’s easy to believe, because it took us a while to understand the language in which the original proof was written.) We then wrote an expository note of the proof in the analyst’s language. When we were about to release the paper and the note, we were contacted by Tim Gowers, who, last Summer, had independently discovered a proof of the Green-Tao-Ziegler results via the Hahn-Banach theorem, essentially with the same argument. (He also found other applications of the technique in additive combinatorics. The issue of polynomial complexity, which does not arise in his applications, is not considered.)
Gowers announced his results in April at a talk at the Fields institute in Toronto. (Audio and slides are available online.)
Gowers’ paper already contains the proof presented in the “analyst language,” making our expository note not so useful any more; we have still posted it anyways because, by explaining how one translates from one notation to the other, it can be a short starting point for the computer scientist who is interested in trying to read Gowers’ paper, or for the combinatorialist who is interested in trying to read our paper.
In which we spend more than sixteen hundred words to explain a three-line proof.
In the last post, we left off with the following problem. We have a set of
“vertices,” a semi-metric
, and we want to find a distribution over sets
such that for every two vertices
where
This will give us a way to round a solution of the Leighton-Rao relaxation to an actual cut with only an loss in the approximation.
Before getting to the distribution which will do the trick, it is helpful to consider a few examples.
- Example 1: all points are at distance 1 from each other.
Then
is equal to either 0 or 1, and it is 1 if and only if
contains exactly one of
or
. If
is a uniformly chosen random set, then the above condition is satisfied with probability
, so we have the stronger bound
[Indeed, even better, we have
, which is an isometric embedding.]
- Example 2: all points are at distance either 1 or 2 from each other.
If
contains exactly one of the vertices
, then
, and so, if we choose
uniformly at random we have
These examples may trick us into thinking that a uniformly chosen random set always work, but this unfortunately is not the case.
- Example 3: Within a set
of size
, all distances are 1, and the same is true within
; the distance between elements of
and elements of
is
.
If we consider
and
, then we are in trouble whenever
contains elements from both sets, because then
, while
. If we pick
uniformly at random, then
will essentially always, except with exponentially small probability, contain elements from both
and
. If, however, we pick
to be a random set of size 1, then we are going to get
with probability at last
, which is great.
Choosing a set of size 1, however, is a disaster inside
and inside
, where almost all distances
collapse to zero. For those pairs, however, we know that choosing
uniformly at random works well.
The solution is thus: with probability 1/2, pick
uniformly at random; with probability 1/2, pick
of size 1.
So far we are actually getting away with being a constant fraction of
. Here is a slightly trickier case.
- Example 4: The shortest path metric in a
grid.
Take two vertices
at distance
. We can get, say,
, provided that
avoids all vertices at distance
from
, and it includes some vertex at distance
from
. In a grid, the number of vertices at distance
from a given vertex is
, so our goal is to pick
so that it avoids a certain set of size
and it hits another set of size
. If we pick
to be a random set of size about
, both events hold with constant probability.
Now, what works for a certain distance won’t work for a different distance, so it seems we have to do something like picking
from
to
, and then pick a random set
of size
. This is however too bad, because our chance of getting a set of the right size would only be
, while we can only lose a factor
. The solution is to pick
at random from
, and then pick
of size
. With probability
we get the right size of
, up to a factor of two.
It turns out that the last example gives a distribution that works in all cases:
- Pick
at random in
- Pick a random set
so that each
is selected to be in
independently and with probability
Now, it would be nice to show that (as in the examples we have seen so far) for every semi-metric and two vertices
, there is a size parameter
such that when
is chosen to be a random set of size
we have
.
This would mean that, after we lose a factor of to “guess” the right density, we have the desired bound (3). Unfortunately this is too much to ask for; we shall instead work out an argument that uses contributions from all densities.
It is good to see one more example.
- Example 5: A 3-regular graph of logarithmic girth.
Let
be two vertices whose distance
is less than the girth. In the example of the grid, we considered all vertices at distance
from
and all vertices at distance
from
; in this case, there are
of the former, and
of the latter, and it is hopeless to expect that
, no matter its density, can avoid all of the former, but hit some of the latter.
If, however, I consider the points at distance
from
and the points at distance
from
, they are off only by a constant factor, and there is a constant probability of avoiding the former and hitting the latter when
. So conditioned on each
between
and
, the expectation of
is at least
, and, overall, the expectation is at least
.
We are now more or less ready to tackle the general case.
We look at two vertices at distance
and we want to estimate
.
Let us estimate the contribution to the expectation coming from the case in which we choose a particular value of . In such a case, there is a constant probability that
-
contains none of the
vertices closest to
, but at least one of the of
vertices closest to
. [Assuming the two sets are disjoint]
-
contains none of the
vertices closest to
, but at least one of the of
vertices closest to
. [Assuming the two sets are disjoint]
Notice that events (1) and (2) are disjoint, so we are allowed to sum their contributions to the expectation, without doing any double-counting.
Call the distance of the the
-th closest vertex from
, and similarly
. Then, if event (1) happens,
and
, in which case
we can similarly argue that if (2) happens, then
Call and
.
Let be the smallest
such that either
or . Then for
the events described in (1) and (2) are well-defined, and the contributions to the expectation is at least
When , we can verify that the contribution to the expectation is at least
And if we sum the contributions for , the sum telescopes and we are left with
where .
At long last, we have completed the proof.
Notice that the factor we have lost is best possible in light of the expander example we saw in the previous post. In many examples, however, we lost only a constant factor. It is a great open question whether it is possible to lose only a constant factor whenever the metric is a shortest-path metric on a planar graph.
Continuing our seemingly never-ending series on approximations to edge expansion, we come to the question of how good is the Leighton-Rao relaxation.
Given a graph , we are trying to approximate the sparsest cut problem (which, in turn, approximates the edge expansion problem within a factor of two) defined as
And we have seen that an equivalent characterization of sparsest cut is:
In the Leighton-Rao relaxation, we have
where is the set of all semi-metrics
, that is, of all functions such that
,
, and that obey the “triangle inequality”
for all .
This is clearly a relaxation of the sparsest cut problem, because for every the “distances”
define indeed a semi-metric over
.
How bad can the relaxation be? Take a family of regular constant-degree expander graphs, that is graphs where is a constant and the degree
is a constant independent of
. Define
to be the shortest path distance between
and
in the graph. This is clearly a semi-metric, and we have
because along every edge the distance is precisely one, and
because, for every vertex , at most
other vertices can be within
distance , and so at least half the vertices are at distance
.
So we have
even though , so the error in the approximation can be as bad as a factor of
. We shall prove that this is as bad as it gets.
Our task is then: given a feasible solution to (2), that is, a distance function defined on the vertices of , find a feasible solution to sparsest cut whose cost is as small as possible, and no more than a
factor off from the cost of the solution to (2) that we started from. Instead of directly looking for a cut, we shall look for a solution to (1), which is a more flexible formulation.
The general strategy will be, given , to find a distribution
over vectors
such that for every two vertices
we have
Suppose that we start from a solution to (2) of cost , that is, we have a semimetric
such that
Then our distribution is such that
and
so
and there must exist a vector in the support of the distribution
such that
and now we are done because the cost of in (1) is at most
times the optimum of the Leighton-Rao relaxation.
Though this works, it seems an overkill to require condition (3) for all pairs of vertices. It seems sufficient to require
only for edges, and to require
only “on average” over all pairs of vertices. It can be shown, however, that if one wants to round a generalization of the Leighton-Rao relaxation to sparsest cut with “non-uniform demands,” then any rounding algorithm can be turned into a rounding algorithm that satisfies (3).
So how do we start from an arbitrary metric and come up with a probabilistic linear order of the vertices so that their distances in the original metric are well approximated in the linear ordering? We will describe a method of Bourgain, whose applicability in this setting is due Linial, London and Rabinovich.
The starting point is to see that if is a set of vertices, and we define
then, no matter how we choose , we always have
[Hint: use the triangle inequality to see that .]
This means that “all we have to do” is to find a distribution over sets such that for every
we have
If you want to make it sound more high-brow, you may call such a distribution a Frechet embedding of into
. Constructing such a distribution will be the subject of the next post.
We continue to talk about the problem of estimating the expansion of a graph , focusing on the closely related sparsest cut, defined as
The spectral paritioning algorithm first finds a vector minimizing
(where is the adjacency matrix of
) and then finds the best cut
where
is of the form
for a threshold
.
We proved that if the quantity in (1) is and
is
-regular, then the algorithm will find a cut of sparsity at most
, and that if
is the eigenvector of the second eigenvalue, then it is an optimal solution to (1), and the cost of an optimal solution to (1) is a lower bound to
. This means that the algorithm finds a cut of sparsity at most
.
Can the analysis be improved? Read the rest of this entry »
I am worried that, lately, I am agreeing too much with James Lee. I hope my friends will stage an intervention if I start saying things like “…this group acts by isometries on Hilbert space…”
I agree with his choice of favorite STOC’08 talks, but also with his comments on the statement on the importance of conceptual contributions in theoretical computer science, which was recently written by a group of distinguished theoreticians, was briefly discussed at the STOC’08 business meeting, and was commented upon here, here and here.
In Mr. Spritz Goes to Washington, Lisa and Homer help pass a bill to divert flights from over their house by tacking their air traffic bill on a “giving orphans American flags” bill.
I felt that the statement (and the subsequent public and private discussions) similarly mixes the unobjectionable with the potentially controversial. There have been two themes that are entirely agreeable.
One is the importance, and value, of simplicity in proofs. This goes without saying, and I think that the community does a good job in recognizing simple resolutions of long-standing open questions, as well as new simplified proofs of known results. In fact, I think it would be hard to find someone who suggests that, all other things being equal, a harder proof is preferable to a simple proof.
Another is the importance of introducing new concepts and models. Indeed, there would be no theoretical computer science without its “concepts,” and the field would die if it would stop innovating. Again, nobody could possibly argue against the importance of new definitions and models.
Then, there is something which is said in the statement in a way that is not self-evident:
Once understood, conceptual aspects tend to be viewed as obvious, which actually means that they have become fully incorporated in the worldview of the expert. This positive effect is actually a source of trouble in the evaluation process, because the evaluators forget that these contributions were not obvious at all before being made.
Indeed, our community should be warned of dismissing such contributions by saying “yes, but that’s obvious”; when somebody says such a thing, one should ask “was it obvious to you before reading this article?”
I think (and I echo things that were better said in comments that I linked to above) that if something looks obvious after reading a new paper for the first time (as opposed to looking back at a classical paper from twenty or more years ago), then the paper is not making a valuable conceptual contribution. The time that it takes to read a paper cannot be sufficient to “alter the worldview” of the reader: the concept must have been “in the air.”
To be sure, usually even the greatest discoveries are in the air when they are made, a point which I want to write about in a different post. The fundamental difference, however, is that reading about a good conceptual discovery for the first time is startling and exciting, like hearing a good joke for the first time. If an expert feels nothing when reading about a conceptual discovery, it is usually a very bad sign, although one can come up with various exceptions.
Often quoted exceptions are some early papers in the great foundations-of-cryptography revolution of 1982, especially the Goldwasser-Micali-Rackoff paper on Zero Knowledge, which ended up appearing in 1985. What was special about that line of work is that it was not in the air, but, rather, ahead of its time. Obviously, I wasn’t there, so I don’t know, but I guess that people at the time were not saying “this work is obvious,” but rather “what the hell is this?,” which is quite different.
A final remark is that when a paper presents a new model or problem (and I understand that these are the kind of papers that the statement refers to), there can be no “intrinsic” value attributed the model or the problem; the value will be in the understanding and general progress that will come by studying the model and finding solutions to the problem, and how this will connect with different problems, different models, and applications.
There seem to be only two ways to validate a new conceptual proposal: one is to present preliminary evidence that one can do interesting things with it. (I think, in this case, too much emphasis is put on achieving quantitative improvement; I am impressed enough to see known, hard, results, recovered in a completely different way.) The other is the “gut feeling” of the expert, which feels that the new ideas are the “right” way to think about the issue at hand.
It seems right to reject a paper that fails both tests.
[In which we prove the "difficult part" of Cheeger's inequality by analyzing a randomized rounding algorithm for a continuous relaxation of sparsest cut.]
We return to this month’s question: if is a
-regular graph, how well can we approximate its edge expansion
defined as
and its sparsest cut defined as
,
where is the adjacency matrix of
.
We have looked at three continuous relaxations of , the spectral gap, the Leighton-Rao linear program, and the Arora-Rao-Vazirani semidefinite program.
As we saw, the spectral gap of , defined as the difference between largest and second largest eigenvalue of
, can be seen as the solution to a continuous optimization problem:
.
It follows from the definitions that
which is the “easy direction” of Cheeger’s inequality, and the interesting thing is that is never much smaller, and it obeys
,
which is the difficult part of Cheeger’s inequality. When we normalize all quantities by the degree, the inequality reads as
.
I have taught (1) in three courses and used it in two papers, but I had never really understood it, where I consider a mathematical proof to be understood if one can see it as a series of inevitable steps. Many steps in the proofs of (1) I had read, however, looked like magic tricks with no explanation. Finally, however, I have found a way to describe the proof that makes sense to me. (I note that everything I will say in this post will be completely obvious to the experts, but I hope some non-expert will read it and find it helpful.)
We prove (1), as usual, by showing that given any such that
we can find a threshold such that the cut
defined by
satisfies
and
.
This not only gives us a proof of (1), but also an algorithm for finding sparse cuts when they exist: take a vector which is an eigenvector of the second eigenvalue (or simply a vector for which the Rayleigh quotient in (2) is small), sort the vertices
according to the value of
, and find the best cut among the “threshold cuts.” This is the “spectral partitioning” algorithm.
This means that proving (1) amounts to studying an algorithm that “rounds” the solution of a continuous relaxation to a combinatorial solution, and there is a standard pattern to such arguments in computer science: we describe a randomized rounding algorithm, study its average performance, and then argue that there is a fixed choice that is at least as good as an average one. Here, in particular, we would like to find a distribution over threshold, such that if we define
as a random variable in terms of
we have
and so, using linearity of expectation,
from which we see that there must be a threshold in our sample space such that (3) holds. I shall present a proof that explicitly follows this pattern. It would be nice if we could choose uniformly at random in the interval
, but I don’t think it would work. (Any reader can see a counterexample? I couldn’t, but we’ll come back to it.) Instead, the following works: assuming with no loss of generality that the median of
is zero,
can be chosen so that
is distributed uniformly in the interval
. (This means that thresholds near the median are less likely to be picked than thresholds far from the median.) This choice seems to be a magic trick in itself, voiding the point I made above, but I hope it will become natural as we unfold our argument. Read the rest of this entry »
[In which we finally get to Leighton-Rao and ARV.]
Continuing our edge expansion marathon, we are going to see more ways in which the sparsest cut problem can be relaxed to polynomial-time solvable continuous problems. As before, our goal is to understand how this implies certificates of expansion.
We are talking about the sparsest cut of a graph
, which is
We have looked for a while at the eigenvalues gap of . If
is
-regular and the second largest eigenvalue of
is
, then the difference between largest and second largest eigenvalue obeys
and, noting that when
, we saw that
is a relaxation of
, and
.
Leighton and Rao derive a continuous relaxation of in quite a different way. They note that when
, then the “distance”
induces a semi-metric on , that is,
,
, and the triangle inequality holds
One can then consider the relaxation
where the minimum is taken over all distance functions that define a semi-metric on
.
We can rewrite (1) as the linear program
where we only write the triangle inequality constraints by having one variable for every unordered pair
.
The minimum of (2) gives a lower bound to the sparsest cut of , and so every feasible solution for the dual of (2) gives a lower bound to
.
Before trying to write down the dual of (2), it is convenient to write (2) in a less efficient way. (This is akin to the fact that sometimes, when using induction, it is easier to prove a more general statement.) Instead of writing the triangle inequality for every three vertices, we write it for every arbitrary sequence of vertices.
where we denote by the set of “paths” in the complete graph between
and
. (That is,
is the set of all sequences of vertices that start at
and end at
.)
Clearly (3) is the same as (2), in the sense that the extra inequalities present in (3) are implied by the triangle inequalities in (2). The dual of (3), however, is cleaner, and is given below
A feasible solution to (4) is thus a weighted set of paths and a parameter . Before reasoning about the meaning of a solution, suppose that
satisfy the stronger constraints
Then we can view the as defining a flow between
that carries at least
units of flow; furthermore, the union of all such flows uses at most a total capacity
on each edge
. We should think of it as an embedding of the complete graph scaled by
into
. I want to claim that this means that the sparsest cut of
is at least
times the sparsest cut of the complete graph, that is, at least
.
To verify the claim, take any cut . We know that such a cut separates
pairs of vertices
, and that at least
units of flow are routed between each such pair, so that at least
units of flow cross the cut. Since every edge has capacity 1, we see that at least
edges of
must cross the cut.
It remains to bridge the difference between (4) or (5). One possibility is to show that a solution to (4) can be modified to a solution satisfying (5) with no loss in . Another approach is to directly use (4) and a general weak duality argument to show directly that a solution satisfying (4) implies that
has at least the edge expansion of a complete graph with edge weights
.
There is, in fact, a broader principle at work, which we can verify by a similar argument: if is an assignment of weights to all possible paths (which we think of as defining a multicommodity flow),
is a graph with adjacency matrix
,
is a graph with adjacency matrix
, and we have
then we must have .
The fact that a feasible solution to (4) gives a lower bound to is a special case that applies to the case in which
is a scaled version of the complete graph.
This is a good point to pause, look back again at the bound coming from spectral considerations, and compare it with what we have here. Read the rest of this entry »
In the previous post, we saw that the sparsest cut of a graph (which is within a factor of 2 of the edge expansion of the graph), is defined as
and can also be written as
while the eigenvalue gap of , can be written as
