*[In which we prove the “difficult part” of Cheeger’s inequality by analyzing a randomized rounding algorithm for a continuous relaxation of sparsest cut.]*

We return to this month’s question: if is a -regular graph, how well can we approximate its *edge expansion* defined as

and its *sparsest cut* defined as

,

where is the adjacency matrix of .

We have looked at three continuous relaxations of , the spectral gap, the Leighton-Rao linear program, and the Arora-Rao-Vazirani semidefinite program.

As we saw, the spectral gap of , defined as the difference between largest and second largest eigenvalue of , can be seen as the solution to a continuous optimization problem:

.

It follows from the definitions that

which is the “easy direction” of Cheeger’s inequality, and the interesting thing is that is never much smaller, and it obeys

,

which is the difficult part of Cheeger’s inequality. When we normalize all quantities by the degree, the inequality reads as

.

I have taught (1) in three courses and used it in two papers, but I had never really *understood* it, where I consider a mathematical proof to be understood if one can see it as a series of inevitable steps. Many steps in the proofs of (1) I had read, however, looked like magic tricks with no explanation. Finally, however, I have found a way to describe the proof that makes sense to me. (I note that everything I will say in this post will be completely obvious to the experts, but I hope some non-expert will read it and find it helpful.)

We prove (1), as usual, by showing that given any such that

we can find a threshold such that the cut defined by satisfies

and

.

This not only gives us a proof of (1), but also an algorithm for finding sparse cuts when they exist: take a vector which is an eigenvector of the second eigenvalue (or simply a vector for which the Rayleigh quotient in (2) is small), sort the vertices according to the value of , and find the best cut among the “threshold cuts.” This is the “spectral partitioning” algorithm.

This means that proving (1) amounts to studying an algorithm that “rounds” the solution of a continuous relaxation to a combinatorial solution, and there is a standard pattern to such arguments in computer science: we describe a randomized rounding algorithm, study its average performance, and then argue that there is a fixed choice that is at least as good as an average one. Here, in particular, we would like to find a distribution over threshold, such that if we define as a random variable in terms of we have

and so, using linearity of expectation,

from which we see that there must be a threshold in our sample space such that (3) holds. I shall present a proof that explicitly follows this pattern. It would be nice if we could choose uniformly at random in the interval , but I don’t think it would work. (Any reader can see a counterexample? I couldn’t, but we’ll come back to it.) Instead, the following works: assuming with no loss of generality that the median of is zero, can be chosen so that is distributed uniformly in the interval . (This means that thresholds near the median are less likely to be picked than thresholds far from the median.) This choice seems to be a magic trick in itself, voiding the point I made above, but I hope it will become natural as we unfold our argument.

First, we shall see that there are continuous relaxations of and that exactly match the combinatorial optimum, and that this can be established via random rounding.

Our first claim is that

To prove (4), take an optimal solution , pick a random threshold in the interval , and define . Then it should be clear that the probability that an edge is cut by is proportional to . Normalizing so that , which does not affect the distribution of , nor the optimality of , the probability is exactly equal to , and we have

and

so that the support of our distribution includes a cut such that

Next, we want to argue that

where denotes the median of the values . It may take a minute to see that the right-hand side is a relaxation of . Suppose that is the optimal cut in the definition of , and that . Then define if and otherwise. The median of is zero, , and counts twice every edge crossing the cut. So the right-hand-side is a relaxation, and can be no larger than .

As before, normalize so that , pick a threshold uniformly at random in this interval, and define . We already argued that

and it remains to show that

.

This is just a matter of writing things down. If we name the vertices so that , and we use that for a non-negative integral random variable we have , then we just have to observe that

.

Now that we have (4) and (5), if someone gives us an such that

our first thought would be to plug into the right-hand side of (4) and (5) and hope for the best. Unfortunately, I suspect that such hope would be unwarranted. We can see, however, that

and

so, if (6) holds, we have

and, at least in the special case in which , this gives us a cut of expansion . I don’t think, however, that just using would work in the general case.

(Note that this special case in which the very simple rounding works is also the case in which the simple rounding gives, on average, a nearly balanced partition. Interestingly, both the Leighton-Rao and the Arora-Rao-Vazirani analyses reduce the general case to a case in which the respective rounding algorithms produce nearly balanced partitions.)

The next try would be to find a such that is an approximation to , and is an approximation to . My first guess was to choose , and this works!

Let us see what the right-hand sides of (4) and (5) look like for such a solution . First, we may assume without loss of generality that we are given an such that the median of is zero. (Otherwise we can subtract the median from every entry without changing (6).) This means that we have a solution that is feasible for both (4) and (5).

In order to study the numerator of (6), we need to find a way to relate to . A couple of attempts give

and it’s clear that the next step has to be Cauchy-Schwarz

So we have related the numerator of (6) to the numerators of (4) and (5), up to the extra term that is easy to bound (recall the graph is -regular) as

Now we use (6) to get

and, combining everything,

.

Although it is unnecessary to complete the proof, we can also see that, because the median of is zero, we have

and so

.

There is another way to look at this “rounding algorithm:” we see the solution that we are given as defining a “distance function” among the vertices, and we find a combinatorial solution by finding an such that the cut is defined by the “ball of radius ” around the vertex , where . (If is the threshold picked by the algorithm, then .) Indeed, we pick a radius at random, with a certain distribution, and we are able to relate the probability that an edge is cut with the value . This view is not very natural for the spectral partitioning algorithm, but it is in the spirit of what happens in the rounding of the Leighton-Rao and the Arora-Rao-Vazirani relaxations.

hi luca:

in addition to the wonderful post(s), one thing that might be worth mentioning is a corresponding “cheating-cheeger’s inequality” in the balanced separator world. the advantage of this is that it is straightforward to prove and the quadratic dependence comes out, perhaps, in a much cleaner way. also, it might be the way to introduce cheeger’s inequality in a course.

i am sure most of the readers of your blog have seen this, but for the others, here is a sketch of how it goes: for the sake of this discussion, assume that the graph is d-regular.

let \lambda_2 denote the normalized (by d) second smallest eigenvalue of the laplacian.

then \lambda_2 can be seen as the optimum of the following optimization problem (which arises as a relaxation to the sparsest cut problem in the natural way):

assign a vector x_i in R^n to each vertex so as to minimize the following ratio:

E_{e=ij} [||x_i-x_j||^2] / E_{i<j}[||x_i-x_j||^2]

(the ratio of average l_2^2 distance across the edges to the average

l_2^2 distance across all pairs of vertices).

on the other hand, the corresponding relaxation to the balanced separator problem can be written as:

assign a “unit” vector x_i in R^n to each vertex so as to minimize the following ratio:

E_{e=ij} [||x_i-x_j||^2]

such that E_{i<j}[||x_i-x_j||^2] = \Omega(1).

the optimum of this program is the equivalent of \lambda_2 for the balanced separator problem.

now suppose that this optima is less than \epsilon. then it is easy to show that there is a balanced cut in the graph which cuts <= O(\sqrt{\epsilon}) fraction of edges: simply pick a random hyperplane cut passing through

the origin.

since E_{e=ij} [||x_i-x_j||^2] <= \epsilon, the probability that a typical edge is cut is O(\sqrt{\epsilon}). here we use the fact that for an edge e=ij, with x_i, x_j being unit vectors with inner product 1-\epsilon/2 are separated with probability at most O(\sqrt{\epsilon}).

the fact that the cut is balanced comes from the fact that on an average two vertices are a constant distance apart.

-nisheeth

Pingback: The Cheeger-Alon-Milman inequality « tcs math - some mathematics of theoretical computer science

This is nice stuff. I hate to bug you, but I find this very hard to read. Is there any way of setting the text color to black, do you know?

Ah, reading the entry in Google Reader gives black text.

I just wrote a post that is related (see http://blogs.ethz.ch/kowalski/2011/10/17/trevisans-l1-style-cheeger-inequality/ )

Pingback: 245B, Notes 1: Basic theory of expander graphs « What’s new