[In which we prove the “difficult part” of Cheeger’s inequality by analyzing a randomized rounding algorithm for a continuous relaxation of sparsest cut.]
We return to this month’s question: if is a -regular graph, how well can we approximate its edge expansion defined as
and its sparsest cut defined as
where is the adjacency matrix of .
We have looked at three continuous relaxations of , the spectral gap, the Leighton-Rao linear program, and the Arora-Rao-Vazirani semidefinite program.
As we saw, the spectral gap of , defined as the difference between largest and second largest eigenvalue of , can be seen as the solution to a continuous optimization problem:
It follows from the definitions that
which is the “easy direction” of Cheeger’s inequality, and the interesting thing is that is never much smaller, and it obeys
which is the difficult part of Cheeger’s inequality. When we normalize all quantities by the degree, the inequality reads as
I have taught (1) in three courses and used it in two papers, but I had never really understood it, where I consider a mathematical proof to be understood if one can see it as a series of inevitable steps. Many steps in the proofs of (1) I had read, however, looked like magic tricks with no explanation. Finally, however, I have found a way to describe the proof that makes sense to me. (I note that everything I will say in this post will be completely obvious to the experts, but I hope some non-expert will read it and find it helpful.)
We prove (1), as usual, by showing that given any such that
we can find a threshold such that the cut defined by satisfies
This not only gives us a proof of (1), but also an algorithm for finding sparse cuts when they exist: take a vector which is an eigenvector of the second eigenvalue (or simply a vector for which the Rayleigh quotient in (2) is small), sort the vertices according to the value of , and find the best cut among the “threshold cuts.” This is the “spectral partitioning” algorithm.
This means that proving (1) amounts to studying an algorithm that “rounds” the solution of a continuous relaxation to a combinatorial solution, and there is a standard pattern to such arguments in computer science: we describe a randomized rounding algorithm, study its average performance, and then argue that there is a fixed choice that is at least as good as an average one. Here, in particular, we would like to find a distribution over threshold, such that if we define as a random variable in terms of we have
and so, using linearity of expectation,
from which we see that there must be a threshold in our sample space such that (3) holds. I shall present a proof that explicitly follows this pattern. It would be nice if we could choose uniformly at random in the interval , but I don’t think it would work. (Any reader can see a counterexample? I couldn’t, but we’ll come back to it.) Instead, the following works: assuming with no loss of generality that the median of is zero, can be chosen so that is distributed uniformly in the interval . (This means that thresholds near the median are less likely to be picked than thresholds far from the median.) This choice seems to be a magic trick in itself, voiding the point I made above, but I hope it will become natural as we unfold our argument.
First, we shall see that there are continuous relaxations of and that exactly match the combinatorial optimum, and that this can be established via random rounding.
Our first claim is that
To prove (4), take an optimal solution , pick a random threshold in the interval , and define . Then it should be clear that the probability that an edge is cut by is proportional to . Normalizing so that , which does not affect the distribution of , nor the optimality of , the probability is exactly equal to , and we have
so that the support of our distribution includes a cut such that
Next, we want to argue that
where denotes the median of the values . It may take a minute to see that the right-hand side is a relaxation of . Suppose that is the optimal cut in the definition of , and that . Then define if and otherwise. The median of is zero, , and counts twice every edge crossing the cut. So the right-hand-side is a relaxation, and can be no larger than .
As before, normalize so that , pick a threshold uniformly at random in this interval, and define . We already argued that
and it remains to show that
This is just a matter of writing things down. If we name the vertices so that , and we use that for a non-negative integral random variable we have , then we just have to observe that
Now that we have (4) and (5), if someone gives us an such that
our first thought would be to plug into the right-hand side of (4) and (5) and hope for the best. Unfortunately, I suspect that such hope would be unwarranted. We can see, however, that
so, if (6) holds, we have
and, at least in the special case in which , this gives us a cut of expansion . I don’t think, however, that just using would work in the general case.
(Note that this special case in which the very simple rounding works is also the case in which the simple rounding gives, on average, a nearly balanced partition. Interestingly, both the Leighton-Rao and the Arora-Rao-Vazirani analyses reduce the general case to a case in which the respective rounding algorithms produce nearly balanced partitions.)
The next try would be to find a such that is an approximation to , and is an approximation to . My first guess was to choose , and this works!
Let us see what the right-hand sides of (4) and (5) look like for such a solution . First, we may assume without loss of generality that we are given an such that the median of is zero. (Otherwise we can subtract the median from every entry without changing (6).) This means that we have a solution that is feasible for both (4) and (5).
In order to study the numerator of (6), we need to find a way to relate to . A couple of attempts give
and it’s clear that the next step has to be Cauchy-Schwarz
So we have related the numerator of (6) to the numerators of (4) and (5), up to the extra term that is easy to bound (recall the graph is -regular) as
Now we use (6) to get
and, combining everything,
Although it is unnecessary to complete the proof, we can also see that, because the median of is zero, we have
There is another way to look at this “rounding algorithm:” we see the solution that we are given as defining a “distance function” among the vertices, and we find a combinatorial solution by finding an such that the cut is defined by the “ball of radius ” around the vertex , where . (If is the threshold picked by the algorithm, then .) Indeed, we pick a radius at random, with a certain distribution, and we are able to relate the probability that an edge is cut with the value . This view is not very natural for the spectral partitioning algorithm, but it is in the spirit of what happens in the rounding of the Leighton-Rao and the Arora-Rao-Vazirani relaxations.