*In which we prove the difficult direction of Cheeger’s inequality.*

As in the past lectures, consider an undirected -regular graph , call its adjacency matrix, and its scaled adjacency matrix. Let be the eigenvalues of , with multiplicities, in non-increasing order. We have been studying the *edge expansion* of a graph, which is the minimum of over all non-trivial cuts of the vertex set (a cut is trivial if or ), where the expansion of a cut is

We have also been studying the (uniform) *sparsest cut* problem, which is the problem of finding the non-trivial cut that minimizes , where the sparsisty of a cut is

We are proving Cheeger’s inequalities:

and we established the left-hand side inequality in the previous lecture, showing that the quantity can be seen as the optimum of a continuous relaxation of , so that , and follows by the definition.

Today we prove the more difficult, and interesting, direction. The proof will be constructive and algorithmic. The proof can be seen as an analysis of the following algorithm.

Algorithm: *SpectralPartitioning*

- Input: graph and vector
- Sort the vertices of in non-decreasing order of values of entries in , that is let where
- Let be such that is minimal
- Output

We note that the algorithm can be implemented to run in time , assuming arithmetic operations and comparisons take constant time, because once we have computed it only takes time to compute .

We have the following analysis of the quality of the solution:

Lemma 1 (Analysis of Spectral Partitioning)Let be a d-regular graph, be a vector such that , let be the normalized adjacency matrix of , defineand let be the output of algorithm

SpectralPartitioningon input and . Then

Remark 1If we apply the lemma to the case in which is an eigenvector of , then , and so we have

which is the difficult direction of Cheeger’s inequalities.

Remark 2If we run the SpectralPartitioning algorithm with the eigenvector of the second eigenvalue , we find a set whose expansion isEven though this doesn’t give a constant-factor approximation to the edge expansion, it gives a very efficient, and non-trivial, approximation.

As we will see in a later lecture, there is a nearly linear time algorithm that finds a vector for which the expression in the lemma is very close to , so, overall, for any graph we can find a cut of expansion in nearly linear time.

**1. Proof of the Lemma **

In the past lecture, we saw that can be seen as the optimum of a continuous relaxation of sparsest cut. Lemma 1 provides a *rounding algorithm* for the real vectors which are solutions of the relaxation. In this section we will think of it as a form of *randomized rounding*. Later, when we talk about the Leighton-Rao sparsest cut algorithm, we will revisit this proof and think of it in terms of *metric embeddings*.

To simplify notation, we will assume that and that . Thus our goal is to prove that there is an such that

We will derive Lemma 1 by showing that there is a distribution over sets of the form such that

We need to be a bit careful in deriving the Lemma from (2). In general, it is not true that a ratio of averages is equal to the average of the ratios, so (2) does not imply that . We can, however, apply linearity of expectation and derive from (2) the inequality

So there must exist a set in the sample space such that

meaning that, for that set , we have . (Basically we are using the fact that, for random variables over the same sample space, although it might not be true that , we always have , provided that over the entire sample space.)

From now on, we will assume that

- , that is, the median of the entries of is zero

which can be done without loss of generality because adding a fixed constant to all entries of , or multiplying all the entries by a fixed constant does not change the value of nor does it change the property that . The reason for these choices is that they allow us to define a distribution over sets such that

We define the distribution over sets of the form , , as the outcome of the following probabilistic process:

- We pick a real value in the range with probabily density function . That is, for , .
Doing the calculation, this means that if have the same sign, and if they have different signs.

- We let

According to this definition, the probability that an element belongs to the smallest of the sets is the same as the probability that it belongs to , which is the probability that the threshold is in the range , and that probability is . Similarly, the probability that an element belongs to the smallest of is the same as the probability that it belongs to , which is the probability that is in the range , which is again . So we have established (3).

We will now estimate the expected number of edges between and .

The event that the edge is cut by the partition happens when the value falls in the range between and . This means that

- If have the same sign,
- If have different sign,

Some attempts, show that a good expression to upper bound both cases is

Plugging into our expression for the expected number of cut edges, and applying Cauchy-Schwarz

The assumption of the Lemma tell us that

And we can rewrite

which gives us

Finally, it remains to study the expression . By applying the inequality (which follows by noting that ), we derive

Putting all the pieces together we have

which, together with (3) gives (2), which, as we already discussed, implies the Main Lemma 1.

## 4 comments

Comments feed for this article

May 2, 2012 at 8:13 pm

Cheeger Inequality and Spectral Clustering « Blog do Bandeira[...] the reader to take a look at a very nice proof that can be found in Luca Trevisan’s blog: in theory. Very roughly, the idea is to perform random rounding (with a very clever probability distribution) [...]

May 7, 2012 at 3:33 am

Lecture 8 « Pseudorandomness and Derandomization[...] now, you can find the proof in Luca Trevisan’s blog, In Theory: The easy direction and the harder one. There are other [...]

May 21, 2012 at 3:25 am

Lecture 9 « Pseudorandomness and Derandomization[...] we proved Cheeger’s inequality, based on a proof by Luca Trevisan (here and here). Quite recently two papers were published, that study what can we say when the largest [...]

June 4, 2014 at 6:06 pm

JohnA small typo: You want to shift the entries so that x_{\lceil n/2 \rceil} = 0, i.e., the ceiling of n/2 not the floor of n/2.

(For example, if n=3, your shift convention gives x_1 = 0, and your statement that Pr[1 belongs to the smaller of the sets S, V-S] = x_1^2 is false; the probability is actually x_2^2. Everything works using the ceiling instead of the floor.)