*In which we talk about even more generalizations of Cheeger’s inequalities, and we analyze the power method to find approximate eigenvectors, thus having a complete description of a polynomial-time approximation algorithm for sparsest cut*

**1. Irregular Graphs **

For simplicity, we proved our results on and for regular graphs. Those results extend, essentially with the same proofs, to the case of irregular undirected graphs. In an irregular graph , the notion that generalizes edge expansion is called *conductance*. If is the degree of vertex , then the conductance of set of vertices is

We will call the sum of the degrees of a set of vertices the *volume* of the set, and denote it . The conductance of the graph is

Higher-order conductance is defined as higher-order expansion, but with conductance replacing expansion in the definition.

The Cheeger inequalities

still hold, with the same proof. With some abuse of notation, we will call the following quantity the Rayleigh quotient of

even if, technically, it is the Rayleigh quotient of , where is the diagonal matrix of degrees.

We can also adapt the proof of the higher-order Cheeger inequality to show

**2. More Cheeger-type Bounds **

We proved that if is the cut found by Fiedler’s algorithm using the eigenvector of , then

which is a good bound, although it usually underestimates the quality of the solutions found in practice. (There are graphs, however, for which the above inequality is tight within a constant factor.)

One case in which we can improve the analysis is when there are not too many eigenvalues close to

Theorem 1There is a constant such that, if is the cut obtained by Fiedler’s algorithm using an eigenvector for , then, for every ,

So we have

which is a better bound for families of graphs in which, for some , .

We will not have time to prove Theorem 1, but we will state the two main pieces of its proof.

Lemma 2Let be a non-negative vector. Then, for every , there is a non-negative vector whose entries take at most distinct values and such that

That is, if , then there are values such that most entries of are close to one of those values.

Lemma 3There is a constant such that, for every non-negative vectors and , if is such that its entris contain only distinct values, then there is a threshold such that

The above lemma should be compared to the fact, which was a major piece in the proof of Cheeger’s inequality, that if is an arbitrary non-negative vector, then there is a threshold such that

One obtains Theorem 1 in the following. Start from an eigenvector of and, using the first step in the proof of Cheeger’s inequality, obtain a vector with non-negative entries such that and such that the support of contains at most vertices.

Use Lemma 2 to find a vector with non-negative entries and with at most distinct values among its entries such that . Then use Lemma 3 and the fact that to conclude that there exists at such that

The set contains at most vertices, it is one of the cuts considered by Fiedler’s algorithm on input .

Another property of graphs in which is large for small is that they contain large expanders as induced subgraphs.

Theorem 4There is a constant such that, for every graph and every , there exists a partition of the vertices into sets such that, if we call the subgraph induced by the vertex set , we have

Theorem 5If , then there is a partition of the vertices into subsets such that

**3. The Power Method **

Earlier in this class, we showed that, if is a -regular graph, and is its normalized Laplacian matrix with eigenvalues , given an eigenvector of , Fiedler’s algorithm finds, in nearly-linear time , a cut such that .

More generally, if, instead of being given an eigenvector such that , we are given a vector such that , then the algorithm finds a cut such that . We will now see how to compute such a vector using arithmetic operations.

A symmetric matrix is *positive semi-definite* (abbreviated PSD) if all its eigenvalues are nonnegative. We begin by describing an algorithm that approximates the largest eigenvalue of a given symmetric PSD matrix. This might not seem to help very much because because we want to compute the second smallest, not the largest, eigenvalue. We will see, however, that the algorithm is easily modified to accomplish what we want.

** 3.1. The Power Method to Approximate the Largest Eigenvalue **

The algorithm works as follows

Algorithm *Power*

Input: PSD matrix , parameter

- Pick uniformly at random
- for to

- return

That is, the algorithm simply picks uniformly at random a vector with coordinates, and outputs .

Note that the algorithm performs arithmetic operations, where is the number of non-zero entries of the matrix .

Theorem 6For every PSD matrix , positive integer and parameter , with probability over the choice of , the algorithm Power outputs a vector such that

where is the largest eigenvalue of .

Note that, in particular, we can have and .

Let be the eigenvalues of , with multiplicities, and be a system of orthonormal eigenvectors such that . Theorem 6 is implied by the following two lemmas

Lemma 7Let be a vector such that . Sample uniformly . Then

Lemma 8For every , for every positive integer and positive , if we define , we have

It remains to prove the two lemmas.

*Proof:* (Of Lemma 7) Let . The inner product is the random variable

Let us compute the first, second, and fourth moment of .

Recall that the Paley-Zygmund inequality states that if is a non-negative random variable with finite variance, then, for every , we have

which follows by noting that

that

and that

We apply the Paley-Zygmund inequality to the case and , and we derive

Remark 1The proof of Lemma 7 works even if is selected according to a 4-wise independent distribution. This means that the algorithm can be derandomized in polynomial time.

*Proof:* (Of Lemma 8) Let us write as a linear combination of the eigenvectors

where the coefficients can be computed as . We have

and so

and

We need to prove a lower bound to the ratio of the above two quantities. We will compute a lower bound to the numerator and an upper bound to the denominator in terms of the same parameter.

Let be the number of eigenvalues larger than . Then, recalling that the eigenvalues are sorted in non-increasing order, we have

We also see that

So we have

giving

Remark 2Where did we use the assumption that is positive semidefinite? What happens if we apply this algorithm to the adjacency matrix of a bipartite graph?

** 3.2. Approximating the Second Largest Eigenvalue **

Suppose now that we are interested in finding the second largest eigenvalue of a given PSD matrix . If has eigenvalues , and we know the eigenvector of , then is a PSD linear map from the orthogonal space to to itself, and is the largest eigenvalue of this linear map. We can then run the previous algorithm on this linear map.

Algorithm *Power2*

Input: PSD matrix , vector parameter

- Pick uniformly at random
- for to

- return

If is an orthonormal basis of eigenvectors for the eigenvalues of , then, at the beginning, we pick a random vector

that, with probability at least , satisfies . (Cf. Lemma 7.) Then we compute , which is the projection of on the subspace orthogonal to , that is

Note that and that .

The output is the vector

If we apply Lemma 8 to subspace orthogonal to , we see that when we have that, for every ,

We have thus established the following analysis.

Theorem 9For every PSD matrix , positive integer and parameter , if is a length-1 eigenvector of the largest eigenvalue of , then with probability over the choice of , the algorithm Power2 outputs a vector such that

where is the second largest eigenvalue of , counting multiplicities.

** 3.3. The Second Smallest Eigenvalue of the Laplacian **

Finally, we come to the case in which we want to compute the second smallest eigenvalue of the normalized Laplacian matrix of a -regular graph , where is the adjacency matrix of .

Consider the matrix . Then if are the eigenvalues of , we have that

are the eigenvalues of , and that is PSD. and have the same eigenvectors, and so is a length-1 eigenvector of the largest eigenvalue of .

By running algorithm Power2, we can find a vector such that

and

so, rearranging, we have

If we want to compute a vector whose Rayleigh quotient is, say, at most , then the running time will be , because we need to set , which is not nearly linear in the size of the graph if is, say .

For a running time that is nearly linear in for all values of , one can, instead, apply the power method to the *pseudoinverse* of . (Assuming that the graph is connected, is the unique vector such that , if , and if is parallel to .) This is because has eigenvalues , and so is PSD and is its largest eigenvalue.

Although computing is not known to be doable in nearly linear time, there are nearly linear time algorithms that, given , solve in the linear system , and this is the same as computing the product , which is enough to implement algorithm *Power* applied to .

(Such algorithms will be discussed in the third part of the course. The algorithms will find an approximate solution to the linear system , but this will be sufficient. In the following, we proceed as if the solution was exact.)

In time , we can find a vector such that , where is a random vector in , shifted to be orthogonal to and . What is the Rayleigh quotient of such a vector with respect to ?

Let be a basis of orthonormal eigenvectors for and . If are the eigenvalues of , then we have

and, for , we have

Write , where , and ssume that, as happens with probability at least , we have . Then

and the Rayleigh quotient of with respect to is

and the analysis proceeds similarly to the analysis of the previous section. If we let be the index such that then we can upper bound the numerator as

and we can lower bound the denominator as

and the Rayleigh quotient is

when .

Typo in proof of Lem 8, After “So we have”:

should be “sum_{i = 1}^l a_i^2 \lambda_i^{2k}”