In which we analyze a nearly-linear time algorithm for finding an approximate eigenvector for the second eigenvalue of a graph adjacency matrix, to be used in the spectral partitioning algorithm.
In past lectures, we showed that, if is a -regular graph, and is its normalized adjacency matrix with eigenvalues , given an eigenvector of , the algorithm SpectralPartition finds, in nearly-linear time , a cut such that .
More generally, if, instead of being given an eigenvector such that , we are given a vector such that , then the algorithm finds a cut such that . In this lecture we describe and analyze an algorithm that computes such a vector using arithmetic operations.
A symmetric matrix is positive semi-definite (abbreviated PSD) if all its eigenvalues are nonnegative. We begin by describing an algorithm that approximates the largest eigenvalue of a given symmetric PSD matrix. This might not seem to help very much because the adjacency matrix of a graph is not PSD, and because we want to compute the second largest, not the largest, eigenvalue. We will see, however, that the algorithm is easily modified to approximate the second eigenvalue of a PSD matrix (if an eigenvector of the first eigenvalue is known), and that the adjacency matrix of a graph can easily be modified to be PSD.
1. The Power Method to Approximate the Largest Eigenvalue
The algorithm works as follows
- Input: PSD symmetric matrix , positive integer
- Pick uniformly at random
- for to
That is, the algorithm simply picks uniformly at random a vector with coordinates, and outputs .
Note that the algorithm performs arithmetic operations, where is the number of non-zero entries of the matrix .
Theorem 1 For every PSD matrix , positive integer and parameter , with probability over the choice of , the algorithm Power outputs a vector such that
where is the largest eigenvalue of .
Note that, in particular, we can have and .
Let be the eigenvalues of , with multiplicities, and be a system of orthonormal eigenvectors such that . Theorem 1 is implied by the following two lemmas
Lemma 2 Let be a vector such that . Sample uniformly . Then
Lemma 3 Let be a vector such that . Then, for every positive integer and positive , if we define , we have
It remains to prove the two lemmas.
Proof: (Of Lemma 2) Let . The inner product is the random variable
Let us compute the first, second, and fourth moment of .
Recall that the Paley-Zygmund inequality states that if is a non-negative random variable with finite variance, then, for every , we have
which follows by noting that
We apply the Paley-Zygmund inequality to the case and , and we derive
Remark 1 The proof of Lemma 2 works even if is selected according to a 4-wise independent distribution. This means that the algorithm can be derandomized in polynomial time.
Proof: (Of Lemma 3) Let us write as a linear combination of the eigenvectors
where the coefficients can be computed as . Note that, by assumption, , and that, by orthonormality of the eigenvectors, .
We need to prove a lower bound to the ratio of the above two quantities. We will compute a lower bound to the numerator and an upper bound to the denominator in terms of the same parameter.
Let be the number of eigenvalues larger than . Then, recalling that the eigenvalues are sorted in non-increasing order, we have
We also see that
So we have
Remark 2 Where did we use the assumption that is positive semidefinite? What happens if we apply this algorithm to the adjacency matrix of a bipartite graph?
2. Approximating the Second Eigenvalue
If is a PSD matrix, and if we know a unit-length eigenvector of the largest eigenvalue of , we can approximately find the second eigenvalue with the following adaption of the algorithm from the previous section.
- Input: PSD symmetric matrix , positive integer , vector
- Pick uniformly at random
- for to
If is an orthonormal basis of eigenvectors for the eigenvalues of , then, at the beginning, we pick a random vector
that, with probability at least , satisfies . (Cf. Lemma 2.) Then we compute , which is the projection of on the subspace orthogonal to , that is
Note that and that .
The output is the vector
If we apply Lemma 3 to subspace orthogonal to , we see that when we have that, for every ,
We have thus established the following analysis.
Theorem 4 For every PSD matrix , positive integer and parameter , if is a length-1 eigenvector of the largest eigenvalue of , then with probability over the choice of , the algorithm Power2 outputs a vector such that
where is the second largest eigenvalue of , counting multiplicities.
Finally, we come to the case in which is the normalized adjacency matrix of a regular graph.
We know that has eigenvalues and that is an eigenvector of .
Consider now the matrix . Every eigenvector of with eigenvalue is clearly also an eigenvector of with eigenvalue , and vice versa, thus has eigenvalues and it is PSD.
This means that we can run algorithm Power2 on the matrix using the vector and a parameter . The algorithm finds, with probability , a vector such that
which is equivalent to
What is a good reference for this material? I expect that the complexity of something so basic as eigenvalue computation has been analyzed in the literature extensively.
The problem has an enormous body of literature in the numerical analysis community, but I am not very familiar with that literature.
The 1992 paper Estimating the largest eigenvalues by the power and Lanczos algorithms with a random start by Kuczyński and Woźniakowski shows that, to compute the largest eigenvalue of a PSD matrix with relative error , iterations of the power method are enough (this is better than the bound that I prove in the post because I only look at one scale of instead of averaging over all scales); they have an explicit bound on the leading constant, and they show matrices for which there is a lower bound.
For the Lanczos algorithm, which is known to perform better in practice, they prove that iterations are sufficient.
An application of the Spielman-Teng solver for diagonally-dominated systems of linear equations is that, for the normalized adjacency matrix of a graph, one can find a constant-factor approximation of in time , independent of the value of , while the approach described in this post has complexity that grows nearly linearly in , which can be quadratic or even cubic in in very non-expanding graphs.
thanks a lot for your very detailed answer!