In which we analyze a nearly-linear time algorithm for finding an approximate eigenvector for the second eigenvalue of a graph adjacency matrix, to be used in the spectral partitioning algorithm.
In past lectures, we showed that, if $G=(V,E)$ is a $d$-regular graph, and $M$ is its normalized adjacency matrix with eigenvalues $1 = \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$, given an eigenvector of $\lambda_2$, the algorithm SpectralPartition finds, in nearly-linear time $O(|E| + |V| \log |V|)$, a cut $(S, V-S)$ such that $h(S) \leq 2 \sqrt{h(G)}$.
More generally, if, instead of being given an eigenvector $x$ such that $Mx = \lambda_2 x$, we are given a vector $x \perp {\bf 1}$ such that $x^T M x \geq (\lambda_2 - \epsilon) \cdot x^T x$, then the algorithm finds a cut such that $h(S) \leq 2\sqrt{h(G) + \epsilon}$. In this lecture we describe and analyze an algorithm that computes such a vector using $O\left( (|V| + |E|) \cdot \frac 1\epsilon \cdot \log \frac{|V|}{\epsilon} \right)$ arithmetic operations.
A symmetric matrix is positive semi-definite (abbreviated PSD) if all its eigenvalues are nonnegative. We begin by describing an algorithm that approximates the largest eigenvalue of a given symmetric PSD matrix. This might not seem to help very much because the adjacency matrix of a graph is not PSD, and because we want to compute the second largest, not the largest, eigenvalue. We will see, however, that the algorithm is easily modified to approximate the second eigenvalue of a PSD matrix (if an eigenvector of the first eigenvalue is known), and that the adjacency matrix of a graph can easily be modified to be PSD.
1. The Power Method to Approximate the Largest Eigenvalue
The algorithm works as follows:

Algorithm Power

- Input: PSD symmetric matrix $M \in {\mathbb R}^{n \times n}$, positive integer $t$
- Pick uniformly at random $x_0 \sim \{-1,1\}^n$
- for $i:=1$ to $t$
  - $x_i := M x_{i-1}$
- return $x_t$

That is, the algorithm simply picks uniformly at random a vector $x_0$ with $\pm 1$ coordinates, and outputs $M^t x_0$.
Note that the algorithm performs $O(m \cdot t)$ arithmetic operations, where $m$ is the number of non-zero entries of the matrix $M$.
Theorem 1 For every PSD matrix $M$, positive integer $t$ and parameter $\epsilon > 0$, with probability $\geq \frac 3{16}$ over the choice of $x_0$, the algorithm Power outputs a vector $x_t$ such that

$\displaystyle \frac{x_t^T M x_t}{x_t^T x_t} \geq \lambda_1 \cdot (1-\epsilon) \cdot \frac 1{1 + 4n (1-\epsilon)^{2t}}$

where $\lambda_1$ is the largest eigenvalue of $M$.
Note that, in particular, we can have $t = O\left( \frac 1\epsilon \log \frac n\epsilon \right)$ and $\frac{x_t^T M x_t}{x_t^T x_t} \geq (1 - 2\epsilon) \cdot \lambda_1$.
Let $\lambda_1 \geq \cdots \geq \lambda_n$ be the eigenvalues of $M$, with multiplicities, and $v_1, \ldots, v_n$ be a system of orthonormal eigenvectors such that $M v_i = \lambda_i v_i$. Theorem 1 is implied by the following two lemmas.
Lemma 2 Let $v \in {\mathbb R}^n$ be a vector such that $\| v \| = 1$. Sample uniformly $x \sim \{-1,1\}^n$. Then

$\displaystyle {\mathbb P} \left[ | \langle x, v \rangle | \geq \frac 12 \right] \geq \frac 3{16}$
Lemma 3 Let $x \in {\mathbb R}^n$ be a vector such that $|\langle x, v_1 \rangle| \geq \frac 12$. Then, for every positive integer $t$ and positive $\epsilon > 0$, if we define $y := M^t x$, we have

$\displaystyle \frac{y^T M y}{y^T y} \geq \lambda_1 \cdot (1-\epsilon) \cdot \frac 1{1 + 4 \| x \|^2 \cdot (1-\epsilon)^{2t}}$
It remains to prove the two lemmas.
Proof: (Of Lemma 2) Let $v = (v_1, \ldots, v_n)$. The inner product $\langle x, v \rangle$ is the random variable

$\displaystyle S := \sum_i x_i v_i$

Let us compute the first, second, and fourth moment of $S$:

$\displaystyle {\mathbb E}[S] = 0$

$\displaystyle {\mathbb E}[S^2] = \sum_i v_i^2 = 1$

$\displaystyle {\mathbb E}[S^4] = 3 \left( \sum_i v_i^2 \right)^2 - 2 \sum_i v_i^4 \leq 3$

Recall that the Paley-Zygmund inequality states that if $Z$ is a non-negative random variable with finite variance, then, for every $0 \leq \delta \leq 1$, we have

$\displaystyle {\mathbb P}\left[ Z \geq \delta {\mathbb E}[Z] \right] \geq (1-\delta)^2 \cdot \frac{({\mathbb E}[Z])^2}{{\mathbb E}[Z^2]}$

which follows by noting that

$\displaystyle {\mathbb E}[Z] = {\mathbb E}\left[ Z \cdot 1_{Z < \delta {\mathbb E}[Z]} \right] + {\mathbb E}\left[ Z \cdot 1_{Z \geq \delta {\mathbb E}[Z]} \right],$

that

$\displaystyle {\mathbb E}\left[ Z \cdot 1_{Z < \delta {\mathbb E}[Z]} \right] \leq \delta {\mathbb E}[Z],$

and that, by Cauchy-Schwarz,

$\displaystyle {\mathbb E}\left[ Z \cdot 1_{Z \geq \delta {\mathbb E}[Z]} \right] \leq \sqrt{{\mathbb E}[Z^2]} \cdot \sqrt{{\mathbb P}\left[ Z \geq \delta {\mathbb E}[Z] \right]}.$

We apply the Paley-Zygmund inequality to the case $Z = S^2$ and $\delta = 1/4$, and we derive

$\displaystyle {\mathbb P}\left[ S^2 \geq \frac 14 \right] \geq \left( \frac 34 \right)^2 \cdot \frac 13 = \frac 3{16}. \ \Box$
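As a quick sanity check (my own addition, not part of the lecture), one can estimate the probability in Lemma 2 numerically; for a typical unit vector $v$ the true probability is much larger than the $3/16$ guaranteed by the lemma, which holds for every unit $v$.

```python
import numpy as np

def estimate_lemma2_probability(v, trials=100_000, seed=0):
    """Estimate P[ |<x,v>| >= 1/2 ] for x uniform in {-1,1}^n, assuming ||v|| = 1."""
    rng = np.random.default_rng(seed)
    n = len(v)
    hits = 0
    for _ in range(trials):
        x = rng.choice([-1.0, 1.0], size=n)
        if abs(x @ v) >= 0.5:
            hits += 1
    return hits / trials

# Example: a random unit vector in dimension 50.
v = np.random.default_rng(1).standard_normal(50)
v /= np.linalg.norm(v)
print(estimate_lemma2_probability(v))   # typically around 0.6, comfortably above 3/16
```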
Remark 1 The proof of Lemma 2 works even if $x \sim \{-1,1\}^n$ is selected according to a 4-wise independent distribution. This means that the algorithm can be derandomized in polynomial time.
Proof: (Of Lemma 3) Let us write $x$ as a linear combination of the eigenvectors

$\displaystyle x = a_1 v_1 + \cdots + a_n v_n$

where the coefficients can be computed as $a_i = \langle x, v_i \rangle$. Note that, by assumption, $|a_1| \geq \frac 12$, and that, by orthonormality of the eigenvectors, $\| x \|^2 = \sum_i a_i^2$.

We have

$\displaystyle y = a_1 \lambda_1^t v_1 + \cdots + a_n \lambda_n^t v_n$

and so

$\displaystyle y^T M y = \sum_i a_i^2 \lambda_i^{2t+1}$

and

$\displaystyle y^T y = \sum_i a_i^2 \lambda_i^{2t}.$

We need to prove a lower bound to the ratio of the above two quantities. We will compute a lower bound to the numerator and an upper bound to the denominator in terms of the same parameter.

Let $k$ be the number of eigenvalues larger than $\lambda_1 \cdot (1-\epsilon)$. Then, recalling that the eigenvalues are sorted in non-increasing order, we have

$\displaystyle y^T M y \geq \sum_{i=1}^k a_i^2 \lambda_i^{2t+1} \geq \lambda_1 \cdot (1-\epsilon) \cdot \sum_{i=1}^k a_i^2 \lambda_i^{2t}.$

We also see that

$\displaystyle \sum_{i=k+1}^n a_i^2 \lambda_i^{2t} \leq \lambda_1^{2t} (1-\epsilon)^{2t} \sum_{i=k+1}^n a_i^2 \leq \lambda_1^{2t} (1-\epsilon)^{2t} \| x \|^2 \leq 4 \| x \|^2 (1-\epsilon)^{2t} a_1^2 \lambda_1^{2t} \leq 4 \| x \|^2 (1-\epsilon)^{2t} \sum_{i=1}^k a_i^2 \lambda_i^{2t},$

where the third inequality uses $a_1^2 \geq \frac 14$. So we have

$\displaystyle y^T y \leq \left( 1 + 4 \| x \|^2 (1-\epsilon)^{2t} \right) \cdot \sum_{i=1}^k a_i^2 \lambda_i^{2t},$

giving

$\displaystyle \frac{y^T M y}{y^T y} \geq \lambda_1 \cdot (1-\epsilon) \cdot \frac 1{1 + 4 \| x \|^2 (1-\epsilon)^{2t}}. \ \Box$
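Here is a small numerical illustration of the bound just proved (a sketch of my own, not from the lecture): we build a random PSD matrix, run the iteration from a $\{-1,1\}^n$ start with $|\langle x, v_1 \rangle| \geq 1/2$, and compare the Rayleigh quotient of $y = M^t x$ with the guarantee of Lemma 3 for a couple of values of $\epsilon$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 200, 50

# A random symmetric PSD matrix: B^T B is always PSD.
B = rng.standard_normal((n, n))
M = B.T @ B

eigvals, eigvecs = np.linalg.eigh(M)     # eigenvalues in increasing order
lam1, v1 = eigvals[-1], eigvecs[:, -1]

x = rng.choice([-1.0, 1.0], size=n)
while abs(x @ v1) < 0.5:                 # succeeds after O(1) resamplings by Lemma 2
    x = rng.choice([-1.0, 1.0], size=n)

y = x.copy()
for _ in range(t):
    y = M @ y
    y /= np.linalg.norm(y)               # rescaling leaves the Rayleigh quotient unchanged

rq = (y @ (M @ y)) / (y @ y)
for eps in (0.1, 0.01):
    bound = lam1 * (1 - eps) / (1 + 4 * (x @ x) * (1 - eps) ** (2 * t))
    print(f"eps={eps}: Rayleigh quotient {rq:.4g} >= Lemma 3 bound {bound:.4g}")
```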
Remark 2 Where did we use the assumption that $M$ is positive semidefinite? What happens if we apply this algorithm to the adjacency matrix of a bipartite graph?
2. Approximating the Second Eigenvalue
If $M$ is a PSD matrix, and if we know a unit-length eigenvector $v_1$ of the largest eigenvalue of $M$, we can approximately find the second eigenvalue with the following adaptation of the algorithm from the previous section.
Algorithm Power2

- Input: PSD symmetric matrix $M \in {\mathbb R}^{n \times n}$, positive integer $t$, unit-length vector $v_1$
- Pick uniformly at random $x \sim \{-1,1\}^n$
- $x_0 := x - \langle v_1, x \rangle \cdot v_1$
- for $i:=1$ to $t$
  - $x_i := M x_{i-1}$
- return $x_t$
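A minimal Python sketch of Power2 along these lines (again my own rendering, with names of my choosing): the only change from the earlier sketch is the initial projection that removes the component of the random start along $v_1$.

```python
import numpy as np

def power_method_second(M, v1, t, rng=None):
    """Sketch of algorithm Power2.

    M is assumed symmetric PSD and v1 a unit-length eigenvector of its
    largest eigenvalue; returns x_t = M^t x_0, where x_0 is a random
    {-1,1}^n vector projected onto the subspace orthogonal to v1.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = M.shape[0]
    x = rng.choice([-1.0, 1.0], size=n)
    x = x - (v1 @ x) * v1              # x_0 := x - <v1, x> v1
    for _ in range(t):
        x = M @ x                      # x_i := M x_{i-1}
    return x
```

In exact arithmetic every iterate stays orthogonal to $v_1$; in floating point it can be worthwhile to re-project onto the orthogonal complement of $v_1$ every few iterations.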
If $v_1, \ldots, v_n$ is an orthonormal basis of eigenvectors for the eigenvalues $\lambda_1 \geq \cdots \geq \lambda_n$ of $M$, then, at the beginning, we pick a random vector

$\displaystyle x = a_1 v_1 + \cdots + a_n v_n$

that, with probability at least $3/16$, satisfies $|a_2| \geq \frac 12$. (Cf. Lemma 2.) Then we compute $x_0 := x - a_1 v_1$, which is the projection of $x$ on the subspace orthogonal to $v_1$, that is

$\displaystyle x_0 = a_2 v_2 + \cdots + a_n v_n.$

Note that $x_0 \perp v_1$ and that $\| x_0 \|^2 \leq \| x \|^2 = n$.
The output is the vector

$\displaystyle x_t = a_2 \lambda_2^t v_2 + \cdots + a_n \lambda_n^t v_n.$

If we apply Lemma 3 to the subspace orthogonal to $v_1$, we see that when $|a_2| \geq \frac 12$ we have that, for every $0 < \epsilon < 1$,

$\displaystyle \frac{x_t^T M x_t}{x_t^T x_t} \geq \lambda_2 \cdot (1-\epsilon) \cdot \frac 1{1 + 4n (1-\epsilon)^{2t}}.$
We have thus established the following analysis.
Theorem 4 For every PSD matrix $M$, positive integer $t$ and parameter $\epsilon > 0$, if $v_1$ is a length-1 eigenvector of the largest eigenvalue of $M$, then with probability $\geq \frac 3{16}$ over the choice of $x$, the algorithm Power2 outputs a vector $x_t \perp v_1$ such that

$\displaystyle \frac{x_t^T M x_t}{x_t^T x_t} \geq \lambda_2 \cdot (1-\epsilon) \cdot \frac 1{1 + 4n (1-\epsilon)^{2t}}$

where $\lambda_2$ is the second largest eigenvalue of $M$, counting multiplicities.
Finally, we come to the case in which $M$ is the normalized adjacency matrix of a $d$-regular graph. We know that $M$ has eigenvalues $1 = \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq -1$ and that $v_1 = \frac 1{\sqrt n} \cdot (1, \ldots, 1)$ is an eigenvector of $\lambda_1 = 1$.
Consider now the matrix $I + M$. Every eigenvector $v$ of $M$ with eigenvalue $\lambda$ is clearly also an eigenvector of $I + M$ with eigenvalue $1 + \lambda$, and vice versa, thus $I + M$ has eigenvalues $2 = 1 + \lambda_1 \geq 1 + \lambda_2 \geq \cdots \geq 1 + \lambda_n \geq 0$ and it is PSD.
This means that we can run algorithm Power2 on the matrix $I + M$ using the vector $v_1 = \frac 1{\sqrt n} \cdot (1, \ldots, 1)$ and a parameter $t = O\left( \frac 1\epsilon \log \frac n\epsilon \right)$. The algorithm finds, with probability $\geq \frac 3{16}$, a vector $x \perp v_1$ such that

$\displaystyle \frac{x^T (I+M) x}{x^T x} \geq (1 + \lambda_2) \cdot (1 - 2\epsilon)$

which is equivalent to

$\displaystyle \frac{x^T M x}{x^T x} \geq \lambda_2 - 2\epsilon - 2\epsilon \lambda_2 \geq \lambda_2 - 4\epsilon.$
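Putting everything together, here is a hedged end-to-end sketch (the function name and parameter choices are mine) of how one might compute, for a $d$-regular graph given by its adjacency matrix $A$, a vector $x \perp {\bf 1}$ with $x^T M x \geq (\lambda_2 - 4\epsilon) \cdot x^T x$, by running the Power2 iteration on $I + M$.

```python
import numpy as np
from math import ceil, log

def approximate_second_eigenvector(A, d, eps, rng=None):
    """Sketch: approximate eigenvector for lambda_2 of M = A/d (d-regular graph).

    Runs the Power2 iteration on the PSD matrix I + M, starting from a random
    {-1,1}^n vector projected orthogonally to the all-ones vector, for
    t = O((1/eps) log(n/eps)) steps.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    v1 = np.ones(n) / np.sqrt(n)             # unit-length eigenvector of lambda_1 = 1
    t = ceil((1.0 / eps) * log(n / eps))     # number of iterations

    x = rng.choice([-1.0, 1.0], size=n)
    x = x - (v1 @ x) * v1                    # project away the all-ones direction
    for _ in range(t):
        x = x + (A @ x) / d                  # multiply by I + M = I + A/d
        x /= np.linalg.norm(x)               # rescale; the Rayleigh quotient is unaffected
    return x
```

Since the guarantee holds only with probability $\geq 3/16$ over the random start, in practice one would repeat the computation a logarithmic number of times and keep the output with the largest value of $x^T M x / x^T x$.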
What is a good reference for this material? I expect that the complexity of something so basic as eigenvalue computation has been analyzed in the literature extensively.
The problem has an enormous body of literature in the numerical analysis community, but I am not very familiar with that literature.
The 1992 paper "Estimating the largest eigenvalues by the power and Lanczos algorithms with a random start" by Kuczyński and Woźniakowski shows that, to compute the largest eigenvalue of a PSD matrix with relative error $\epsilon$, $O\left( \frac{\log n}{\epsilon} \right)$ iterations of the power method are enough (this is better than the $O\left( \frac 1\epsilon \log \frac n\epsilon \right)$ bound that I prove in the post because I only look at one scale of $\epsilon$ instead of averaging over all scales); they have an explicit bound on the leading constant, and they show matrices for which there is a matching lower bound.
For the Lanczos algorithm, which is known to perform better in practice, they prove that $O\left( \frac{\log n}{\sqrt{\epsilon}} \right)$ iterations are sufficient.
An application of the Spielman-Teng solver for diagonally dominant systems of linear equations is that, for the normalized adjacency matrix of a graph, one can find a constant-factor approximation of $1 - \lambda_2$ in time $\tilde O(|E|)$, independent of the value of $1 - \lambda_2$, while the approach described in this post has complexity that grows nearly linearly in $\frac 1{1 - \lambda_2}$, which can be quadratic or even cubic in $|V|$ in very non-expanding graphs.
thanks a lot for your very detailed answer!