The Power Method

This week, the topic of my online course on graph partitioning and expanders is the computation of approximate eigenvalues and eigenvectors with the power method.

If M is a positive semidefinite matrix (a symmetric matrix all whose eigenvalues are nonnegative), then the power method is simply to pick a random vector x\in \{ -1,+1 \}^n, and compute y:= M^k x. If k is of the order of \frac 1 \epsilon \log \frac n \epsilon, then one has a constant probability that

\frac {y^T M y}{y^T y} \geq (1-\epsilon) \max_{x} \frac {x^T M x}{x^T x} = (1-\epsilon) \lambda_1

where \lambda_1 is the largest eigenvalue of M. If we are interested in the Laplacian matrix L = I - \frac 1d A of a d-regular graph, where A is the adjacency matrix of the graph, this gives a way to compute an approximation of the largest eigenvalue, and a vector of approximately maximum Rayleigh quotient, which is useful to approximate Max Cut, but not to apply spectral partitioning algorithms. For those, we need a vector that approximates the eigenvector of the second smallest eigenvalue.

Equivalently, we want to approximate the second largest eigenvalue of the adjacency matrix A. The power method is easy to adjust to compute the second largest eigenvalue instead of the largest (if we know an eigenvector of the largest eigenvalue): after you pick the random vector, subtract the component of the vector that is parallel to the eigenvector of the largest eigenvalue. In the case of the adjacency matrix of a regular graph, subtract from every coordinate of the random vector the average of the coordinates.

The adjacency matrix is not positive semidefinite, but we can adjust it to be by adding a multiple of the identity matrix. For example we can work with \frac 12 I + \frac 1{2d} A. Then the power method reduces to the following procedure: pick randomly x \sim \{ -1,1\}, then subtract \sum_i x_i/n from every entry of x, then repeat the following process k = O\left( \frac 1 \epsilon \log \frac n \epsilon \right) times: for every entry i, assign x_i := \frac 12 x_i + \frac 1 {2d} \sum_{j: (i,j) \in E} x_j, that is, replace the value that the vector assigns to vertex i with a convex combination of the current value and the current value of the neighbors. (Note that one iteration can be executed in time O(|V|+|E|).

The problem is that if we started from a graph whose Laplacian matrix has a second smallest eigenvalue \lambda_2, the matrix \frac 12 I + \frac 1{2d} A has second largest eigenvalue 1- \frac {\lambda_2}2, and if the power method finds a vector of Rayleigh quotient at least (1-\epsilon) \cdot \left( 1- \frac {\lambda_2}2 \right) for \frac 12 I + \frac 1{2d} A, then that vector has Rayleigh quotient about \lambda_2 - 2\epsilon for L, and unless we choose \epsilon of the same order as \lambda_2 we get nothing. This means that the number of iterations has to be about 1/\lambda_2, which can be quite large.

The video below (taken from this week’s lecture) shows how slowly the power method progresses on a small cycle with 31 vertices. It goes faster on the hypercube, which has a much larger \lambda_2.

A better way to apply the power method to find small eigenvalues of the Laplacian is to apply the power method to the pseudoinverse L^+ of the Laplacian. If the Laplacian of a connected graph has eigenvalues 0 = \lambda_1 < \lambda_2 \leq \cdots \leq \lambda_n, then the pseudoinverse L^+ has eigenvalues 0, \frac 1 {\lambda_2}, \cdots, \frac 1 {\lambda_n} with the same eigenvectors, so approximately finding the largest eigenvalue of L^+ is the same problem as approximately finding the second smallest eigenvalue of L.

Although we do not have fast algorithms to compute L^+, what we need to run the power method is, for a given x, to find the y such that L y = x, that is, to solve the linear system Ly = x in y given L and x.

For this problem, Spielman and Teng gave an algorithm nearly linear in the number of nonzero of L, and new algorithms have been developed more recently (and with some promise of being practical) by Koutis, Miller and Peng and by Kelner, Orecchia, Sidford and Zhu.

Coincidentally, just this week, Nisheeth Vishnoi has completed his monograph Lx=b on algorithms to solve such linear systems and their applications. It’s going to be great summer reading for those long days at the beach.

CS359G Lecture 7: Computing Eigenvectors

In which we analyze a nearly-linear time algorithm for finding an approximate eigenvector for the second eigenvalue of a graph adjacency matrix, to be used in the spectral partitioning algorithm.

In past lectures, we showed that, if {G=(V,E)} is a {d}-regular graph, and {M} is its normalized adjacency matrix with eigenvalues {1=\lambda_1 \geq \lambda_2 \ldots \geq \lambda_n}, given an eigenvector of {\lambda_2}, the algorithm SpectralPartition finds, in nearly-linear time {O(|E| + |V|\log |V|)}, a cut {(S,V-S)} such that {h(S) \leq 2\sqrt{h(G)}}.

More generally, if, instead of being given an eigenvector {{\bf x}} such that {M{\bf x} = \lambda_2 {\bf x}}, we are given a vector {{\bf x} \perp {\bf 1}} such that {{\bf x}^T M {\bf x} \geq (\lambda_2 - \epsilon) {\bf x}^T{\bf x}}, then the algorithm finds a cut such that {h(S) \leq \sqrt{4h(G) + 2\epsilon}}. In this lecture we describe and analyze an algorithm that computes such a vector using {O((|V|+|E|)\cdot \frac 1\epsilon \cdot \log \frac {|V|}{\epsilon})} arithmetic operations.

A symmetric matrix is positive semi-definite (abbreviated PSD) if all its eigenvalues are nonnegative. We begin by describing an algorithm that approximates the largest eigenvalue of a given symmetric PSD matrix. This might not seem to help very much because the adjacency matrix of a graph is not PSD, and because we want to compute the second largest, not the largest, eigenvalue. We will see, however, that the algorithm is easily modified to approximate the second eigenvalue of a PSD matrix (if an eigenvector of the first eigenvalue is known), and that the adjacency matrix of a graph can easily be modified to be PSD.

Continue reading

CS359G Lecture 6: The Spectrum of the Cycle and of the Hypercube

In which we talk about the spectrum of Cayley graphs of abelian groups, we compute the eigenvalues and eigenvectors of the cycle and of the hypercube, and we verify the tightness of the Cheeger inequalities and of the analysis of spectral partitioning

In this lecture we will prove the following results:

  1. The dimension-{d} hypercube {H_d} has {\lambda_2 = 1- \frac 2d} and {h(H_d) = \frac 1d}, giving an infinite family of graphs for which {\frac{1-\lambda_2}{2} = h(G)}, showing that the first Cheeger inequality is exactly tight.
  2. The {n}-cycle {C_n} has {\lambda_2 = 1 - O(n^{-2})}, and {h(C_n) \geq \frac 2n}, giving an infinite family of graphs for which {h(G) = \Omega(\sqrt{1-\lambda_2})}, showing that the second Cheeger inequality is tight up to a constant.
  3. There is an eigenvector of the second eigenvalue of the hypercube {H_d}, such that the SpectralPartitioning algorithm, given such a vector, outputs a cut {(S,V-S)} of expansion {h(S) = \Omega(1/\sqrt{d})}, showing that the analysis of the SpectralPartitioning algorithm is tight up to a constant.

Continue reading

CS359G Lecture 2: Linear Algebra Review

In which we review linear algebra and introduce spectral graph theory.

1. Eigenvalues and Eigenvectors

Spectral graph theory studies how the eigenvalues of the adjacency matrix of a graph, which are purely algebraic quantities, relate to combinatorial properties of the graph.

We begin with a brief review of linear algebra.

If {x=a+ib} is a complex number, then we let {x^* = a- ib} denote its conjugate.

If {M\in {\mathbb C}^{n \times n}} is a square matrix, {\lambda \in {\mathbb C}} is a scalar, {{\bf v} \in {\mathbb C}^n - \{ {\bf {0}} \}} is a non-zero vector and we have

\displaystyle   M {\bf v} = \lambda {\bf v} \ \ \ \ \ (1)

then we say that {\lambda} is an eigenvalue of {M} and that {{\bf v}} is eigenvector of {M} corresponding to the eigenvalue {\lambda}.

When (1) is satisfied, then we equivalently have

\displaystyle  (M-\lambda I) \cdot {\bf v} = {\bf {0}}

for a non-zero vector {{\bf v}}, which is equivalent to

\displaystyle   \det (M-\lambda I) = 0 \ \ \ \ \ (2)

For a fixed matrix {M}, the function {\lambda \rightarrow \det(M-\lambda I)} is a univariate polynomial of degree {n} in {\lambda} and so, over the complex numbers, the equation (2) has exactly {n} solutions, counting multiplicities.

If {G=(V,E)} is a graph, then we will be interested in the adjacency matrix {A} of {G}, that is the matrix such that {A_{ij} = 1} if {(i,j) \in E} and {A_{ij} =0} otherwise. If {G} is a multigraph or a weighted graph, then {A_{ij}} is equal to the number of edges between {(i,j)}, or the weight of the edge {(i,j)}, respectively.

The adjacency matrix of an undirected graph is symmetric, and this implies that its eigenvalues are all real.

Definition 1 A matrix {M\in {\mathbb C}^{n \times n}} is Hermitian if {M_{ij} = M_{ji}^*} for every {i,j}.

Note that a real symmetric matrix is always Hermitian.

Lemma 2 If {M} is Hermitian, then all the eigenvalues of {M} are real.

Proof: Let {M} be an Hermitian matrix and let {\lambda} be a scalar and {{\bf x}} be a non-zero vector such that {M {\bf x} = \lambda {\bf x}}. We will show that {\lambda = \lambda^*}, which implies that {\lambda} is a real number. We define the following inner product operation over vectors in {{\mathbb C}^n}:

\displaystyle  \langle {\bf v},{\bf w} \rangle := \sum_i v_i^*\cdot w_i

Notice that, by definition, we have {\langle {\bf v}, {\bf w} \rangle = (\langle {\bf w}, {\bf v} \rangle)^*} and {\langle {\bf v}, {\bf v} \rangle = || {\bf v} ||^2}. The lemma follows by observing that

\displaystyle  \langle M {\bf x} , {\bf x} \rangle

\displaystyle  = \sum_i \sum_j M^*_{ij} x_j^* x_i

\displaystyle  = \sum_i \sum_j M_{ji} x_i x_j^*

\displaystyle  = \langle {\bf x} , M {\bf x} \rangle

where we use the fact that {M} is Hermitian, and that

\displaystyle  \langle M{\bf x}, {\bf x} \rangle = \langle \lambda {\bf x}, {\bf x} \rangle = \lambda^* ||x||^2


\displaystyle  \langle {\bf x}, M {\bf x} \rangle = \langle {\bf x}, \lambda {\bf x} \rangle = \lambda ||x||^2

so that {\lambda = \lambda^*}. \Box

From the discussion so far, we have that if {A} is the adjacency matrix of an undirected graph then it has {n} real eigenvalues, counting multiplicities of the number of solutions to {\det (A - \lambda I ) = 0}.

If {G} is a {d}-regular graph, then instead of working with the adjacency matrix of {G} it is somewhat more convenient to work with the normalized matrix {M:= \frac 1d \cdot A}.

In the rest of this section we shall prove the following relations between the eigenvalues of {M} and certain purely combinatorial properties of {G}.

Theorem 3 Let {G} be a {d}-regular undirected graph, and {M = \frac 1d \cdot A} be its normalized adjacency matrix. Let {\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n} be the real eigenvalues of {M} with multiplicities. Then

  1. {\lambda_1 = 1} and {\lambda_n \geq -1}.
  2. {\lambda_2=1} if and only if {G} is disconnected.
  3. {\lambda_n=-1} if and only if at least one of the connected components of {G} is bipartite.

In the next lecture we will begin to explore an “approximate” version of the second claim, and to show that {\lambda_2} is close to 1 if and only if {G} has a sparse cut.

Continue reading