CS294 Lecture 8: Spectral Algorithms Wrap-up

In which we talk about even more generalizations of Cheeger’s inequalities, and we analyze the power method to find approximate eigenvectors, thus having a complete description of a polynomial-time approximation algorithm for sparsest cut.

The Power Method

This week, the topic of my online course on graph partitioning and expanders is the computation of approximate eigenvalues and eigenvectors with the power method.

If $M$ is a positive semidefinite matrix (a symmetric matrix all of whose eigenvalues are nonnegative), then the power method is simply to pick a random vector $x\in \{ -1,+1 \}^n$ and compute $y:= M^k x$. If $k$ is of the order of $\frac 1 \epsilon \log \frac n \epsilon$, then with constant probability

$\frac {y^T M y}{y^T y} \geq (1-\epsilon) \max_{z \neq 0} \frac {z^T M z}{z^T z} = (1-\epsilon) \lambda_1$

where $\lambda_1$ is the largest eigenvalue of $M$. If we are interested in the Laplacian matrix $L = I - \frac 1d A$ of a $d$-regular graph, where $A$ is the adjacency matrix of the graph, this gives a way to compute an approximation of the largest eigenvalue, and a vector of approximately maximum Rayleigh quotient, which is useful to approximate Max Cut, but not to apply spectral partitioning algorithms. For those, we need a vector that approximates the eigenvector of the second smallest eigenvalue.
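As an illustration, here is a minimal NumPy sketch of the power method for a PSD matrix $M$. The function names `power_method` and `rayleigh` are our own, and the vector is rescaled after every multiplication, which does not change its direction but avoids overflow. The choice $k = \lceil \frac 1\epsilon \ln \frac n\epsilon \rceil$ in the usage example is just one concrete instance of "of the order of", not a prescribed constant.

```python
import numpy as np


def power_method(M, k, rng=None):
    """Return the direction of M^k x for a random sign vector x in {-1,+1}^n."""
    rng = np.random.default_rng() if rng is None else rng
    y = rng.choice([-1.0, 1.0], size=M.shape[0])
    for _ in range(k):
        y = M @ y
        # Rescaling keeps the numbers bounded; it does not change the direction of M^k x.
        y /= np.linalg.norm(y)
    return y


def rayleigh(M, y):
    """Rayleigh quotient y^T M y / y^T y."""
    return float(y @ M @ y) / float(y @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.standard_normal((50, 50))
    M = B @ B.T                                     # a random PSD matrix
    eps = 0.1
    k = int(np.ceil((1 / eps) * np.log(50 / eps)))
    y = power_method(M, k, rng)
    # Ratio to lambda_1; with constant probability it is at least 1 - eps.
    print(rayleigh(M, y) / np.linalg.eigvalsh(M)[-1])
```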

Equivalently, we want to approximate the second largest eigenvalue of the adjacency matrix $A$. The power method is easy to adjust to compute the second largest eigenvalue instead of the largest (if we know an eigenvector of the largest eigenvalue): after you pick the random vector, subtract its component parallel to the eigenvector of the largest eigenvalue. In the case of the adjacency matrix of a regular graph, where the all-ones vector is an eigenvector of the largest eigenvalue, this means subtracting from every coordinate of the random vector the average of the coordinates.

The adjacency matrix is not positive semidefinite, but we can make it so by adding a multiple of the identity matrix and rescaling. For example, we can work with $\frac 12 I + \frac 1{2d} A$. Then the power method reduces to the following procedure: pick $x$ uniformly at random in $\{ -1,1\}^n$, then subtract $\sum_i x_i/n$ from every entry of $x$, then repeat the following step $k = O\left( \frac 1 \epsilon \log \frac n \epsilon \right)$ times: for every vertex $i$, simultaneously assign $x_i := \frac 12 x_i + \frac 1 {2d} \sum_{j: (i,j) \in E} x_j$, using the values from the previous step. That is, replace the value that the vector assigns to vertex $i$ with the average of its current value and the average of the current values of its neighbors. (Note that one iteration can be executed in time $O(|V|+|E|)$.)
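The procedure just described can be sketched in plain Python on adjacency lists, so that the $O(|V|+|E|)$ cost of one iteration is visible. This is a hypothetical helper (`second_eigvec_power_method` is our name): it assumes the graph is $d$-regular and given as a list `adj` of neighbor lists, uses $k = \lceil \frac 1\epsilon \ln \frac n\epsilon \rceil$ as one concrete choice, and rescales the vector after each step only to keep the numbers bounded.

```python
import math
import random


def second_eigvec_power_method(adj, d, eps, seed=None):
    """Power method on I/2 + A/(2d) for a d-regular graph given by adjacency lists.

    Returns a unit vector orthogonal to the all-ones vector.
    """
    rng = random.Random(seed)
    n = len(adj)
    x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    # Remove the component along the all-ones vector, the top eigenvector of A.
    avg = sum(x) / n
    x = [xi - avg for xi in x]
    k = math.ceil((1 / eps) * math.log(n / eps))
    for _ in range(k):
        # Simultaneous update: every new entry is computed from the previous vector,
        # and one pass touches each vertex and each edge a constant number of times.
        x = [0.5 * x[i] + sum(x[j] for j in adj[i]) / (2 * d) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]  # rescaling only; the direction is unchanged
    return x
```

For example, on the 6-dimensional hypercube, built with `adj = [[v ^ (1 << b) for b in range(6)] for v in range(64)]` and `d = 6`, the returned vector should have Rayleigh quotient for $L$ close to $\lambda_2 = \frac 13$, because that graph has a large spectral gap.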

The problem is that if we start from a graph whose Laplacian matrix has second smallest eigenvalue $\lambda_2$, the matrix $\frac 12 I + \frac 1{2d} A = I - \frac 12 L$ has second largest eigenvalue $1- \frac {\lambda_2}2$, and if the power method finds a vector of Rayleigh quotient at least $(1-\epsilon) \cdot \left( 1- \frac {\lambda_2}2 \right)$ for $\frac 12 I + \frac 1{2d} A$, then that vector has Rayleigh quotient at most $\lambda_2 + 2\epsilon$ for $L$, so unless we choose $\epsilon$ of the same order as $\lambda_2$ we get nothing useful. This means that the number of iterations has to be of the order of $\frac 1{\lambda_2} \log \frac n{\lambda_2}$, which can be quite large.
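The conversion between the two Rayleigh quotients can be spelled out. Writing $R_L(y) = \frac{y^T L y}{y^T y}$ (notation introduced here) and using $\frac 12 I + \frac 1{2d} A = I - \frac 12 L$, a short calculation gives:

$$
\begin{aligned}
\frac{y^T \left(\frac 12 I + \frac 1{2d} A\right) y}{y^T y} = 1 - \frac 12 R_L(y) \geq (1-\epsilon)\left(1 - \frac{\lambda_2}2\right)
&\iff R_L(y) \leq 2 - 2(1-\epsilon)\left(1 - \frac{\lambda_2}2\right) \\
&\iff R_L(y) \leq (1-\epsilon)\lambda_2 + 2\epsilon \leq \lambda_2 + 2\epsilon .
\end{aligned}
$$

Since $R_L(y) \geq \lambda_2$ for every $y$ orthogonal to the all-ones vector, this guarantee says something only when $2\epsilon$ is small compared with $\lambda_2$.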

The video below (taken from this week’s lecture) shows how slowly the power method progresses on a small cycle with 31 vertices. It goes faster on the hypercube, which has a much larger $\lambda_2$.
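The contrast between the cycle and the hypercube can also be checked numerically. The sketch below is a hypothetical experiment of our own: it runs the same power method on the cycle with 31 vertices and on the 5-dimensional hypercube (32 vertices) and reports, after $k$ iterations, the ratio between the Rayleigh quotient for $L$ of the current vector and $\lambda_2$. A ratio close to 1 means the vector is already a good input for spectral partitioning. Dense NumPy matrices are used only for brevity.

```python
import numpy as np


def cycle(n):
    """Adjacency matrix of the cycle on n vertices (2-regular)."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A


def hypercube(dim):
    """Adjacency matrix of the dim-dimensional hypercube (dim-regular)."""
    n = 1 << dim
    A = np.zeros((n, n))
    for v in range(n):
        for b in range(dim):
            A[v, v ^ (1 << b)] = 1.0
    return A


def laplacian(A):
    d = A.sum(axis=1)[0]  # assumes the graph is d-regular
    return np.eye(len(A)) - A / d


def progress(A, k, seed=0):
    """Ratio R_L(x) / lambda_2 after k steps of x <- (I/2 + A/(2d)) x, x orthogonal to 1."""
    L = laplacian(A)
    W = np.eye(len(A)) - L / 2  # equals I/2 + A/(2d)
    x = np.random.default_rng(seed).choice([-1.0, 1.0], size=len(A))
    x -= x.mean()
    for _ in range(k):
        x = W @ x
        x /= np.linalg.norm(x)
    lam2 = np.linalg.eigvalsh(L)[1]
    return float(x @ L @ x) / lam2  # x has unit norm


if __name__ == "__main__":
    for k in (5, 20, 80):
        print(k, progress(cycle(31), k), progress(hypercube(5), k))
```

On the hypercube the ratio should approach 1 within a few dozen iterations, while on the cycle it should stay noticeably above 1 for much longer, in line with the difference in $\lambda_2$.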

A better way to apply the power method to find small eigenvalues of the Laplacian is to apply the power method to the pseudoinverse $L^+$ of the Laplacian. If the Laplacian of a connected graph has eigenvalues $0 = \lambda_1 < \lambda_2 \leq \cdots \leq \lambda_n$, then the pseudoinverse $L^+$ has eigenvalues $0, \frac 1 {\lambda_2}, \cdots, \frac 1 {\lambda_n}$ with the same eigenvectors, so approximately finding the largest eigenvalue of $L^+$ is the same problem as approximately finding the second smallest eigenvalue of $L$.
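The claim about the spectrum of $L^+$ is easy to check numerically on a small example. The following sketch uses a cycle on 10 vertices (our choice of example) and NumPy's `np.linalg.pinv`, with an explicit cutoff `rcond=1e-10` so that the numerically tiny eigenvalue of $L$ is treated as zero.

```python
import numpy as np

n = 10
A = np.zeros((n, n))
for i in range(n):                        # cycle on n vertices, so d = 2
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.eye(n) - A / 2                     # Laplacian I - A/d

lam = np.linalg.eigvalsh(L)               # ascending: 0 = lam[0] < lam[1] <= ... <= lam[-1]
L_plus = np.linalg.pinv(L, rcond=1e-10)   # Moore-Penrose pseudoinverse
mu = np.linalg.eigvalsh(L_plus)           # ascending

# Expected spectrum of L^+: 0 together with 1/lam_2, ..., 1/lam_n.
expected = np.sort(np.concatenate(([0.0], 1.0 / lam[1:])))
print(np.allclose(mu, expected))          # True
```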

Although we do not have fast algorithms to compute $L^+$ explicitly, all we need to run the power method is, for a given $x$ orthogonal to the all-ones vector, to compute $y = L^+ x$, that is, to solve the linear system $Ly = x$ in $y$ given $L$ and $x$ (taking the solution $y$ that is also orthogonal to the all-ones vector).
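To make this concrete, here is a sketch of the power method applied to $L^+$ in which each multiplication by $L^+$ is replaced by a linear solve. As a stand-in for specialized Laplacian solvers it uses plain conjugate gradient, a much simpler method that is not one of the nearly-linear-time algorithms; the function names and the tolerance are our own choices, and we assume a connected regular graph, so that the kernel of $L$ is spanned by the all-ones vector.

```python
import numpy as np


def conjugate_gradient(L, b, tol=1e-10, max_iter=None):
    """Solve L y = b by conjugate gradient, for symmetric PSD L and b orthogonal to ker(L)."""
    max_iter = 10 * len(b) if max_iter is None else max_iter
    y = np.zeros_like(b)
    r = b.copy()                  # residual b - L y
    p = r.copy()                  # search direction
    rs = r @ r
    stop = (tol * np.linalg.norm(b)) ** 2
    for _ in range(max_iter):
        if rs <= stop:
            break
        Lp = L @ p
        alpha = rs / (p @ Lp)
        y += alpha * p
        r -= alpha * Lp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return y


def inverse_power_method(L, k, seed=0):
    """Power method on L^+ (applied implicitly via solves), started orthogonal to 1."""
    x = np.random.default_rng(seed).choice([-1.0, 1.0], size=L.shape[0])
    x -= x.mean()
    for _ in range(k):
        x = conjugate_gradient(L, x)  # x <- L^+ x, since x is orthogonal to ker(L)
        x -= x.mean()                 # discard rounding drift along the all-ones vector
        x /= np.linalg.norm(x)
    return x


if __name__ == "__main__":
    n = 31
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    L = np.eye(n) - A / 2
    x = inverse_power_method(L, k=10)
    print(x @ L @ x, np.linalg.eigvalsh(L)[1])  # the two numbers should nearly coincide
```

On the 31-cycle, ten iterations suffice here because the two largest distinct eigenvalues of $L^+$, namely $\frac 1{\lambda_2}$ and the next one, are far apart, whereas the lazy-walk version above needs on the order of $\frac 1{\lambda_2}$ iterations.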

For this problem, Spielman and Teng gave an algorithm whose running time is nearly linear in the number of nonzero entries of $L$, and new algorithms, with some promise of being practical, have been developed more recently by Koutis, Miller and Peng and by Kelner, Orecchia, Sidford and Zhu.

Coincidentally, just this week, Nisheeth Vishnoi has completed his monograph Lx=b on algorithms to solve such linear systems and their applications. It’s going to be great summer reading for those long days at the beach.

CS359G Lecture 7: Computing Eigenvectors

In which we analyze a nearly-linear time algorithm for finding an approximate eigenvector for the second eigenvalue of a graph adjacency matrix, to be used in the spectral partitioning algorithm.

In past lectures, we showed that, if ${G=(V,E)}$ is a ${d}$-regular graph, and ${M}$ is its normalized adjacency matrix with eigenvalues ${1=\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n}$, given an eigenvector of ${\lambda_2}$, the algorithm SpectralPartition finds, in nearly-linear time ${O(|E| + |V|\log |V|)}$, a cut ${(S,V-S)}$ such that ${h(S) \leq 2\sqrt{h(G)}}$.
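For completeness, here is a sketch of the sweep over threshold cuts that we assume SpectralPartition performs (the algorithm is from earlier lectures and is not restated here): sort the vertices by their value in ${\bf x}$ and, among the cuts formed by the first $k$ vertices in this order, return the one of smallest expansion, taking $h(S) = \frac{E(S,V-S)}{d\,\min(|S|,|V-S|)}$. The function name and this exact form of the definition are assumptions of the sketch. Sorting costs $O(|V|\log |V|)$ and updating the number of cut edges incrementally costs $O(|E|)$ overall, consistent with the running time stated above.

```python
def spectral_partition(adj, d, x):
    """Sweep cut of a d-regular graph (adjacency lists) along the ordering given by x.

    Returns the smaller side S of the best threshold cut and its expansion h(S),
    where h(S) = E(S, V-S) / (d * min(|S|, |V-S|)).
    """
    n = len(adj)
    order = sorted(range(n), key=lambda v: x[v])  # O(|V| log |V|)
    in_S = [False] * n
    cut = 0                                       # number of edges between S and V - S
    best_h, best_k = float("inf"), 0
    for k, v in enumerate(order[:-1], start=1):   # S = first k vertices of the order
        # Moving v into S uncuts its edges into S and cuts its other d - inside edges.
        inside = sum(1 for u in adj[v] if in_S[u])
        cut += d - 2 * inside
        in_S[v] = True
        h = cut / (d * min(k, n - k))
        if h < best_h:
            best_h, best_k = h, k
    S = set(order[:best_k])
    if best_k > n - best_k:
        S = set(range(n)) - S                     # return the smaller side
    return S, best_h
```

For instance, on the cycle with 8 vertices and any ${\bf x}$ that orders the vertices around the cycle, the best threshold cut splits it into two paths of 4 vertices, with $h(S) = \frac 2{2 \cdot 4} = \frac 14$.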

More generally, if, instead of being given an eigenvector ${{\bf x}}$ such that ${M{\bf x} = \lambda_2 {\bf x}}$, we are given a vector ${{\bf x} \perp {\bf 1}}$ such that ${{\bf x}^T M {\bf x} \geq (\lambda_2 - \epsilon) {\bf x}^T{\bf x}}$, then the algorithm finds a cut such that ${h(S) \leq \sqrt{4h(G) + 2\epsilon}}$. In this lecture we describe and analyze an algorithm that computes such a vector using ${O((|V|+|E|)\cdot \frac 1\epsilon \cdot \log \frac {|V|}{\epsilon})}$ arithmetic operations.
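One way to see where this bound comes from is the following chain of inequalities. It assumes, as we recall from the earlier lectures (so treat it as an assumption here), that SpectralPartition run on ${\bf x} \perp {\bf 1}$ returns a cut with $h(S) \leq \sqrt{2\left(1 - \frac{{\bf x}^T M {\bf x}}{{\bf x}^T{\bf x}}\right)}$, and it uses the easy direction of Cheeger's inequality, $1 - \lambda_2 \leq 2h(G)$:

$$
h(S) \leq \sqrt{2\left(1 - \frac{{\bf x}^T M {\bf x}}{{\bf x}^T{\bf x}}\right)} \leq \sqrt{2(1 - \lambda_2 + \epsilon)} \leq \sqrt{4h(G) + 2\epsilon} .
$$

With $\epsilon = 0$ this recovers the bound $h(S) \leq 2\sqrt{h(G)}$ for an exact eigenvector.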

A symmetric matrix is positive semi-definite (abbreviated PSD) if all its eigenvalues are nonnegative. We begin by describing an algorithm that approximates the largest eigenvalue of a given symmetric PSD matrix. This might not seem to help very much because the adjacency matrix of a graph is not PSD, and because we want to compute the second largest, not the largest, eigenvalue. We will see, however, that the algorithm is easily modified to approximate the second eigenvalue of a PSD matrix (if an eigenvector of the first eigenvalue is known), and that the adjacency matrix of a graph can easily be modified to be PSD.
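As a quick check of the last claim, the following sketch applies the shift $M \mapsto \frac 12 (I + M)$, which is the matrix $\frac 12 I + \frac 1{2d} A$ used in the power-method discussion, to the normalized adjacency matrix of a 6-cycle (our example; it is bipartite, so $M$ has eigenvalue $-1$ and is not PSD). Every eigenvalue $\lambda$ becomes $\frac{1+\lambda}2 \in [0,1]$, while the eigenvectors and the order of the eigenvalues are unchanged, so the second largest eigenvalue of the shifted matrix is $\frac{1+\lambda_2}2$.

```python
import numpy as np

n = 6
A = np.zeros((n, n))
for i in range(n):                    # 6-cycle, 2-regular and bipartite
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
M = A / 2                             # normalized adjacency matrix
M_psd = (np.eye(n) + M) / 2           # equals I/2 + A/(2d) with d = 2

lam = np.linalg.eigvalsh(M)           # ascending; contains -1, so M is not PSD
lam_psd = np.linalg.eigvalsh(M_psd)   # ascending; all entries lie in [0, 1]
print(lam.round(3))
print(lam_psd.round(3))
```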