The Power Method

This week, the topic of my online course on graph partitioning and expanders is the computation of approximate eigenvalues and eigenvectors with the power method.

If M is a positive semidefinite matrix (a symmetric matrix all of whose eigenvalues are nonnegative), then the power method is simply to pick a random vector x\in \{ -1,+1 \}^n, and compute y:= M^k x. If k is of the order of \frac 1 \epsilon \log \frac n \epsilon, then with constant probability we have

\frac {y^T M y}{y^T y} \geq (1-\epsilon) \max_{x} \frac {x^T M x}{x^T x} = (1-\epsilon) \lambda_1

where \lambda_1 is the largest eigenvalue of M. If we are interested in the Laplacian matrix L = I - \frac 1d A of a d-regular graph, where A is the adjacency matrix of the graph, this gives a way to compute an approximation of the largest eigenvalue, and a vector of approximately maximum Rayleigh quotient, which is useful to approximate Max Cut, but not to apply spectral partitioning algorithms. For those, we need a vector that approximates the eigenvector of the second smallest eigenvalue.
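As a minimal sketch of the basic procedure for a positive semidefinite M, here is a plain Python version that stores M as a dense list of rows. The function names are illustrative, and the rescaling after each multiplication is an added safeguard against overflow (it does not change the Rayleigh quotient); neither is part of the description above.

```python
import random


def matvec(M, x):
    """Multiply the dense matrix M (a list of rows) by the vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def rayleigh_quotient(M, x):
    """Return x^T M x / x^T x for a nonzero vector x."""
    Mx = matvec(M, x)
    return sum(a * b for a, b in zip(x, Mx)) / sum(a * a for a in x)


def power_method(M, k, rng=random):
    """Return a rescaled copy of M^k x for a random x in {-1,+1}^n."""
    n = len(M)
    y = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    for _ in range(k):
        z = matvec(M, y)
        scale = max(abs(v) for v in z)
        if scale == 0.0:
            # y lies in the kernel of M, so there is nothing left to amplify.
            break
        # Rescaling keeps the entries bounded; the direction is unchanged.
        y = [v / scale for v in z]
    return y


# Example: a small tridiagonal matrix whose largest eigenvalue is 2 + sqrt(2).
M = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
y = power_method(M, k=50)
print(rayleigh_quotient(M, y))
```

The guarantee is only probabilistic: if the random starting vector happens to be orthogonal to the top eigenvector, no number of iterations will help.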

Equivalently, we want to approximate the second largest eigenvalue of the adjacency matrix A. The power method is easy to adjust to compute the second largest eigenvalue instead of the largest (if we know an eigenvector of the largest eigenvalue): after you pick the random vector, subtract the component of the vector that is parallel to the eigenvector of the largest eigenvalue. In the case of the adjacency matrix of a regular graph, subtract from every coordinate of the random vector the average of the coordinates.

The adjacency matrix is not positive semidefinite, but we can adjust it to be by adding a multiple of the identity matrix. For example we can work with \frac 12 I + \frac 1{2d} A. Then the power method reduces to the following procedure: pick x \sim \{ -1,1\}^n at random, then subtract \sum_i x_i/n from every entry of x, then repeat the following process k = O\left( \frac 1 \epsilon \log \frac n \epsilon \right) times: for every entry i, assign x_i := \frac 12 x_i + \frac 1 {2d} \sum_{j: (i,j) \in E} x_j, that is, replace the value that the vector assigns to vertex i with a convex combination of the current value and the current value of the neighbors. (Note that one iteration can be executed in time O(|V|+|E|).)
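The averaging procedure can be sketched directly on adjacency lists, so that each iteration costs O(|V|+|E|). This is a sketch under assumptions: the graph is given as a hypothetical `neighbors` list (with `neighbors[i]` the list of neighbors of vertex i), all updates in a round use the values from the previous round, and the re-centering and rescaling inside the loop are numerical safeguards that are not part of the procedure described above.

```python
import random


def centered(x):
    """Subtract the average of the coordinates from every coordinate."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def second_eigenvector_power_method(neighbors, d, k, rng=random):
    """Run k rounds of x_i := x_i/2 + (sum of neighbor values)/(2d)."""
    n = len(neighbors)
    x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    while min(x) == max(x):
        # A constant vector becomes zero after centering; resample.
        x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    x = centered(x)
    for _ in range(k):
        # Every new value is computed from the old vector.
        z = [0.5 * x[i] + sum(x[j] for j in neighbors[i]) / (2 * d)
             for i in range(n)]
        z = centered(z)  # safeguard against rounding drift
        scale = max(abs(v) for v in z)
        if scale == 0.0:
            break
        x = [v / scale for v in z]  # safeguard against underflow
    return x


def laplacian_rayleigh_quotient(neighbors, d, x):
    """Return x^T L x / x^T x for L = I - A/d."""
    Lx = [x[i] - sum(x[j] for j in neighbors[i]) / d for i in range(len(x))]
    return sum(a * b for a, b in zip(x, Lx)) / sum(a * a for a in x)


# Example: the cycle with 31 vertices, which is 2-regular.
n = 31
cycle = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
for k in (10, 100, 1000):
    x = second_eigenvector_power_method(cycle, 2, k)
    print(k, laplacian_rayleigh_quotient(cycle, 2, x))
```

Printing the Rayleigh quotient for increasing k gives a rough feel for how many rounds the cycle needs before the quotient settles.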

The problem is that if we started from a graph whose Laplacian matrix has a second smallest eigenvalue \lambda_2, the matrix \frac 12 I + \frac 1{2d} A has second largest eigenvalue 1- \frac {\lambda_2}2, and if the power method finds a vector of Rayleigh quotient at least (1-\epsilon) \cdot \left( 1- \frac {\lambda_2}2 \right) for \frac 12 I + \frac 1{2d} A, then that vector has Rayleigh quotient at most about \lambda_2 + 2\epsilon for L, and unless we choose \epsilon of the same order as \lambda_2 we get nothing. This means that the number of iterations has to be about 1/\lambda_2 (up to a logarithmic factor), which can be quite large.
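To spell out the arithmetic behind this, write M = \frac 12 I + \frac 1{2d} A = I - \frac 12 L, so that for every nonzero vector x

\frac {x^T M x}{x^T x} = 1 - \frac 12 \cdot \frac {x^T L x}{x^T x}

Hence, if \frac {x^T M x}{x^T x} \geq (1-\epsilon) \cdot \left( 1- \frac {\lambda_2}2 \right), then

\frac {x^T L x}{x^T x} \leq 2 - 2 (1-\epsilon) \left( 1- \frac {\lambda_2}2 \right) = \lambda_2 + 2\epsilon - \epsilon \lambda_2 \leq \lambda_2 + 2\epsilon

For example, on the cycle with 31 vertices we have \lambda_2 = 1 - \cos (2\pi/31) \approx 0.02, so the additive error 2\epsilon is smaller than \lambda_2 itself only when \epsilon is below roughly 0.01.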

The video below (taken from this week’s lecture) shows how slowly the power method progresses on a small cycle with 31 vertices. It goes faster on the hypercube, which has a much larger \lambda_2.

A better way to apply the power method to find small eigenvalues of the Laplacian is to apply the power method to the pseudoinverse L^+ of the Laplacian. If the Laplacian of a connected graph has eigenvalues 0 = \lambda_1 < \lambda_2 \leq \cdots \leq \lambda_n, then the pseudoinverse L^+ has eigenvalues 0, \frac 1 {\lambda_2}, \cdots, \frac 1 {\lambda_n} with the same eigenvectors, so approximately finding the largest eigenvalue of L^+ is the same problem as approximately finding the second smallest eigenvalue of L.

Although we do not have fast algorithms to compute L^+, what we need to run the power method is, for a given x, to find the y such that L y = x, that is, to solve the linear system Ly = x in y given L and x.

For this problem, Spielman and Teng gave an algorithm that runs in time nearly linear in the number of nonzero entries of L, and new algorithms have been developed more recently (and with some promise of being practical) by Koutis, Miller and Peng and by Kelner, Orecchia, Sidford and Zhu.
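To illustrate the idea of running the power method on L^+ without ever forming L^+, here is a rough sketch in which each step solves Ly = x. The solver is plain conjugate gradient, used only as a simple stand-in: it is not one of the nearly-linear-time algorithms mentioned above. The sketch assumes a connected d-regular graph given as a hypothetical `neighbors` adjacency list, and all function names and tolerances are illustrative.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def centered(x):
    """Remove the component of x along the all-ones vector."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def apply_laplacian(neighbors, d, y):
    """Return L y for L = I - A/d, in time O(|V|+|E|)."""
    return [y[i] - sum(y[j] for j in neighbors[i]) / d for i in range(len(y))]


def solve_laplacian(neighbors, d, b, tol=1e-12, max_iter=1000):
    """Conjugate gradient for L y = b, assuming b sums to zero."""
    y = [0.0] * len(b)
    r = list(b)  # residual b - L y
    p = list(b)  # search direction
    rs = dot(r, r)
    stop = tol * tol * rs
    for _ in range(max_iter):
        if rs <= stop:
            break
        Lp = apply_laplacian(neighbors, d, p)
        alpha = rs / dot(p, Lp)
        y = [yi + alpha * pi for yi, pi in zip(y, p)]
        r = [ri - alpha * li for ri, li in zip(r, Lp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return centered(y)


def inverse_power_method(neighbors, d, k, rng=random):
    """Approximate x := (L^+)^k x by k linear solves, normalizing each time."""
    n = len(neighbors)
    x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    while min(x) == max(x):
        # A constant vector has no component orthogonal to all-ones; resample.
        x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    x = centered(x)
    for _ in range(k):
        x = solve_laplacian(neighbors, d, x)
        norm = math.sqrt(dot(x, x))
        x = [v / norm for v in x]
    return x


def laplacian_rayleigh_quotient(neighbors, d, x):
    return dot(x, apply_laplacian(neighbors, d, x)) / dot(x, x)


# Example: the cycle with 31 vertices again, now with only a few solves.
n = 31
cycle = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
x = inverse_power_method(cycle, 2, k=15)
print(laplacian_rayleigh_quotient(cycle, 2, x), 1 - math.cos(2 * math.pi / n))
```

The point of the design is that convergence now depends on the ratio \lambda_2/\lambda_3 rather than on \lambda_2 itself, so the cost moves from the number of power iterations into the linear solver.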

Coincidentally, just this week, Nisheeth Vishnoi has completed his monograph Lx=b on algorithms to solve such linear systems and their applications. It’s going to be great summer reading for those long days at the beach.

4 thoughts on “The Power Method”

  1. In the video of the power method, one says, in the intro, that for the hypercube the arrangement is such that the sum of the edge generators is zero.

    Why is this important? What is the wider principle?

    Turing did the same, albeit with subgroups of Enigma groups. I’ve never figured out the significance to date.

  2. Ah, we are used to _balanced_ binary functions (and their relevance to crypto). If the set of generators is {-26…-1,1,…26} instead of {-1,1}, then I see that the sum of the powers being zero is a more general “balancedness”. This is an algebraic “standard reference point”, much as the uniform distribution of a maximally mixed state/phase is a standard reference point.

  4. Regarding the visualization of the power method applied to the cycle graph in the video, you might be interested in the following paper on why the cycle tends to an ellipse, in case you don’t know it:

    A Elmachtoub, C Van Loan, “From Random Polygon to Ellipse: An Eigenanalysis,” SIAM Review, 2010.
