The Limitations of the Spectral Partitioning Algorithm

We continue to talk about the problem of estimating the expansion of a graph G=(V,E), focusing on the closely related sparsest cut, defined as

\displaystyle \phi(G):= \min_{S} \frac {edges(S,V-S) } { \frac 1n \cdot |S| \cdot |V-S| }

The spectral partitioning algorithm first finds a vector x \in {\mathbb R}^n minimizing

(1) \ \ \  \displaystyle \frac{ \sum_{i,j} A(i,j) |x(i)-x(j)|^2 } { \frac 1n \cdot \sum_{i,j} |x(i)-x(j)|^2}

(where A is the adjacency matrix of G) and then finds the best cut (S,V-S) where S is of the form \{ i \in V: \ x(i) \geq t \} for a threshold t.
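
To make the procedure concrete, here is a minimal sketch in Python (assuming numpy; the helper names sparsity and spectral_partition are just placeholders): it computes an eigenvector of the second largest eigenvalue of the adjacency matrix and sweeps over all threshold cuts, returning the sparsest one.

import numpy as np

def sparsity(A, S):
    # sparsity of the cut (S, V-S): edges(S, V-S) / ((1/n) * |S| * |V-S|)
    n = A.shape[0]
    S = np.asarray(S)
    comp = np.setdiff1d(np.arange(n), S)
    crossing = A[np.ix_(S, comp)].sum()
    return crossing / ((1.0 / n) * len(S) * len(comp))

def spectral_partition(A):
    # take an eigenvector of the second largest eigenvalue of A and
    # sweep over all threshold cuts S = { i : x(i) >= t }
    n = A.shape[0]
    eigenvalues, eigenvectors = np.linalg.eigh(A)   # eigenvalues in increasing order
    x = eigenvectors[:, -2]                         # eigenvector of the second largest eigenvalue
    order = np.argsort(-x)                          # vertices sorted by decreasing x(i)
    best_S, best_value = None, float("inf")
    for k in range(1, n):                           # S = the k vertices with largest x(i)
        S = order[:k]
        value = sparsity(A, S)
        if value < best_value:
            best_S, best_value = S, value
    return best_S, best_value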

We proved that if the quantity in (1) is \epsilon and G is d-regular, then the algorithm will find a cut of sparsity at most \sqrt{8 \epsilon d}, and that if x is the eigenvector of the second eigenvalue, then it is an optimal solution to (1), and the cost of an optimal solution to (1) is a lower bound to \phi(G). This means that the algorithm finds a cut of sparsity at most \sqrt{8 \phi(G) d}.

Can the analysis be improved?

Consider first the case in which G is a cycle of length 2n, and x is a vector such that x(1),\ldots,x(n) are equally spaced between 0 and 1 (that is, x(i) = i/n) and x(n+1),\ldots,x(2n) are equally spaced between 1 and 0 (that is, x(j) = (2n-j)/n). Then the ratio in (1) is \Theta (1/n^2), because the numerator is the sum of 4n terms, each being \frac 1 {n^2}, and the denominator is \frac 1{2n} times a sum which is approximately 4n^2 times the average squared distance of two random points in {[0,1]} (which is \frac 16).
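
Here is a quick numerical sanity check of this calculation, as a sketch in Python assuming numpy (the function names are placeholders): it evaluates the ratio (1) on the “tent” embedding of the 2n-cycle described above and multiplies by n^2, and the product stays bounded as n grows.

import numpy as np

def cycle_adjacency(m):
    # adjacency matrix of the cycle on m vertices
    A = np.zeros((m, m))
    for i in range(m):
        A[i, (i + 1) % m] = A[(i + 1) % m, i] = 1
    return A

def ratio(A, x):
    # the quantity (1): sum_{i,j} A(i,j)|x(i)-x(j)|^2 / ((1/n) sum_{i,j} |x(i)-x(j)|^2)
    n = len(x)
    D = (x[:, None] - x[None, :]) ** 2
    return (A * D).sum() / ((1.0 / n) * D.sum())

for n in (100, 200, 400):
    A = cycle_adjacency(2 * n)
    # the "tent" embedding: x(i) = i/n for i = 1..n, x(j) = (2n-j)/n for j = n+1..2n
    x = np.array([i / n if i <= n else (2 * n - i) / n for i in range(1, 2 * n + 1)])
    print(n, ratio(A, x) * n ** 2)   # stays bounded as n grows, i.e. the ratio is Theta(1/n^2)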

On the other hand, any cut of a cycle with 2n edges has sparsity at least \frac 2n, because every cut is crossed by at least two edges, and the denominator in the definition of sparsest cut is at most n.
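
For small cycles this can be checked by brute force; the following sketch (again assuming numpy, with placeholder names) enumerates all cuts of a cycle on 2n vertices and compares the minimum sparsity to 2/n.

import numpy as np
from itertools import combinations

n = 5
m = 2 * n                                     # cycle on 2n vertices
A = np.zeros((m, m))
for i in range(m):
    A[i, (i + 1) % m] = A[(i + 1) % m, i] = 1

def sparsity(S):
    comp = [v for v in range(m) if v not in S]
    crossing = A[np.ix_(list(S), comp)].sum()
    return crossing / ((1.0 / m) * len(S) * len(comp))

# minimum sparsity over all nontrivial cuts (S, V-S)
best = min(sparsity(S) for k in range(1, m) for S in combinations(range(m), k))
print(best, 2 / n)                            # the minimum is at least 2/n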

This means that given a solution x to (1) of cost \epsilon, we cannot hope for a cut of sparsity o(\sqrt{d\epsilon}) because, in the case of the cycle, such a cut simply does not exist.

In which way, exactly, is the cycle a “bad” example for the spectral partitioning algorithm? After all, the algorithm will actually find an optimal cut. The point is that if we want our analysis to relate the value of the solution found by the algorithm to the optimum of (1), as we have done, then the existence of inputs on which the optimum of (1) is very far from the true optimum puts a limit on how good our analysis can be. In particular, such an analysis cannot show that an algorithm always finds cuts of sparsity o(\sqrt{d\phi}). To turn things around, if we had an analysis of any “spectral algorithm” proving that the algorithm always finds cuts of sparsity o(\sqrt{d\phi}), then that analysis would have to contain a proof that the sparsest cut of a cycle of length 2n is \omega(1/n^2), and so it would implicitly define a lower bound on the sparsest cut which is sometimes asymptotically better than the spectral lower bound. As such, it would not be fair to think of it as “merely” a spectral algorithm since, at least implicitly, the algorithm relates to a tighter relaxation of the problem than the one given by the optimum of (1).

This issue always comes up when thinking about the limitations of rounding algorithms for convex relaxations, and although it may seem fuzzy to ask what “really” counts as a rounding algorithm for a relaxation, the fuzziness is unavoidable. Consider the “rounding” algorithm that, given a relaxed solution, discards it and just outputs an optimal solution found by other means. It seems ridiculous to consider it a rounding algorithm, but in what way is it not one?

Returning to the concrete issue of the performance of the spectral partitioning algorithm which partitions based on thresholds, we could still ask whether it might actually always find cuts of sparsity o(\sqrt{d \phi}) and, if so, whether we might be able to prove it by some combination of spectral and combinatorial techniques.

Unfortunately the answer is again negative.

Consider a d-dimensional hypercube. We can see that the “dimension cuts” are the sparsest cuts, and that their sparsity is 2. We studied the eigenvalues of the hypercube a while ago, and the spectral gap is also 2. The largest eigenvalue is d, and then the second largest is d-2, with multiplicity d. A basis for the eigenspace of the eigenvalue d-2 is made of one vector x_i \in \{ -1,1\}^V for each dimension cut, defined by x_i (a_1,\ldots,a_d) := (-1)^{a_i}. This is great, because if we get one of these nice eigenvectors we immediately recover the corresponding, optimal, dimension cut by, say, setting the threshold to 0. The problem is that x := \sum_i x_i is also an eigenvector for the eigenvalue d-2. Unfortunately, if v\in V = \{0,1\}^d is a vertex of the hypercube, then x(v) depends only on the number of ones of v. (Namely, x(v) is equal to d minus twice the number of ones in v.) This means that the cut found by the spectral partitioning algorithm given x will partition the vertices according to their weight. The best such cut (and the one found by the algorithm) is the “majority cut” of the hypercube, which has sparsity \Theta (\sqrt {d}). This gives an example where the algorithm does indeed find a cut whose sparsity is as bad as \Omega (\sqrt{ d \phi}).
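
The following sketch (assuming numpy; the indexing and helper names are placeholders) makes the example concrete: it builds the adjacency matrix of the d-dimensional hypercube, checks that x = \sum_i x_i is an eigenvector of d-2, and compares the sparsity of a dimension cut (which is 2) to that of the majority cut (which grows like \sqrt{d}).

import numpy as np
from itertools import product

d = 8
vertices = list(product([0, 1], repeat=d))    # V = {0,1}^d
index = {v: i for i, v in enumerate(vertices)}
N = len(vertices)

# adjacency matrix of the d-dimensional hypercube: v ~ w iff they differ in one coordinate
A = np.zeros((N, N))
for v in vertices:
    for i in range(d):
        w = list(v); w[i] ^= 1
        A[index[v], index[tuple(w)]] = 1

def sparsity(S):
    S = np.asarray(S)
    comp = np.setdiff1d(np.arange(N), S)
    return A[np.ix_(S, comp)].sum() / ((1.0 / N) * len(S) * len(comp))

# x = sum_i x_i, where x_i(a) = (-1)^{a_i}, so x(v) = d - 2 * (number of ones in v)
x = np.array([d - 2 * sum(v) for v in vertices])
print(np.allclose(A @ x, (d - 2) * x))        # x is an eigenvector of eigenvalue d-2

dimension_cut = [index[v] for v in vertices if v[0] == 0]       # sparsity 2
majority_cut = [index[v] for v in vertices if sum(v) < d / 2]   # sparsity Theta(sqrt(d))
print(sparsity(dimension_cut), sparsity(majority_cut))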
