# The Spectral Partitioning Algorithm

[In which we prove the “difficult part” of Cheeger’s inequality by analyzing a randomized rounding algorithm for a continuous relaxation of sparsest cut.]

We return to this month’s question: if $G=(V,E)$ is a $d$-regular graph, how well can we approximate its edge expansion $h(G)$ defined as

$h(G) := \min_{S\subseteq V} \frac{edges(S,V-S)} {\min \{ |S|, \ |V-S| \} }$

and its sparsest cut $\phi(G)$ defined as

$\phi(G) := \min_{S\subseteq V} \frac{edges(S,V-S)} { \frac 1n \cdot |S| \cdot |V-S|} = \min_{x\in \{0,1\}^n } \frac{ \sum_{i,j} A(i,j) \cdot |x(i)-x(j)|}{\frac 1n \sum_{i,j} |x(i)-x(j)|}$,

where $A$ is the adjacency matrix of $G$.

We have looked at three continuous relaxations of $\phi(G)$: the spectral gap, the Leighton-Rao linear program, and the Arora-Rao-Vazirani semidefinite program.

As we saw, the spectral gap of $G$, defined as the difference between the largest and the second largest eigenvalue of $A$, can be seen as the solution to a continuous optimization problem:

$d-\lambda_2 = \min_{x\in {\mathbb R}^n} \frac {\sum_{i,j} A(i,j) \cdot |x(i)-x(j)|^2}{\frac 1n \sum_{i,j} |x(i)-x(j)|^2}$.
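
For concreteness, here is a quick numpy sanity check of this characterization on the $4$-cycle (a $2$-regular graph); the snippet is my own illustration, not part of the original argument:

```python
import numpy as np

# Sanity check on the 4-cycle (d = 2): d - lambda_2 should equal the
# ratio above, evaluated at an eigenvector of the second largest eigenvalue.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
d = 2
eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
gap = d - eigvals[-2]                  # d - lambda_2
x = eigvecs[:, -2]                     # eigenvector of lambda_2
diff2 = (x[:, None] - x[None, :]) ** 2
ratio = (A * diff2).sum() / (diff2.sum() / len(x))
assert np.isclose(gap, ratio)
```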

It follows from the definitions that

$d-\lambda_2 \leq \phi \leq 2h$

which is the “easy direction” of Cheeger’s inequality; the interesting thing is that $d-\lambda_2$ is never much smaller than $\phi$, and it obeys

$(1) \ \ \ d-\lambda_2 \geq \frac {h^2}{2d} \geq \frac {\phi^2}{8d}$,

which is the difficult part of Cheeger’s inequality. When we normalize all quantities by the degree, the inequality reads as

$\frac 1 8 \cdot \left( \frac \phi d \right)^2 \leq \frac 1 2 \cdot \left( \frac h d \right)^2 \leq \frac{d-\lambda_2}{d} \leq \frac \phi d \leq 2 \frac h d$ .

I have taught (1) in three courses and used it in two papers, but I had never really understood it; here I consider a mathematical proof to be understood if one can see it as a series of inevitable steps. Many steps in the proofs of (1) that I had read, however, looked like magic tricks with no explanation. Finally, I have found a way to describe the proof that makes sense to me. (I note that everything I will say in this post will be completely obvious to the experts, but I hope some non-experts will read it and find it helpful.)

We prove (1), as usual, by showing that given any $x\in {\mathbb R}^n$ such that

$(2) \ \ \ \frac {\sum_{i,j} A(i,j) \cdot |x(i)-x(j)|^2}{\frac 1n \sum_{i,j} |x(i)-x(j)|^2} = \epsilon$

we can find a threshold $t$ such that the cut $(S,V-S)$ defined by $S:= \{ i: x(i) \geq t \}$ satisfies

$(3) \ \ \ \frac{edges(S,V-S)} { \min \{ |S|,\ |V-S| \} } \leq \sqrt{2 d \epsilon}$

and

$\frac{edges(S,V-S)} { \frac 1n \cdot |S| \cdot |V-S| } \leq \sqrt{8 d \epsilon}$.

This not only gives us a proof of (1), but also an algorithm for finding sparse cuts when they exist: take an eigenvector $x$ of the second largest eigenvalue (or simply any vector for which the Rayleigh quotient in (2) is small), sort the vertices $i$ according to the value of $x(i)$, and find the best cut among the “threshold cuts.” This is the “spectral partitioning” algorithm.
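
In code, the whole algorithm is a few lines. Here is a minimal numpy sketch (the function name and interface are my own):

```python
import numpy as np

def spectral_partition(A):
    """Spectral partitioning by a sweep over threshold cuts.

    A: adjacency matrix of a d-regular graph (symmetric 0/1 numpy array).
    Returns the best threshold cut as a boolean mask, and its expansion
    edges(S, V-S) / min(|S|, |V-S|).
    """
    n = A.shape[0]
    eigvecs = np.linalg.eigh(A)[1]   # eigenvalues in ascending order
    x = eigvecs[:, -2]               # eigenvector of the 2nd largest one
    order = np.argsort(x)            # sort vertices by x(i)
    best_S, best_ratio = None, np.inf
    for k in range(1, n):            # all n-1 nontrivial threshold cuts
        S = np.zeros(n, dtype=bool)
        S[order[:k]] = True          # the k vertices with smallest x(i)
        cut = A[S][:, ~S].sum()      # edges crossing (S, V-S)
        ratio = cut / min(k, n - k)
        if ratio < best_ratio:
            best_S, best_ratio = S, ratio
    return best_S, best_ratio
```

For large graphs one would use a sparse eigensolver (e.g. scipy.sparse.linalg.eigsh) and update the cut size incrementally during the sweep, but the quadratic-time version above is enough to illustrate the algorithm.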

This means that proving (1) amounts to studying an algorithm that “rounds” the solution of a continuous relaxation to a combinatorial solution, and there is a standard pattern to such arguments in computer science: we describe a randomized rounding algorithm, study its average performance, and then argue that there is a fixed choice that is at least as good as an average one. Here, in particular, we would like to find a distribution $T$ over thresholds such that, if we define $S:= \{ i: x(i) \geq T\}$ as a random variable in terms of $T$, we have

${\mathbb E} [ edges(S,V-S) ] \leq {\mathbb E} [\min \{ |S|,\ |V-S| \} ] \cdot \sqrt{2 d \epsilon}$

and so, using linearity of expectation,

${\mathbb E} [ edges(S,V-S) - \min \{ |S|,\ |V-S| \} \cdot \sqrt{2 d \epsilon}] \leq 0$

from which we see that there must be a threshold in our sample space such that (3) holds. I shall present a proof that explicitly follows this pattern. It would be nice if we could choose $T$ uniformly at random in the interval $\left[\min_i x(i), \max_i x(i)\right]$, but I don’t think it would work. (Can any reader see a counterexample? I couldn’t find one, but we’ll come back to it.) Instead, the following works: assuming with no loss of generality that the median of $\{ x(1),\ldots,x(n) \}$ is zero, $T$ can be chosen so that $|T|\cdot T$ is distributed uniformly in the interval $\left[\min_i x(i)\, |x(i)|,\ \max_i x(i)\, |x(i)|\right]$; equivalently, $T$ is drawn with density proportional to $|t|$ over the range of $x$. (This means that thresholds near the median are less likely to be picked than thresholds far from the median.) This choice seems to be a magic trick in itself, voiding the point I made above, but I hope it will become natural as we unfold our argument.
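
Concretely, such a $T$ can be sampled by inverse transform: draw $|T|\cdot T$ uniformly and invert the monotone map $t\mapsto |t|\cdot t$. A sketch, assuming as above that $x$ has median zero:

```python
import numpy as np

rng = np.random.default_rng()

def sample_threshold(x):
    """Sample T so that |T| * T is uniform on [min_i y(i), max_i y(i)],
    where y(i) = x(i) * |x(i)| and x is assumed to have median zero."""
    y = x * np.abs(x)
    u = rng.uniform(y.min(), y.max())        # u plays the role of |T| * T
    return np.sign(u) * np.sqrt(np.abs(u))   # invert t -> |t| * t
```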

First, we shall see that there are continuous relaxations of $h$ and $\phi$ that exactly match the combinatorial optimum, and that this can be established via random rounding.

Our first claim is that

$(4) \ \ \ \phi(G) = \min_{x \in {\mathbb R}^n} \frac{ \sum_{ij} A(i,j) | x(i)-x(j) |}{\frac 1n \sum_{ij} | x(i)-x(j) |}$

To prove (4), note first that the right-hand side is a relaxation of $\phi(G)$, because every cut $(S,V-S)$ corresponds to a feasible $x\in \{0,1\}^n$, as in the second expression for $\phi(G)$ above. For the other direction, take an optimal solution $x\in {\mathbb R}^n$, pick a random threshold $t$ uniformly in the interval $\left[\min_i x(i),\max_i x(i)\right]$, and define $S:= \{ i: x(i) \geq t\}$. Then the probability that an edge $(i,j)$ is cut by $(S,V-S)$ is proportional to $|x(i)-x(j)|$. Normalizing $x$ so that $\max_i x(i) - \min_i x(i) = 1$, which affects neither the distribution of $S$ nor the optimality of $x$, the probability is exactly equal to $|x(i)-x(j)|$, and we have

${\mathbb E} [edges (S,V-S)] = \frac 12 \sum_{ij} A(i,j) | x(i)-x(j) |$

and

${\mathbb E} [|S| \cdot |V-S| ] = \frac 12 \sum_{ij} | x(i)-x(j) |$

so that, by linearity of expectation, the support of our distribution includes a cut $(S,V-S)$ such that

$\frac{edges(S,V-S)} {\frac 1n |S| \cdot |V-S|} \leq \frac{ \sum_{ij} A(i,j) | x(i)-x(j) |}{\frac 1n \sum_{ij} | x(i)-x(j) |}$
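
Both expectations are easy to confirm empirically. A quick Monte Carlo check on the $n$-cycle (my own illustration, not part of the proof):

```python
import numpy as np

# Monte Carlo check of the two expectations above on the n-cycle.
rng = np.random.default_rng(4)
n = 8
A = np.zeros((n, n))
idx = np.arange(n)
A[idx, (idx + 1) % n] = A[(idx + 1) % n, idx] = 1

x = rng.normal(size=n)
x = (x - x.min()) / (x.max() - x.min())   # normalize the range to [0, 1]
diff = np.abs(x[:, None] - x[None, :])

cuts, products = [], []
for _ in range(100_000):
    t = rng.uniform(0.0, 1.0)             # uniform threshold
    S = x >= t
    cuts.append(A[S][:, ~S].sum())        # edges(S, V-S)
    products.append(S.sum() * (~S).sum()) # |S| * |V-S|

assert np.isclose(np.mean(cuts), 0.5 * (A * diff).sum(), rtol=0.05)
assert np.isclose(np.mean(products), 0.5 * diff.sum(), rtol=0.05)
```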

Next, we want to argue that

$(5) \ \ \ h(G) = \min_{x \in {\mathbb R}^n,\ med(x)=0} \frac{ \sum_{ij} A(i,j) | x(i)-x(j) |}{2\sum_i |x(i)|}$

where $med(x)$ denotes the median of the values $\{ x(1),\ldots,x(n)\}$. It may take a minute to see that the right-hand side is a relaxation of $h(G)$. Suppose that $(S,V-S)$ is the optimal cut in the definition of $h$, and that $|S| \leq n/2$. Then define $x(i)=1$ if $i\in S$ and $x(i)=0$ otherwise. The median of $x$ is zero, $|S| = \sum_i |x(i)|$, and $\sum_{ij} A(i,j) | x(i)-x(j) |$ counts every edge crossing the cut twice. So the right-hand side is a relaxation, and can be no larger than $h(G)$.

As before, normalize $x$ so that $\max_i x(i) - \min_i x(i) = 1$, pick a threshold $t$ uniformly at random in this interval, and define $S = \{ i: x(i) \geq t\}$. We already argued that

${\mathbb E} [ edges (S,V-S) ] = \frac 12 \sum_{ij} A(i,j) | x(i)-x(j)|$

and it remains to show that

${\mathbb E} [\min \{ |S| , \ |V-S| \} ] = \sum_i |x(i)|$.

This is just a matter of writing things down. If we name the vertices so that $x(1)\leq x(2) \leq \cdots \leq x(n)$, and we use the fact that for a non-negative integral random variable $N$ we have ${\mathbb E} [N] = \sum_{k\geq 1} {\mathbb P} [N\geq k]$, then we just have to observe that

${\mathbb P} [\min \{ |S| , \ |V-S| \} \geq k] = x(n-k+1) - x(k) = |x(n-k+1)| + |x(k)|$ for $1 \leq k \leq n/2$ (and the probability is zero for larger $k$); the second equality holds because the median is zero, so that $x(k) \leq 0 \leq x(n-k+1)$.
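
Here is the same bookkeeping done numerically, as a sanity check (again my own illustration):

```python
import numpy as np

# Direct check of E[min(|S|, |V-S|)] = sum_i |x(i)| for a uniform threshold.
rng = np.random.default_rng(3)
n = 9
x = np.sort(rng.normal(size=n))
x -= np.median(x)                 # make the median zero
x /= x.max() - x.min()            # range 1, so the threshold has density 1

# P[min(|S|,|V-S|) >= k] = x(n-k+1) - x(k), in the 1-indexed notation above:
expectation = sum(x[n - k] - x[k - 1] for k in range(1, n // 2 + 1))
assert np.isclose(expectation, np.abs(x).sum())
```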

Now that we have (4) and (5), if someone gives us an $x \in {\mathbb R}^n$ such that

$(6) \ \ \ \frac { \sum_{i,j} A(i,j) \cdot | x(i) - x(j)|^2 }{ \frac 1n \sum_{ij} |x(i)-x(j)|^2} = \epsilon$

our first thought would be to plug $x$ into the right-hand side of (4) and (5) and hope for the best. Unfortunately, I suspect that such hope would be unwarranted. We can see, however, that

$\sum_{i,j} A(i,j) \cdot | x(i) - x(j)|$
$\leq \sqrt{\sum_{i,j} A(i,j) \cdot | x(i) - x(j)|^2 } \cdot \sqrt{\sum_{i,j} A(i,j)}$
$= \sqrt{nd \cdot \sum_{i,j} A(i,j) \cdot | x(i) - x(j)|^2}$

and

$\sum_{ij} |x(i) - x(j)| \geq \frac 1 {\max_{ij} |x(i)-x(j)|} \cdot \sum_{ij} |x(i) - x(j)|^2$

so, if (6) holds, we have

$\frac{\sum_{ij} A(i,j) |x(i) - x(j)|}{\frac 1n \sum_{ij} |x(i)-x(j)|} \leq \sqrt{\epsilon d} \cdot \frac{\max_{ij} |x(i)-x(j)|}{\sqrt{ \frac 1 {n^2} \sum_{ij} |x(i)-x(j)|^2}}$

and, at least in the special case in which $\sum_{ij} |x(i)-x(j)| = \Omega(n^2 \cdot \max_{ij}|x(i)-x(j)|)$, this gives us a cut of expansion $O(\sqrt{\epsilon d})$. I don’t think, however, that just using $x$ would work in the general case.

(Note that this special case in which the very simple rounding works is also the case in which the simple rounding gives, on average, a nearly balanced partition. Interestingly, both the Leighton-Rao and the Arora-Rao-Vazirani analyses reduce the general case to a case in which the respective rounding algorithms produce nearly balanced partitions.)
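
As a sanity check, the chain of inequalities above can be verified numerically, e.g. on the $2$-regular cycle with a random $x$ (my own illustration):

```python
import numpy as np

# Check the combined bound above for a random x on the 2-regular n-cycle.
rng = np.random.default_rng(1)
n, d = 50, 2
A = np.zeros((n, n))
i = np.arange(n)
A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

x = rng.normal(size=n)
diff = np.abs(x[:, None] - x[None, :])
eps = (A * diff**2).sum() / ((diff**2).sum() / n)    # the ratio in (6)
lhs = (A * diff).sum() / (diff.sum() / n)
rhs = np.sqrt(eps * d) * diff.max() / np.sqrt((diff**2).sum() / n**2)
assert lhs <= rhs * (1 + 1e-9)
```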

The next try would be to find a $y$ such that $|y(i)-y(j)|$ is an approximation to $|x(i)-x(j)|^2$, and $|x(i)|^2$ is an approximation to $|y(i)|$. My first guess was to choose $y(i) := |x(i)|\cdot x(i)$, and this works!
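
Before using it, we can sanity-check this guess numerically: the map is monotone and odd, satisfies $|y(i)| = |x(i)|^2$ exactly, and obeys the bound on $|y(i)-y(j)|$ derived below (my own illustration):

```python
import numpy as np

# The map x -> y = |x| * x is monotone and odd, and |y(i)| = |x(i)|^2 exactly.
rng = np.random.default_rng(2)
x = rng.normal(size=1000)
y = np.abs(x) * x
assert np.allclose(np.abs(y), x ** 2)

# It also satisfies |y(i)-y(j)| <= |x(i)-x(j)| * (|x(i)| + |x(j)|),
# the inequality derived below.
i, j = rng.integers(0, 1000, size=(2, 100_000))
lhs = np.abs(y[i] - y[j])
rhs = np.abs(x[i] - x[j]) * (np.abs(x[i]) + np.abs(x[j]))
assert (lhs <= rhs + 1e-9).all()
```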

Let us see what the right-hand sides of (4) and (5) look like for such a solution $y$. First, we may assume without loss of generality that we are given an $x$ such that the median of $x$ is zero. (Otherwise we can subtract the median from every entry without changing (6).) Since the map $t \mapsto |t|\cdot t$ is monotone and odd, the median of $y$ is also zero, so $y$ is feasible for both (4) and (5).

In order to relate the numerators of (4) and (5), evaluated at $y$, to the numerator of (6), we need a way to bound $|y(i)-y(j)|$ in terms of $|x(i)-x(j)|$. A couple of attempts give

$|y(i) - y(j)| \leq |x(i)-x(j)| \cdot ( |x(i)| + |x(j)|)$

and it’s clear that the next step has to be Cauchy-Schwarz

$\quad \sum_{ij} A(i,j) \cdot |y(i) - y(j)|$
$\leq \sqrt{ \sum_{ij} A(i,j) |x(i)-x(j)|^2} \cdot \sqrt{ \sum_{ij} A(i,j) ( |x(i)| + |x(j)|)^2}$

So we have related the numerators of (4) and (5) to the numerator of (6), up to an extra term that is easy to bound (recall that the graph is $d$-regular) as

$\ \ \sum_{ij} A(i,j) ( |x(i)| + |x(j)|)^2$
$\leq \sum_{ij} A(i,j) ( 2 |x(i)|^2 + 2 |x(j)|^2)$
$= 4 d \sum_i |x(i)|^2$
$= 4d \sum_i |y(i)|$

Now we use (6) to get

$\ \ \ \sum_{ij} A(i,j) |x(i)-x(j)|^2$
$= \epsilon \cdot \frac 1n \sum_{i,j} |x(i)-x(j)|^2$
$= 2 \epsilon \sum_i |x(i)|^2 - \frac{2 \epsilon}{n} \left( \sum_i x(i) \right)^2$
$\leq 2\epsilon \sum_i |y(i)|$

and, combining everything,

$\frac{\sum_{ij} A(i,j) |y(i)-y(j)|} {2\sum_i |y(i)|} \leq \sqrt{2d\epsilon}$.

Although it is unnecessary to complete the proof, we can also see that, because the median of $y$ is zero, we have

$\sum_{i,j} |y(i)-y(j)| \geq n \sum_i |y(i)|$

(for each $i$ there are at least $n/2$ indices $j$ such that $y(i)$ and $y(j)$ have opposite signs or one of them is zero, and for such pairs $|y(i)-y(j)| = |y(i)|+|y(j)|$; summing over ordered pairs gives the bound)

and so

$\frac{\sum_{ij} A(i,j) |y(i)-y(j)|} {\frac 1n \sum_{ij} |y(i)-y(j)|} \leq 2\cdot \sqrt{2d\epsilon}$.
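
Putting everything together, we can check the whole argument numerically: starting from any $x$ with median zero, some threshold cut achieves expansion at most $\sqrt{2d\epsilon}$. Since $t \mapsto |t|\cdot t$ is monotone, the threshold cuts of $y$ are the same as the threshold cuts of $x$, so a sweep over the cuts of $x$ suffices (my own illustration, on the cycle):

```python
import numpy as np

# End-to-end check of the theorem on the 2-regular n-cycle.
rng = np.random.default_rng(5)
n, d = 40, 2
A = np.zeros((n, n))
i = np.arange(n)
A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

x = rng.normal(size=n)
x -= np.median(x)                             # median zero, as in the proof
diff2 = (x[:, None] - x[None, :]) ** 2
eps = (A * diff2).sum() / (diff2.sum() / n)   # the epsilon of (2)

order = np.argsort(x)                         # sweep the threshold cuts of x
best = np.inf
for k in range(1, n):
    S = np.zeros(n, dtype=bool)
    S[order[:k]] = True
    best = min(best, A[S][:, ~S].sum() / min(k, n - k))

assert best <= np.sqrt(2 * d * eps) + 1e-9
```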

There is another way to look at this “rounding algorithm”: we see the solution $x$ that we are given as defining a “distance function” $d(i,j):= |x(i)-x(j)|^2$ among the vertices, and we find a combinatorial solution by finding an $r$ such that the cut is defined by the “ball of radius $r$” around the vertex $i_{\min}$, where $x(i_{\min}) = \min_i x(i)$. (If $t$ is the threshold picked by the algorithm, then $r= (t-x(i_{\min}))^2$.) Indeed, we pick a radius at random, with a certain distribution, and we are able to relate the probability that an edge $(i,j)$ is cut to the distance $d(i,j)$. This view is not very natural for the spectral partitioning algorithm, but it is in the spirit of what happens in the rounding of the Leighton-Rao and the Arora-Rao-Vazirani relaxations.

## 6 thoughts on “The Spectral Partitioning Algorithm”

1. hi luca:

in addition to the wonderful post(s), one thing that might be worth mentioning is a corresponding “cheating-cheeger’s inequality” in the balanced separator world. the advantage of this is that it is straightforward to prove and the quadratic dependence comes out, perhaps, in a much cleaner way. also, it might be the way to introduce cheeger’s inequality in a course.

i am sure most of the readers of your blog have seen this, but for the others, here is a sketch of how it goes: for the sake of this discussion, assume that the graph is d-regular.

let $\lambda_2$ denote the second smallest eigenvalue of the laplacian, normalized by $d$.

then $\lambda_2$ can be seen as the optimum of the following optimization problem (which arises as a relaxation of the sparsest cut problem in the natural way):

assign a vector $x_i \in {\mathbb R}^n$ to each vertex so as to minimize the following ratio:

${\mathbb E}_{e=ij} [\|x_i-x_j\|^2] \ / \ {\mathbb E}_{i<j}[\|x_i-x_j\|^2]$

(the ratio of the average $\ell_2^2$ distance across the edges to the average $\ell_2^2$ distance across all pairs of vertices).

on the other hand, the corresponding relaxation of the balanced separator problem can be written as:

assign a “unit” vector $x_i \in {\mathbb R}^n$ to each vertex so as to minimize

${\mathbb E}_{e=ij} [\|x_i-x_j\|^2]$

subject to ${\mathbb E}_{i<j}[\|x_i-x_j\|^2] = \Omega(1)$.

the optimum of this program is the analogue of $\lambda_2$ for the balanced separator problem.

now suppose that this optimum is less than $\epsilon$. then it is easy to show that there is a balanced cut in the graph which cuts at most an $O(\sqrt{\epsilon})$ fraction of the edges: simply pick a random hyperplane cut passing through the origin.

since ${\mathbb E}_{e=ij} [\|x_i-x_j\|^2] \leq \epsilon$, the probability that a typical edge is cut is $O(\sqrt{\epsilon})$. here we use the fact that two unit vectors $x_i, x_j$ with inner product at least $1-\epsilon/2$ are separated by a random hyperplane with probability at most $O(\sqrt{\epsilon})$.

the fact that the cut is balanced comes from the fact that, on average, two vertices are a constant distance apart.
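
for concreteness, here is a minimal numpy sketch of the hyperplane cut (names are mine, just an illustration):

```python
import numpy as np

rng = np.random.default_rng(6)

def hyperplane_cut(X):
    """Split vertices by a random hyperplane through the origin.
    X: an (n x k) array whose rows are the unit vectors x_i."""
    g = rng.normal(size=X.shape[1])   # random normal direction
    return X @ g >= 0                 # boolean side-of-hyperplane mask

# two unit vectors u, v are separated with probability arccos(<u, v>)/pi,
# which is O(sqrt(eps)) when <u, v> >= 1 - eps/2.
```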

-nisheeth
