CS294 Lecture 6: Higher-order Cheeger Inequality

In which we state an analog of Cheeger’s inequalities for the {k}-th smallest Laplacian eigenvalue, and we discuss the connection between this result and the analysis of spectral partitioning algorithms.

1. Cheeger-type Inequalities for {\lambda_k}

Let {G=(V,E)} be an undirected {d}-regular graph, {A} its adjacency matrix, {L = I - \frac 1d A} its normalized Laplacian matrix, and {0 = \lambda_1 \leq \cdots \leq \lambda_n \leq 2} be the eigenvalues of {L}, counted with multiplicities and listed in non-decreasing order.

In Handout 2, we proved that {\lambda_k = 0} if and only if {G} has at least {k} connected components, that is, if and only if there are {k} disjoint sets {S_1,\ldots,S_k} such that {\phi(S_i) = 0} for {i=1,\ldots,k}. In this lecture and the next one we will prove a robust version of this fact.

First we introduce the notion of higher-order expansion. If {S_1,\ldots,S_k} is a collection of disjoint sets, then their order-{k} expansion is defined as

\displaystyle  \phi_k (S_1,\ldots,S_k) = \max_{i=1,\ldots,k} \ \ \phi(S_i)

and the order-{k} expansion of a graph {G} is

\displaystyle \phi_k(G) = \min_ {S_1,\ldots,S_k \ \rm disjoint } \ \ \phi_k(S_1,\ldots,S_k)

If the edges of a graph represent a relation of similarity or affinity, a low-expansion collection of sets {S_1,\ldots,S_k} represents an interesting notion of clustering, because the vertices in each set {S_i} are more related to each other than to the rest of the graph. (Additional properties are desirable in a good clustering, and we will discuss this later.)
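
To make the definitions concrete, here is a short Python sketch (mine, not from the notes) that computes {\phi(S)} and the order-{k} expansion of a given collection of disjoint sets, assuming the convention {\phi(S) = edges(S,V-S)/(d|S|)} for a {d}-regular graph given as an adjacency list; the function names are hypothetical.

```python
def phi(G, d, S):
    """Expansion of a single set S in a d-regular graph G, where G is an
    adjacency list {vertex: set of neighbors} (an assumed representation)."""
    S = set(S)
    boundary = sum(1 for u in S for v in G[u] if v not in S)  # edges(S, V-S)
    return boundary / (d * len(S))

def phi_k(G, d, sets):
    """Order-k expansion of disjoint sets: the worst expansion among them."""
    return max(phi(G, d, S) for S in sets)

# Example: in the 6-cycle (2-regular), the two paths {0,1,2} and {3,4,5}
# each have 2 boundary edges, so phi = 2 / (2 * 3) = 1/3.
C6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(phi_k(C6, 2, [{0, 1, 2}, {3, 4, 5}]))  # 0.3333...
```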

We will prove the following higher-order Cheeger inequalities:

\displaystyle  \frac { \lambda_k } 2 \leq \phi_k(G) \leq O(k^{3.5} ) \sqrt {\lambda_k}

Stronger upper bounds are known, but the bound above is easier to prove from scratch. It is known that {\phi_k (G) \leq O(k^2) \sqrt{\lambda_k}} and that {\phi_k(G) \leq O_\epsilon ( \sqrt {\log k} ) \cdot \sqrt {\lambda_{(1+\epsilon) \cdot k} } }.

CS359G Lecture 4: Spectral Partitioning

In which we prove the difficult direction of Cheeger’s inequality.

As in the past lectures, consider an undirected {d}-regular graph {G=(V,E)}, call {A} its adjacency matrix, and {M := \frac 1d A} its scaled adjacency matrix. Let {\lambda_1 \geq \cdots \geq \lambda_n} be the eigenvalues of {M}, with multiplicities, in non-increasing order. We have been studying the edge expansion of a graph, which is the minimum of {h(S)} over all non-trivial cuts {(S,V-S)} of the vertex set (a cut is trivial if {S=\emptyset} or {S=V}), where the expansion {h(S)} of a cut is

\displaystyle  h(S) := \frac{Edges(S,V-S)}{d\cdot \min \{ |S|, |V-S| \} }

We have also been studying the (uniform) sparsest cut problem, which is the problem of finding the non-trivial cut that minimizes {\phi(S)}, where the sparsity {\phi(S)} of a cut is

\displaystyle  \phi(S) := \frac{Edges(S,V-S)}{\frac dn |S| \cdot |V-S|}

We are proving Cheeger’s inequalities:

\displaystyle  \frac {1-\lambda_2}{2} \leq h(G) \leq \sqrt{2 \cdot (1-\lambda_2) } \ \ \ \ \ (1)

and we established the left-hand side inequality in the previous lecture, showing that the quantity {1-\lambda_2} can be seen as the optimum of a continuous relaxation of {\phi(G)}, so that {1-\lambda_2 \leq \phi(G)}, and {\phi(G) \leq 2 h(G)} follows from the definitions.

Today we prove the more difficult, and interesting, direction. The proof will be constructive and algorithmic. The proof can be seen as an analysis of the following algorithm.

Algorithm: SpectralPartitioning

  • Input: graph {G=(V,E)} and vector {{\bf x} \in {\mathbb R}^V}
  • Sort the vertices of {V} in non-decreasing order of the values of the entries of {{\bf x}}, that is, let {V=\{ v_1,\ldots,v_n\}} where {x_{v_1} \leq x_{v_2} \leq \cdots \leq x_{v_n}}
  • Let {i\in \{1,\ldots,n-1\}} be such that {h(\{ v_1,\ldots, v_i \} )} is minimal
  • Output {S= \{ v_1,\ldots, v_i \}}

We note that the algorithm can be implemented to run in time {O(|V| \log |V| +|E|)}, assuming arithmetic operations and comparisons take constant time: after sorting, once we have computed {h(\{ v_1,\ldots,v_{i} \})} it only takes time {O(degree(v_{i+1}))} to compute {h(\{ v_1,\ldots,v_{i+1} \})}.
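
The following Python sketch (an assumed implementation of mine, not part of the notes) makes the bookkeeping explicit: moving one vertex into {S} adds its edges to the outside as new cut edges and absorbs its edges into {S}, so each step costs time proportional to the degree of the vertex being moved, and the loop after the sort runs in time {O(|V|+|E|)}.

```python
def spectral_partitioning(G, d, x):
    """G: adjacency list {vertex: set of neighbors} of a d-regular graph;
    x: mapping from vertices to real values. Returns the best sweep cut."""
    order = sorted(G, key=lambda v: x[v])        # v_1, ..., v_n
    n = len(order)
    S, cut = set(), 0
    best_h, best_i = float("inf"), 0
    for i, v in enumerate(order[:-1], start=1):  # prefixes of size 1..n-1
        inside = sum(1 for u in G[v] if u in S)  # O(degree(v)) work
        cut += (d - inside) - inside             # new boundary edges minus absorbed ones
        S.add(v)
        h = cut / (d * min(i, n - i))
        if h < best_h:
            best_h, best_i = h, i
    return set(order[:best_i]), best_h

# Example: on the 6-cycle with x = (0,0,0,1,1,1), the best sweep cut is
# {0,1,2}, with h = 2 / (2 * 3) = 1/3.
C6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(spectral_partitioning(C6, 2, {i: int(i >= 3) for i in range(6)}))
```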

We have the following analysis of the quality of the solution:

Lemma 1 (Analysis of Spectral Partitioning) Let {G=(V,E)} be a d-regular graph, {{\bf x}\in {\mathbb R}^V} be a vector such that {{\bf x} \perp {\bf 1}}, let {M} be the normalized adjacency matrix of {G}, define

\displaystyle  \delta:= \frac{ \sum_{i,j} M_{i,j} |x_i - x_j |^2}{\frac 1n \sum_{i,j} |x_i - x_j |^2}

and let {S} be the output of algorithm SpectralPartitioning on input {G} and {x}. Then

\displaystyle  h(S) \leq \sqrt{2\delta}

Remark 1 If we apply the lemma to the case in which {{\bf x}} is an eigenvector of {\lambda_2}, then {\delta = 1-\lambda_2}, and so we have

\displaystyle  h(G) \leq h(S) \leq \sqrt{2 \cdot (1-\lambda_2)}

which is the difficult direction of Cheeger’s inequalities.

Remark 2 If we run the SpectralPartitioning algorithm with the eigenvector {{\bf x}} of the second eigenvalue {\lambda_2}, we find a set {S} whose expansion is

\displaystyle  h(S) \leq \sqrt{2\cdot (1-\lambda_2) } \leq 2 \sqrt{h(G)}

Even though this doesn’t give a constant-factor approximation to the edge expansion, it gives a very efficient, and non-trivial, approximation.

As we will see in a later lecture, there is a nearly linear time algorithm that finds a vector {x} for which the expression {\delta} in the lemma is very close to {1-\lambda_2}, so, overall, for any graph {G} we can find a cut of expansion {O(\sqrt {h(G)})} in nearly linear time.

Max Cut-Gain and the Smallest Eigenvalue

In June, I wrote about my work on using spectral partitioning to approximate Max Cut.

I have now posted a revised paper with a couple new things.

One is an improved analysis due to Moses Charikar, of the algorithm described in the June paper. Moses shows that if one starts from a graph in which the optimum cuts a 1-\epsilon fraction of edges, then the algorithm cuts at least a 1-4\sqrt \epsilon + 8\epsilon fraction of edges (and also at least half of the edges). Cutting more than a 1-\frac 2 \pi \sqrt \epsilon + o_\epsilon(1) fraction of edges is Unique-Games-hard. Optimizing the fraction

\displaystyle \frac{ \max \{ \ \frac 12  \ , \ (1-4\sqrt \epsilon + 8\epsilon) \ \} }{1-\epsilon}

we see that the approximation ratio of the algorithm is always at least .531.
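
One can verify the claimed constant numerically; the snippet below (mine, not from the paper) minimizes the fraction above over a fine grid of values of \epsilon. The minimum is about .5311, attained near \epsilon \approx 0.055.

```python
from math import sqrt

# Grid search over epsilon in (0, 1) with step 1e-5.
ratio = min(max(0.5, 1 - 4 * sqrt(e) + 8 * e) / (1 - e)
            for e in (i * 1e-5 for i in range(1, 10**5)))
print(ratio)  # ~0.53113
```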

The second new result answers a question raised by an anonymous commenter in June: what about Max Cut Gain?

Max Cut and the Smallest Eigenvalue

In the Max Cut problem, we are given an undirected graph G=(V,E) and we want to find a partition (L,\bar L) of the set of vertices such that as many edges as possible have one endpoint in L and one endpoint in \bar L, and are hence cut by the partition.

It is easy, as recognized since the 1970s, to find a partition that cuts half of the edges and that, thus, is at least half as good as an optimal solution. No approximation better than 1/2 was known for this problem, until Goemans and Williamson famously proved that one could achieve a .878… approximation using semidefinite programming (SDP).

No other approximation algorithm achieving an approximation asymptotically better than 1/2 is known, and it seems that a fundamental difficulty is the following. Suppose we prove that a certain algorithm achieves approximation > 51\%. Then, given a graph in which the optimum is, say, < 50.4 \%, the algorithm outputs a cut with fewer than 50.4\% of the edges, and the algorithm and its analysis must provide a certificate that the optimum cut in the given graph is at most 50.4/51 < 99\% of the edges; but there is no general technique to prove upper bounds to the Max Cut optimum of a general graph other than Semidefinite Programming. (And see here and here for negative results showing that large classes of linear programming relaxations are unable to give such certificates.)

Spectral techniques can prove upper bounds to Max Cut in certain cases (and can be seen as special cases of the upper bounds provided by the Goemans-Williamson relaxation).

In the simplified case in which G=(V,E) is a d-regular graph, let A be the adjacency matrix of G and \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n be the eigenvalues of A; then it is easy to show that

(1) \ \ \ \displaystyle maxcut(G) \leq \frac 12  + \frac 12 \cdot \frac {|\lambda_n|}{d}

where maxcut(G) is the fraction of edges cut by an optimal solution. Unfortunately (1) does not quite have an approximate converse: there are graphs where |\lambda_n|=d but maxcut(G) = \frac 12 + o_d(1).
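
As a quick sanity check of (1) (my example, not from the post), consider the 4-cycle: it is 2-regular and bipartite, so \lambda_n = -2 and the bound gives maxcut(G) \leq 1, which a brute-force search confirms is attained.

```python
import numpy as np
from itertools import product

# Adjacency matrix of the 4-cycle 0-1-2-3-0.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
d, n = 2, 4
lam_n = np.linalg.eigvalsh(A)[0]          # smallest eigenvalue, here -2
bound = 0.5 + 0.5 * abs(lam_n) / d        # the right-hand side of (1)

edges = [(i, j) for i in range(n) for j in range(i + 1, n) if A[i, j]]
best = max(sum(x[i] != x[j] for i, j in edges)
           for x in product([0, 1], repeat=n)) / len(edges)
print(best, "<=", bound)                  # 1.0 <= 1.0
```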

The following fact, however, is always true and well known:

  • |\lambda_n|=d if and only if G contains a bipartite connected component.

Is there an “approximate” version of the above statement characterizing the cases in which d-|\lambda_n| is small? Surprisingly, as far as I know the question had not been considered before.

For comparison, the starting point of the theory of edge expansion is the related fact

  • \lambda_2=d if and only if G is disconnected.

Which can be rephrased as:

  • \lambda_2=d if and only if there is a non-empty S\subseteq V, |S| \leq |V|/2 such that edges(S,V-S)=0.

Cheeger’s inequality characterizes the case in which d-\lambda_2 is small:

  • If there is a non-empty S\subseteq V, |S| \leq |V|/2 such that edges(S,V-S) \leq \epsilon \cdot d \cdot |S|, then \lambda_2 \geq d\cdot (1-2\epsilon);
  • If \lambda_2 \geq d\cdot (1-\epsilon) then there is a non-empty S\subseteq V, |S| \leq |V|/2 such that edges(S,V-S) \leq \sqrt{2 \epsilon} \cdot d \cdot |S|.

For a subset S\subseteq V, and a bipartition L,R=S-L of S, we say that an edge (i,j) fails [to be cut by] the bipartition if (i,j) is incident on S but it is not the case that one endpoint is in L and one endpoint is in R. (This means that either both endpoints are in L, or both endpoints are in R, or one endpoint is in S and one endpoint is not in S.) Then we can express the well-known fact about \lambda_n as

  • |\lambda_n|=d if and only if there is S\subseteq V and a bipartition of S with zero failed edges.

In this new paper I prove the following approximate version

  • If there is a non-empty S\subseteq V, and a bipartition of S with at most \epsilon \cdot d \cdot |S| failed edges, then |\lambda_n| \geq d\cdot (1-4\epsilon);
  • If |\lambda_n| \geq d\cdot (1-\epsilon), then there is a non-empty S\subseteq V, and a bipartition of S with at most \sqrt{2 \epsilon} \cdot d \cdot |S| failed edges.

The following notation makes the similarity with Cheeger’s inequality clearer. Define the edge expansion of a graph G as

\displaystyle h(G) = \min_{S\subseteq V, \ |S| \leq \frac {|V|}{2} } \frac {edges(S,V-S)} {d|S|}

Let us define the bipartiteness ratio of G as

\displaystyle \beta(G) = \min_{S, \ L \subseteq S, \ R=S-L} \frac{edges(L) + edges(R) + edges(S,V-S)}{d|S|}

that is, as the minimum, over sets S and bipartitions (L,R) of S, of the ratio between the number of failed edges and d|S|.
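
In code, counting failed edges is straightforward; the sketch below (mine, assuming the graph is given as an adjacency list) computes the ratio \frac{edges(L)+edges(R)+edges(S,V-S)}{d|S|} for a given bipartition.

```python
def beta_ratio(G, d, L, R):
    """G: adjacency list {vertex: set of neighbors} of a d-regular graph;
    L, R: disjoint vertex sets forming a bipartition of S = L union R."""
    L, R = set(L), set(R)
    S = L | R
    internal = 0   # same-side endpoint incidences; each such edge seen twice
    boundary = 0   # edges leaving S; each seen once
    for u in S:
        side = L if u in L else R
        for v in G[u]:
            if v not in S:
                boundary += 1
            elif v in side:
                internal += 1
    failed = internal // 2 + boundary   # edges(L) + edges(R) + edges(S, V-S)
    return failed / (d * len(S))

# Example: in the 6-cycle, S = {0,1,2,3} with L = {0,2}, R = {1,3} fails
# only the two edges leaving S, giving beta = 2 / (2 * 4) = 1/4.
C6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(beta_ratio(C6, 2, {0, 2}, {1, 3}))  # 0.25
```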

Then Cheeger’s inequality gives

\displaystyle \frac 12 \cdot \frac{d-\lambda_2}{d}  \leq h(G) \leq \sqrt{2 \cdot \frac{d-\lambda_2}{d} }

and our results give

\displaystyle \frac 14 \cdot \frac{d-|\lambda_n|}{d} \leq \beta(G) \leq \sqrt{2 \cdot \frac{d-|\lambda_n|}{d}}

This translates into an efficient algorithm that, given a graph G such that maxcut(G) \geq 1-\epsilon, finds a set S and a bipartition of S such that at least a 1- 4\sqrt{\epsilon} fraction of the edges incident on S are cut by the bipartition. Removing the vertices in S and continuing recursively on the residual graph yields a .50769… approximation algorithm for Max Cut. (The algorithm stops making recursive calls, and uses a random partition, when the partition of S found by the algorithm has too many failed edges.)
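
A possible reconstruction of the core subroutine (my sketch under assumptions, not the paper's algorithm verbatim): compute an eigenvector x of \lambda_n, sweep over thresholds t > 0 taking L = \{ i : x(i) \leq -t \} and R = \{ i : x(i) \geq t \}, and keep the bipartition of S = L \cup R with the smallest bipartiteness ratio; the recursive driver then fixes the cut on S, deletes S, and recurses on the residual graph.

```python
import numpy as np

def best_bipartition(A, d):
    """A: adjacency matrix of a d-regular graph. Returns (L, R, ratio),
    the best bipartition found by the two-sided threshold sweep."""
    n = A.shape[0]
    x = np.linalg.eigh(A)[1][:, 0]            # eigenvector of the smallest eigenvalue
    best = ([], [], float("inf"))
    for t in sorted({abs(v) for v in x if abs(v) > 0}):
        L = {i for i in range(n) if x[i] <= -t}
        R = {i for i in range(n) if x[i] >= t}
        S = L | R
        failed = 0.0
        for i in S:
            for j in np.nonzero(A[i])[0]:
                if (i in L and j in L) or (i in R and j in R):
                    failed += 0.5             # same-side internal edges seen twice
                elif j not in S:
                    failed += 1.0             # edges leaving S seen once
        ratio = failed / (d * len(S))
        if ratio < best[2]:
            best = (sorted(L), sorted(R), ratio)
    return best

# Example: on the 4-cycle (bipartite, lambda_n = -d), the sweep recovers
# the two sides and a bipartition with no failed edges.
C4 = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]], dtype=float)
print(best_bipartition(C4, 2))   # ([1, 3], [0, 2], 0.0), up to swapping sides
```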

The paper is entirely a by-product of the ongoing series of posts on edge expansion: the question of the relation between spectral techniques and Max Cut was asked by a commenter, and the probabilistic view of the proof of Cheeger’s inequality that I wrote up in this post was very helpful in understanding the gap between \lambda_n and -d.

The Limitations of the Spectral Partitioning Algorithm

We continue to talk about the problem of estimating the expansion of a graph G=(V,E), focusing on the closely related sparsest cut, defined as

\displaystyle \phi(G):= \min_{S} \frac {edges(S,V-S) } { \frac 1n \cdot |S| \cdot |V-S| }

The spectral partitioning algorithm first finds a vector x \in {\mathbb R}^n minimizing

(1) \ \ \  \displaystyle \frac{ \sum_{i,j} A(i,j) |x(i)-x(j)|^2 } { \frac 1n \cdot \sum_{i,j} |x(i)-x(j)|^2}

(where A is the adjacency matrix of G) and then finds the best cut (S,V-S) where S is of the form \{ i \in V: \ x(i) \geq t \} for a threshold t.

We proved that if the quantity in (1) is \epsilon and G is d-regular, then the algorithm will find a cut of sparsity at most \sqrt{8 \epsilon d}, and that if x is the eigenvector of the second eigenvalue, then it is an optimal solution to (1), and the cost of an optimal solution to (1) is a lower bound to \phi(G). This means that the algorithm finds a cut of sparsity at most \sqrt{8 \phi(G) d}.
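
For reference, the quantity in (1) is a two-liner in numpy; this is a direct transcription of mine, and the helper name is hypothetical.

```python
import numpy as np

def relaxation_value(A, x):
    """The quantity (1) for adjacency matrix A and a non-constant vector x."""
    n = len(x)
    diff2 = (x[:, None] - x[None, :]) ** 2   # |x(i) - x(j)|^2 for every pair
    return (A * diff2).sum() / (diff2.sum() / n)

# On the 4-cycle, the minimum is d - lambda_2 = 2 - 0 = 2, attained e.g. by
# x = (1, 0, -1, 0).
C4 = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]], dtype=float)
print(relaxation_value(C4, np.array([1.0, 0.0, -1.0, 0.0])))  # 2.0
```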

Can the analysis be improved?

The Spectral Partitioning Algorithm

[In which we prove the “difficult part” of Cheeger’s inequality by analyzing a randomized rounding algorithm for a continuous relaxation of sparsest cut.]

We return to this month’s question: if G=(V,E) is a d-regular graph, how well can we approximate its edge expansion h(G) defined as

h(G) := \min_{S\subseteq V} \frac{edges(S,V-S)} {\min \{ |S|, \ |V-S| \} }

and its sparsest cut \phi(G) defined as

\phi(G) := \min_{S\subseteq V} \frac{edges(S,V-S)} { \frac 1n \cdot |S| \cdot |V-S|} =  \min_{x\in \{0,1\}^n } \frac{ \sum_{i,j} A(i,j) \cdot |x(i)-x(j)|}{\frac 1n  \sum_{i,j}  |x(i)-x(j)|},

where A is the adjacency matrix of G.

We have looked at three continuous relaxations of \phi(G), the spectral gap, the Leighton-Rao linear program, and the Arora-Rao-Vazirani semidefinite program.

As we saw, the spectral gap of G, defined as the difference between largest and second largest eigenvalue of A, can be seen as the solution to a continuous optimization problem:

d-\lambda_2 = \min_{x\in {\mathbb R}^n} \frac {\sum_{i,j} A(i,j) \cdot |x(i)-x(j)|^2}{\frac 1n  \sum_{i,j}  |x(i)-x(j)|^2}.

It follows from the definitions that

d-\lambda_2 \leq \phi \leq 2h

which is the “easy direction” of Cheeger’s inequality, and the interesting thing is that d-\lambda_2 is never much smaller, and it obeys

(1) \ \ \ d-\lambda_2 \geq \frac {h^2}{2d} \geq \frac {\phi^2}{8d},

which is the difficult part of Cheeger’s inequality. When we normalize all quantities by the degree, the inequality reads as

\frac 1 8 \cdot \left( \frac \phi d \right)^2 \leq \frac 1 2 \cdot \left( \frac h d \right)^2 \leq \frac{d-\lambda_2}{d} \leq \frac \phi d  \leq 2 \frac h d .

I have taught (1) in three courses and used it in two papers, but I had never really understood it, where I consider a mathematical proof to be understood if one can see it as a series of inevitable steps. Many steps in the proofs of (1) I had read, however, looked like magic tricks with no explanation. Finally, however, I have found a way to describe the proof that makes sense to me. (I note that everything I will say in this post will be completely obvious to the experts, but I hope some non-expert will read it and find it helpful.)

We prove (1), as usual, by showing that given any x\in {\mathbb R}^n such that

(2) \ \ \ \frac {\sum_{i,j} A(i,j) \cdot |x(i)-x(j)|^2}{\frac 1n  \sum_{i,j}  |x(i)-x(j)|^2} = \epsilon

we can find a threshold t such that the cut (S,V-S) defined by S:= \{ i: x_i \geq t \} satisfies

(3)  \ \ \ \frac{edges(S,V-S)} { \min \{ |S|,\ |V-S| \} } \leq \sqrt{2 d \epsilon}

and

\frac{edges(S,V-S)} { \frac 1n \cdot |S| \cdot |V-S| } \leq \sqrt{8 d \epsilon}.

This not only gives us a proof of (1), but also an algorithm for finding sparse cuts when they exist: take a vector x which is an eigenvector of the second eigenvalue (or simply a vector for which the Rayleigh quotient in (2) is small), sort the vertices i according to the value of x(i), and find the best cut among the “threshold cuts.” This is the “spectral partitioning” algorithm.

This means that proving (1) amounts to studying an algorithm that “rounds” the solution of a continuous relaxation to a combinatorial solution, and there is a standard pattern to such arguments in computer science: we describe a randomized rounding algorithm, study its average performance, and then argue that there is a fixed choice that is at least as good as an average one. Here, in particular, we would like to find a distribution T over thresholds such that, if we define S:= \{ i: x(i) \geq T\} as a random variable in terms of T, we have

{\mathbb E} [ edges(S,V-S) ] \leq {\mathbb E}  [\min \{ |S|,\ |V-S| \} ] \cdot  \sqrt{2 d \epsilon}

and so, using linearity of expectation,

{\mathbb E} [ edges(S,V-S)  - \min \{ |S|,\ |V-S| \}  \cdot  \sqrt{2 d \epsilon}] \leq 0

from which we see that there must be a threshold in our sample space such that (3) holds. I shall present a proof that explicitly follows this pattern. It would be nice if we could choose T uniformly at random in the interval \left[\min_i x(i), \max_i x(i)\right], but I don’t think it would work. (Can any reader see a counterexample? I couldn’t, but we’ll come back to it.) Instead, the following works: assuming with no loss of generality that the median of \{ x(1),\ldots,x(n) \} is zero, T can be chosen so that |T|\cdot T is distributed uniformly in the interval \left[\min_i x(i), \max_i x(i)\right]. (This means that thresholds near the median are less likely to be picked than thresholds far from the median.) This choice seems to be a magic trick in itself, voiding the point I made above, but I hope it will become natural as we unfold our argument.
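
Concretely (a tiny illustration of mine, not from the post): since |T| \cdot T = sign(T) \cdot T^2, one can sample such a threshold by drawing U uniformly in \left[\min_i x(i), \max_i x(i)\right] and setting T = sign(U) \cdot \sqrt{|U|}.

```python
import math, random

def sample_threshold(x_min, x_max):
    # Draw U uniformly in [x_min, x_max] and set T = sign(U) * sqrt(|U|),
    # so that |T| * T = sign(U) * |U| = U is uniform; thresholds with small
    # |T| (near the median 0) are correspondingly less likely to be picked.
    u = random.uniform(x_min, x_max)
    return math.copysign(math.sqrt(abs(u)), u)
```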