[In which we finally get to Leighton-Rao and ARV.]

Continuing our edge expansion marathon, we are going to see more ways in which the sparsest cut problem can be relaxed to polynomial-time solvable continuous problems. As before, our goal is to understand how this implies certificates of expansion.

We are talking about the sparsest cut \phi(G) of a graph G(V,E), which is

\phi(G) := \min_{S\subseteq V} \frac{edges(S,V-S)}{\frac 1n |S|\cdot V-S|} = \min_{x\in \{0,1\}^n} \frac { \sum_{i,j} A(i,j) |x(i)-x(j)|} { \frac 1n \sum_{i,j} |x(i)-x(j)|}

We have looked for a while at the eigenvalues gap of G. If G is d-regular and the second largest eigenvalue of G is \lambda_2, then the difference between largest and second largest eigenvalue obeys

d-\lambda_2 = \min_{x\in {\mathbb R}^n} \frac { \sum_{i,j} A(i,j) |x(i)-x(j)|^2} {\frac 1n \sum_{i,j} |x(i)-x(j)|^2} = \min_{m,x_1,\ldots,x_n\in {\mathbb R}^m} \frac {  \sum_{i,j} A(i,j) ||x_-x_j||^2} {\frac 1n \sum_{i,j} ||x_i-x_j||^2}

and, noting that |x(i)-x(j)|^2 = |x(i)-x(j)| when x\in \{0,1\}^n, we saw that d-\lambda_2 is a relaxation of \phi, and \phi \geq d-\lambda_2.

Leighton and Rao derive a continuous relaxation of \phi in quite a different way. They note that when x\in \{ 0,1\}^n, then the “distance”

d(i,j) := |x(i)-x(j)|

induces a semi-metric on V, that is, d(i,i)=0, d(i,j) = d(j,i), and the triangle inequality holds

d(i,j) \leq d(i,k) + d(k,j) \ \ \ \forall i,j,k

One can then consider the relaxation

(1) \ \ \ min_d \frac { \sum_{i,j} A(i,j) d(i,j)}  {\frac 1n \sum_{i,j} d(i,j)}

where the minimum is taken over all distance functions d: V\times V \rightarrow {\mathbb R} that define a semi-metric on V.

We can rewrite (1) as the linear program

(2) \ \ \ \begin{array}{l} \min \sum_{i,j} A(i,j) d(i,j)\\ subject\ to \\ \sum_{i,j} d(i,j)=n\\ d(i,j) \leq d(i,k) + d(k,j) \ \ \ \forall i,j,k\\ d(i,j) \geq 0 \ \ \ \forall i,j \end{array}

where we only write the triangle inequality constraints by having one variable d(i,j) for every unordered pair i,j.

The minimum of (2) gives a lower bound to the sparsest cut of G, and so every feasible solution for the dual of (2) gives a lower bound to \phi(G).

Before trying to write down the dual of (2), it is convenient to write (2) in a less efficient way. (This is akin to the fact that sometimes, when using induction, it is easier to prove a more general statement.) Instead of writing the triangle inequality for every three vertices, we write it for every arbitrary sequence of vertices.

(3) \ \ \ \begin{array}{l} \min \sum_{i,j} A(i,j) d(i,j)\\ subject\ to \\ \sum_{i,j} d(i,j)=n\\ d(i,j) \leq \sum_{(u,v) \in p} d(u,v) \ \ \ \forall i,j, \forall p \in P_{i,j} \\  d(i,j) \geq 0 \ \ \ \forall i,j\end{array}

where we denote by P_{i,j} the set of “paths” in the complete graph between i and j. (That is, P_{i,j} is the set of all sequences of vertices that start at i and end at j.)

Clearly (3) is the same as (2), in the sense that the extra inequalities present in (3) are implied by the triangle inequalities in (2). The dual of (3), however, is cleaner, and is given below

(4) \ \ \ \begin{array}{l} \max n\cdot y\\ \ y - \sum_{p\in P_{i,j}} y_p + \sum_{p: (i,j) \in p} y_p \leq A(i,j)\ \ \ \forall i,j\\ y,y_p \geq 0\end{array}

A feasible solution to (4) is thus a weighted set of paths and a parameter y. Before reasoning about the meaning of a solution, suppose that y,y_p satisfy the stronger constraints

(5) \ \ \ \begin{array}{l} \sum_{p\in P_{i,j}} y_p \geq y\\\sum_{p: (i,j) \in p} y_p \leq A(i,j) \end{array}

Then we can view the y_p : p \in P_{i,j} as defining a flow between i,j that carries at least y units of flow; furthermore, the union of all such flows uses at most a total capacity A(i,j) on each edge (i,j). We should think of it as an embedding of the complete graph scaled by y into G. I want to claim that this means that the sparsest cut of G is at least y times the sparsest cut of the complete graph, that is, at least y\cdot n.

To verify the claim, take any cut S,V-S. We know that such a cut separates |S| \cdot |V-S| pairs of vertices i,j, and that at least y units of flow are routed between each such pair, so that at least y\cdot |S| \cdot |V-S| units of flow cross the cut. Since every edge has capacity 1, we see that at least y\cdot  |S|\cdot  |V-S| edges of G must cross the cut.

It remains to bridge the difference between (4) or (5). One possibility is to show that a solution to (4) can be modified to a solution satisfying (5) with no loss in y. Another approach is to directly use (4) and a general weak duality argument to show directly that a solution satisfying (4) implies that G has at least the edge expansion of a complete graph with edge weights y.

There is, in fact, a broader principle at work, which we can verify by a similar argument: if \{ y_p\} is an assignment of weights to all possible paths (which we think of as defining a multicommodity flow), G is a graph with adjacency matrix A, H is a graph with adjacency matrix B, and we have

B(i,j) - \sum_{p\in P_{i,j}} y_p + \sum_{p: (i,j) \in p} y_p \leq A(i,j)\ \ \ \forall i,j

then we must have \phi(G) \geq \phi(H).

The fact that a feasible solution to (4) gives a lower bound to \phi(G) is a special case that applies to the case in which H is a scaled version of the complete graph.

This is a good point to pause, look back again at the bound coming from spectral considerations, and compare it with what we have here.

(A couple of notes. (i) From now on it will be convenient to identify a graph with its adjacency matrix. (ii) About notation: recall that if G is a d-regular graph then its Laplacian is L(G):= dI - G; more generally, if M is a matrix then its Laplacian L(M) is defined so that L(M)(i,j) := -M(i,j) when i \neq j, and L(M)(i,i) := \sum_j M(i,j). Observe that L is linear, that is, L(A+B) = L(A)+L(B) and L(c\cdot A) = c\cdot L(A).)

In the spectral bound, we find a y such that L(G) - y L(K_n) is positive semidefinite, and we use this fact to prove that \phi (G) \geq y\cdot n. This works because if G and H are two symmetric matrices such that L(G) \succeq L(H), then \phi(G) \geq \phi(H). In the analysis of the spectral gap, we apply it to the case in which H is a scaled version of the complete graph.

An important note: H is not a 0/1 matrix, and in fact it may have negative entries. The quantity \phi(H) is, however, always well defined as

\min_{S\subseteq V} \frac{\sum_{i\in S,j\not\in S} H(i,j)}{\frac 1n |S| \cdot |V-S|}

To wrap up our comparisons, in both cases we prove a lower bound on \phi(G) by showing that \phi(G) \geq \phi(H) where H is a scaled version of the complete graph, so that we know the exact value of \phi(H). In the spectral analysis, we argue that \phi(G) \geq \phi(H) by showing that

L(G) \succeq L(H);

in the Leighton-Rao approach we find a system of multicommodity flows \{ y_p \} such that

G(i,j) \geq H(i,j) + Y(i,j)\ \ \ \forall i,j

where, to clean up the notation, we define

Y(i,j) :=  \sum_{p: (i,j) \in p} y_p - \sum_{p\in P_{i,j}} y_p.

The two approaches are incomparable, and, naturally, we would like to combine them into a single approach that subsumes both.

Suppose that we find a graph H and flows \{ y_p\} such that

L(G) \succeq L(H+Y)

where, again, Y(i,j) :=  \sum_{p: (i,j) \in p} y_p - \sum_{p\in P_{i,j}} y_p.

Then, by spectral considerations,

\phi(G) \geq \phi (H+Y)

but by Leighton-Rao reasoning

\phi(H+Y) \geq \phi(H).

This approach subsumes the two because if A,B are two matrices such that A(i,j) \geq B(j,j) for every entry, then L(A) \succeq L(B). (This is “obvious” to people who know some linear algebra, but it is actually a great exercise to get the vector solution that proves that L(A)-L(B) = L(A-B) is positive semidefinite.)

Hence the following semidefinite program produces lower bound certificates for the sparsest cut of G. (As before, we shall take H to be a scaled version of the complete graph.)

(6) \ \ \ \begin{array}{l} \max n\cdot y \\ subject\ to\\ L(G) \succeq L(y\cdot K_n+Y)\\Y(i,j) =  \sum_{p: (i,j) \in p} y_p - \sum_{p\in P_{i,j}} y_p\\ y,y_p \geq 0\end{array}

To make sense of it, we need to understand Y and L(Y) a bit better. Let I_{i,j} be the matrix that has ones in the (i,j) and (j,i) entry, and zeroes everywhere else; hence I_{i,j} is the adjacency matrix of the graph consisting of the single edge (i,j). Then

Y = \sum_{i,j}   \sum_{p: (i,j) \in p} y_p I_{i,j} - \sum_{p\in P_{i,j}} y_p I_{i,j}
\ \  = \sum_{u,v} \sum_{p\in P_{u,v}} y_p \left( - I_{u,v} +  \sum_{(i,j) \in p} I_p \right)

that is,

Y = \sum_p y_p  M_p

where M_p is the matrix that has a one in each entry (i,j) corresponding to an edge in p, a -1 in the entry (u,v), where u and v are the end-points of the path p, and zeroes everywhere else. (Equivalently, for a path p, the matrix M_p is the adjacency matrix of the graph that has the path p plus an edge of weight -1 connecting the endpoints of the path.)

Our semidefinite program (6) now takes the form

(7) \ \ \ \begin{array}{l} \max n\cdot y \\ subject\ to\\ y\cdot L(K_n) + \sum_p y_p L(M_p) \preceq L(G)\\ y,y_p \geq 0\end{array}

We again look up in wikipedia how to form the dual of (7), and we get

(8)\ \ \  \begin{array}{l} \min L(G) \bullet X\\subject\ to\\ L(K_n) \bullet X \geq n\\ L(M_p) \bullet X \geq 0 \ \ \ \forall p\\ X \succeq 0 \end{array}

Which is a more constrained version of the semidefinite program that gives the spectral bound. Let us unfold what (8) means, and see that it is indeed a natural semidefinite programming relaxation of \phi, and it subsumes the spectral and the Leighton-Rao relaxations. (Although we already know that this is going to be the case given duality considerations.)

If we let x_1,\ldots,x_n be a system of vectors such that X(i,j) = \langle x_i,x_j \rangle, then from our past work on the spectral relaxation, we know how to make sense of the objective function and of the first constraint.

L(G)\bullet X = \frac 12 \sum_{i,j} G(i,j) \cdot || x_i - x_j||^2

and

L(K_n)\bullet X = \frac 12 \sum_{i,j} || x_i - x_j||^2

We leave it to the reader to verify that if p is a path with endpoints u,v, then

L(M_p) \bullet X= \frac 12 \left(- || x_u - x_v||^2 + \sum_{(i,j) \in p} || x_i - x_j ||^2 \right)

and so the constraints L(M_p) \bullet X \geq 0 are checking that the “distance function” d(i,j) := || x_i - x_j ||^2 satisfies the triangle inequality.

We can rewrite (8) in the following form, which makes it clear that is a relaxation of \phi and a strengthening of both the spectral bound and the Leighton-Rao linear program.

(9) \ \ \ \begin{array}{l} \min \sum_{i,j} G(i,j) \cdot || x_i - x_j ||^2\\subject\ to\\ \sum_{i,j}  || x_i - x_j ||^2 = n \\ || x_i - x_j ||^2 \leq ||x_i - x_k||^2 + || x_k - x_j||^2 \end{array}

This is the Arora-Rao-Vazirani semidefinite programming relaxation of sparsest cut.

Notice that if we have a solution x \in \{0,1\}^n to the sparsest cut problem we can turn into a feasible solution to (9) by letting x_i := x(i) be a one-dimensional solution; if we have a feasible solution to (9), then it is feasible for the spectral relaxation (which does not require the triangle inequalities) and it is feasible for Leighton-Rao, by setting d(i,j) := || x_i - x_j ||^2.

And so we have defined our three relaxations of sparsest cut and their duals, and understood “why” dual solutions give certificates of expansion.

It remains to see how close the optima of the relaxations are to \phi, a question that is answered by algorithmic techniques to “round” a relaxed solution into a cut of comparable cost. If we wonder how good such roundings can be, we are led to the study of metric embeddings.

About these ads