In the previous post, we saw that the sparsest cut of a graph G=(V,E) (which is within a factor of 2 of the edge expansion of the graph) is defined as

\phi := \min_{S\subseteq V} \frac{edges(S,V-S)}{\frac 1n |S|\cdot |V-S|}

and can also be written as

(1) \ \ \ \phi = \min_{x\in \{0,1\}^V} \frac{\sum_{i,j} A(i,j) | x(i)-x(j)|}{\frac 1n \sum_{i,j} |x(i)-x(j)|}

while the eigenvalue gap of G (which we assume to be d-regular, with adjacency matrix A) can be written as

(2) \ \ \ d-\lambda_2 = \min_{x\in {\mathbb R}^V} \frac{\sum_{i,j} A(i,j) | x(i)-x(j)|^2}{\frac 1n \sum_{i,j} |x(i)-x(j)|^2}

and so it can be seen as a relaxation of \phi, considering that
|x(i)-x(j)|=|x(i)-x(j)|^2 when x(i),x(j) are boolean.
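To make this concrete, here is a small numpy sketch (the choice of the 6-cycle as the example graph, and all the function names, are mine, not anything from the previous post): it brute-forces the ratio in (1) over all non-constant boolean labelings, and computes d-\lambda_2 as in (2) from the adjacency spectrum. As expected, the relaxation gives a lower bound.

```python
import itertools
import numpy as np

def cycle_adjacency(n):
    """Adjacency matrix of the n-cycle, a 2-regular graph."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    return A

def sparsest_cut(A):
    """Brute-force the ratio in (1) over all non-constant boolean labelings x."""
    n = A.shape[0]
    best = np.inf
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits, dtype=float)
        if x.sum() in (0, n):                # constant labelings have a zero denominator
            continue
        diffs = np.abs(x[:, None] - x[None, :])
        best = min(best, (A * diffs).sum() / (diffs.sum() / n))
    return best

def spectral_gap(A):
    """d - lambda_2, with lambda_2 the second largest adjacency eigenvalue."""
    d = A.sum(axis=1)[0]                     # all row sums are equal for a regular graph
    eigenvalues = np.linalg.eigh(A)[0]       # returned in increasing order
    return d - eigenvalues[-2]

A = cycle_adjacency(6)
print(spectral_gap(A), "<=", sparsest_cut(A))   # approximately 1.0 <= 1.3333...
```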

Let us now take a broader look at efficiently computable continuous relaxations of sparsest cut, and see that the eigenvalue gap is a semidefinite programming relaxation of sparsest cut, that by adding the triangle inequality to it we derive the Arora-Rao-Vazirani relaxation, and that by removing the semidefinite constraint from the ARV relaxation we get the Leighton-Rao relaxation.

We first prove that (2) is equivalent to

(3) \ \ \ d-\lambda_2  = \min_{m, x_1,\ldots ,x_n \in {\mathbb R}^m} \frac{\sum_{i,j} A(i,j) || x_i - x_j ||^2}{\frac 1n \sum_{i,j} || x_i -x_j ||^2}

The difference is that instead of optimizing over all possible ways to assign a real number x(i) to each vertex i, we are now optimizing over all possible ways to assign a vector x_i to each vertex i. The absolute value |x(i)-x(j)| is then replaced by the Euclidean distance || x_i - x_j ||. The right-hand side of (3) is clearly a relaxation of the right-hand side of (2), so we just need to establish that, in the right-hand side of (3), it is optimal to take 1-dimensional vectors. For every vector solution x_1,\ldots,x_n, we see that

\frac{\sum_{i,j} A(i,j) || x_i - x_j ||^2}{\frac 1n \sum_{i,j} || x_i -x_j ||^2} = \frac{\sum_k \sum_{i,j} A(i,j) | x_i(k) - x_j(k) |^2}{\frac 1n \sum_k \sum_{i,j} | x_i(k) -x_j(k) |^2} \geq \min_k  \frac{\sum_{i,j} A(i,j) | x_i(k) - x_j(k) |^2}{\frac 1n \sum_{i,j} | x_i(k) -x_j(k) |^2}

where the last inequality follows from

(4) \ \ \ \frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i} \geq \min_i \frac {a_i}{b_i}

which, in turn, holds whenever the b_i are positive, and has a one-line proof:

\sum_i a_i = \sum_i b_i \cdot \frac {a_i}{b_i} \geq \left( \min_i \frac {a_i}{b_i} \right) \cdot \sum_i b_i.
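To see the coordinate decomposition and inequality (4) at work, here is a small numpy check (the random instance is an arbitrary choice of mine): for a random symmetric 0/1 matrix and random vectors, the best single coordinate never gives a worse ratio than the full vector solution. The common factor \frac 1n is omitted because it cancels from both sides of the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3

# A random symmetric 0/1 matrix and random m-dimensional vectors (rows of X);
# the argument above needs only symmetry of A, not regularity.
A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
A = A + A.T
X = rng.normal(size=(n, m))

# Numerator and denominator of the ratio, coordinate by coordinate.
num = np.array([(A * (X[:, k][:, None] - X[:, k][None, :]) ** 2).sum() for k in range(m)])
den = np.array([((X[:, k][:, None] - X[:, k][None, :]) ** 2).sum() for k in range(m)])

vector_ratio = num.sum() / den.sum()     # the ratio achieved by the vectors x_i
best_coordinate = (num / den).min()      # the ratio of the best 1-dimensional projection

# This is exactly inequality (4), with a_k = num[k] and b_k = den[k] > 0.
assert best_coordinate <= vector_ratio + 1e-12
print(best_coordinate, "<=", vector_ratio)
```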

The formulation (3) is equivalent to the following semidefinite program:

(5) \ \ \ \begin{array}{l} \min \sum_{i,j} A(i,j) ||x_i -x_j ||^2\\ subject\ to\\\sum_{i,j} ||x_i -x_j ||^2 = n\\x_i \in {\mathbb R}^m \end{array}

An optimization problem is a semidefinite program if every variable is allowed to be an arbitrary vector, and both the objective function and the constraints are linear functions of inner products of pairs of variables. In our case,

||x_i - x_j||^2 =  \langle x_i , x_i \rangle + \langle x_j , x_j \rangle  - 2 \langle x_i , x_j \rangle

is a linear combination of inner products, and so the formulation (5) is a semidefinite program. Another way to define semidefinite programming (indeed, the definition that is usually given first, deriving the one in terms of inner products as an equivalent characterization) is that a semidefinite program is an optimization problem in which the variables are the entries of a positive semidefinite matrix X, and the objective function and the constraints are linear in the entries of X. A real matrix is positive semidefinite if it is symmetric and all its eigenvalues are non-negative. We can see that an n\times n matrix X is positive semidefinite if and only if there are vectors x_1,\ldots,x_n such that X(i,j) = \langle x_i,x_j \rangle.

(The proof is simple: if X is real and symmetric, then all its eigenvalues \lambda_1,\ldots,\lambda_n are real, and there is an orthonormal system of eigenvectors v_1,\ldots,v_n such that

X=\sum_k \lambda_k v_k \cdot v_k^T

so that

X(i,j) = \sum_k \lambda_k v_k(i) v_k(j)

If the eigenvalues are non-negative, we can let x_i(k) := \sqrt{\lambda_k} v_k(i) and have X(i,j) = \langle x_i ,x_j\rangle.

On the other hand, if there are x_i such that X(i,j) = \langle x_i,x_j\rangle, let \lambda be any eigenvalue of X and v be a length-1 eigenvector of \lambda. Then:

\lambda = v^T X v = \sum_{i,j} v(i)v(j) \sum_k x_i(k) x_j(k) = \sum_k \left( \sum_i v(i) x_i(k) \right)^2 \geq 0

which completes the characterization.)
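The first half of the argument is constructive, and translates into a short numpy routine (the name gram_vectors is mine): eigendecompose X, scale the k-th eigenvector by \sqrt{\lambda_k}, and read off the vector x_i as the i-th row of the resulting matrix.

```python
import numpy as np

def gram_vectors(X, tol=1e-9):
    """Given a symmetric PSD matrix X, return vectors x_1,...,x_n (as rows)
    such that X[i, j] = <x_i, x_j>, following the eigendecomposition argument."""
    eigenvalues, V = np.linalg.eigh(X)           # columns of V are the eigenvectors v_k
    if eigenvalues.min() < -tol:
        raise ValueError("X is not positive semidefinite")
    # x_i(k) = sqrt(lambda_k) * v_k(i): scale each eigenvector column.
    return V * np.sqrt(np.clip(eigenvalues, 0, None))

# Sanity check on a random Gram matrix, which is PSD by construction.
rng = np.random.default_rng(1)
B = rng.normal(size=(5, 3))
X = B @ B.T
vectors = gram_vectors(X)
assert np.allclose(vectors @ vectors.T, X)
```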

So we have gotten to the point where we can see d-\lambda_2 as a semidefinite programming relaxation of sparsest cut. An interesting property of semidefinite programs is that, like linear programs, they admit a dual. From a minimization (primal) semidefinite program, we can write a maximization (dual) semidefinite program such that the cost of any feasible solution for the dual is a lower bound to the cost of any feasible solution for the primal; furthermore, under mild conditions that are satisfied here, the two programs have the same optimum cost. Returning to our motivating question of certifying the expansion of a given graph, any feasible solution to the dual of (5) gives us a lower bound to the expansion of the graph.

We shall have to suffer a bit longer before we can write the dual of (5). It is best to first move to a formulation in terms of entries of semidefinite matrices. A few calculations show that (5) can be written as

(6) \ \ \ \begin{array}{l} \min  L_G \bullet X\\ subject\ to\\L_n \bullet X =n\\ X \succeq  0 \end{array}

where we write X \succeq 0 to mean that the matrix X is positive semidefinite (we shall also write X\succeq Y when X-Y is positive semidefinite), A \bullet B stands for \sum_{i,j} A(i,j)B(i,j), L_G is the Laplacian of G, that is, the matrix dI - A, and L_n = nI - J is the Laplacian of the clique on n vertices, J being the all-ones matrix.
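The “few calculations” boil down to the identities \sum_{i,j} A(i,j) ||x_i - x_j||^2 = 2\, L_G \bullet X and \sum_{i,j} ||x_i -x_j||^2 = 2\, L_n \bullet X, where X is the Gram matrix of the vectors x_i; the factor of 2, which comes from summing over ordered pairs, rescales the objective and the constraint of (5) in the same way, and so it does not change the optimum. Here is a quick numerical sanity check of the two identities, on random vectors and on the 6-cycle (both choices are mine):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 6, 2

A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1    # the 6-cycle again

L_G = d * np.eye(n) - A                          # Laplacian of G
L_n = n * np.eye(n) - np.ones((n, n))            # Laplacian of the n-clique

vectors = rng.normal(size=(n, 3))                # row i is the vector x_i
X = vectors @ vectors.T                          # Gram matrix: X[i, j] = <x_i, x_j>

sq_dists = ((vectors[:, None, :] - vectors[None, :, :]) ** 2).sum(axis=2)

assert np.isclose((A * sq_dists).sum(), 2 * (L_G * X).sum())   # objective of (5) vs L_G . X
assert np.isclose(sq_dists.sum(), 2 * (L_n * X).sum())         # constraint of (5) vs L_n . X
```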

From this, we look up in Wikipedia the rules to form a semidefinite dual, and we get

(7) \ \ \ \begin{array}{l} \max y\\ subject\ to\\L_G  \succeq  y\frac 1n  L_n \end{array}
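To get a feel for (7): it asks for the largest y such that L_G - y\frac 1n L_n is positive semidefinite, something we can approximate numerically with a simple bisection on y (a rough numpy sketch of mine, once more on the 6-cycle). The answer comes out to 1, which is exactly d-\lambda_2 for this graph; given that the dual optimum matches the primal optimum, and that the primal optimum is d-\lambda_2, this is what it must be.

```python
import numpy as np

n, d = 6, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1    # the 6-cycle

L_G = d * np.eye(n) - A
L_n = n * np.eye(n) - np.ones((n, n))

def feasible(y, tol=1e-9):
    """Is y feasible for (7), i.e. is L_G - (y/n) L_n positive semidefinite?"""
    return np.linalg.eigvalsh(L_G - (y / n) * L_n).min() >= -tol

# Bisection for the largest feasible y
# (y = 0 is feasible because L_G is itself PSD; y = 2d is infeasible here).
lo, hi = 0.0, 2.0 * d
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if feasible(mid) else (lo, mid)
print(lo)                                        # approximately 1.0 = d - lambda_2
```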

It is possible to make sense of (7) independently of all the above discussion, that is, to see from first principles that the cost of any feasible solution to (7) is a lower bound to the sparsest cut of G.

A feasible solution is a value y and a positive semidefinite matrix Y such that L_G = y\frac 1n L_n + Y. Equivalently, it is a value y and vectors y_i such that

L_G(i,j) = y\frac 1n L_n (i,j) + \langle y_i,y_j\rangle

Take any cut (S,V-S) in G. Then

|S|\cdot |V-S| = 1_S^T L_n 1_S

and

edges(S,V-S) =  1_S^T L_G 1_S

but

1_S^T L_G 1_S = 1_S^T (y \frac 1n L_n + Y) 1_S \geq  1_S^T y \frac 1n L_n 1_S

and so the sparsity \frac{edges(S,V-S)}{\frac 1n |S|\cdot |V-S|} of the cut is at least y. To justify the last inequality above, we need to prove that if Y is positive semidefinite then

1_S^T Y 1_S \geq 0

Indeed

1_S^T Y 1_S = \sum_{i,j\in S} \langle y_i,y_j \rangle = \langle \sum_{i\in S} y_i,\sum_{i\in S} y_i \rangle = || \sum_{i\in S} y_i ||^2 \geq 0
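The matrix identities used in this argument are easy to verify numerically. The following sketch (again on the 6-cycle, a choice of mine) checks them for every cut, together with the non-negativity of the quadratic form of an arbitrary PSD matrix.

```python
import itertools
import numpy as np

n, d = 6, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1    # the 6-cycle

L_G = d * np.eye(n) - A
L_n = n * np.eye(n) - np.ones((n, n))

rng = np.random.default_rng(3)
B = rng.normal(size=(n, 4))
Y = B @ B.T                                      # an arbitrary PSD matrix

for bits in itertools.product([0, 1], repeat=n):
    s = np.array(bits, dtype=float)              # the indicator vector 1_S
    cut_edges = s @ A @ (1 - s)                  # each cut edge counted once
    assert np.isclose(s @ L_n @ s, s.sum() * (n - s.sum()))   # |S| |V - S|
    assert np.isclose(s @ L_G @ s, cut_edges)                 # edges(S, V - S)
    assert s @ Y @ s >= -1e-9                                 # 1_S^T Y 1_S >= 0
```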

We can also see, from first principles, that if \lambda_1=d,\lambda_2,\ldots,\lambda_n are the eigenvalues of A and v_1,\ldots,v_n is a corresponding system of eigenvectors, we can find a feasible solution to (7) of cost d-\lambda_2.

We defined L_G = dI - A, where we assume that G is d-regular. We can write

dI - A
= dI - \sum_i \lambda_i v_i v_i^T
= (d-\lambda_2) \cdot\left( I -  v_1v_1^T \right) +\lambda_2 I - \left( \lambda_2 v_1v_1^T+ \sum_{i=2}^n \lambda_i v_i v_i^T \right)

And we are almost there, because v_1 is proportional to the all-ones vector, so that \left( I -  v_1v_1^T \right) = \frac 1n L_n, and it remains to prove that the matrix

Y:= \lambda_2 I - \left( \lambda_2 v_1v_1^T+ \sum_{i=2}^n \lambda_i v_i v_i^T \right)

is positive semidefinite. This we can see by noting that v_1,\ldots,v_n are eigenvectors of Y with eigenvalues 0,0,\lambda_2-\lambda_3,\ldots,\lambda_2-\lambda_n. Alternatively, we could construct vectors y_i such that Y(i,j) = \langle y_i,y_j\rangle.
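As a sanity check, the following sketch (again on the 6-cycle) builds Y from the adjacency spectrum exactly as above, and verifies both the decomposition L_G = (d-\lambda_2)\frac 1n L_n + Y and the positive semidefiniteness of Y.

```python
import numpy as np

n, d = 6, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1    # the 6-cycle

L_G = d * np.eye(n) - A
L_n = n * np.eye(n) - np.ones((n, n))

eigenvalues, V = np.linalg.eigh(A)               # increasing order
order = np.argsort(eigenvalues)[::-1]            # reorder so that lambda_1 >= ... >= lambda_n
eigenvalues, V = eigenvalues[order], V[:, order]
lam2 = eigenvalues[1]
v1 = V[:, [0]]                                   # eigenvector of lambda_1 = d

# Y := lambda_2 I - ( lambda_2 v_1 v_1^T + sum_{i >= 2} lambda_i v_i v_i^T )
Y = lam2 * np.eye(n) - (lam2 * (v1 @ v1.T)
                        + sum(eigenvalues[i] * np.outer(V[:, i], V[:, i]) for i in range(1, n)))

assert np.allclose(L_G, (d - lam2) * (1 / n) * L_n + Y)   # the decomposition above
assert np.linalg.eigvalsh(Y).min() >= -1e-9               # Y is positive semidefinite
```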

Essentially, what is going on is that we are “embedding” a scaled version of the complete graph into our graph. The edge expansion and sparsest cut of the complete graph are known, and the semidefinite constraint in (7) is telling us that the difference between our graph and the scaled complete graph can only add to the sparsest cut, so the sparsest cut of G is at least the known sparsest cut of the scaled complete graph.

Why did we go through all this trouble? Now that we understand the spectral gap inside out, we can derive the Arora-Rao-Vazirani relaxation (ARV) of sparsest cut by just adding the “triangle inequalities” to (5), and we can derive the Leighton-Rao relaxation by removing the semidefinite constraints from ARV. In the dual of Leighton-Rao, we again “embed” a complete graph in our graph, by realizing every edge of the complete graph as a flow in our graph. In the dual of ARV, we will find both flows and spectral considerations.

[to be continued]
