In the previous post, we saw that the sparsest cut of a graph $G=(V,E)$ (which is within a factor of 2 of the edge expansion of the graph) is defined as

$\phi := \min_{\emptyset \neq S \subsetneq V} \frac{edges(S,V-S)}{\frac 1n |S|\cdot |V-S|}$

and can also be written as

$(1) \ \ \ \phi = \min_{x\in \{0,1\}^V} \frac{\sum_{i,j} A(i,j) | x(i)-x(j)|}{\frac 1n \sum_{i,j} |x(i)-x(j)|}$

while the eigenvalue gap of $G$ can be written as

$(2) \ \ \ d-\lambda_2 = \min_{x\in {\mathbb R}^V} \frac{\sum_{i,j} A(i,j) | x(i)-x(j)|^2}{\frac 1n \sum_{i,j} |x(i)-x(j)|^2}$

and so it can be seen as a relaxation of $\phi$, considering that
$|x(i)-x(j)|=|x(i)-x(j)|^2$ when $x(i),x(j)$ are boolean.
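These two quantities are easy to compare on a small instance. The following sketch (assuming numpy is available; the 4-cycle is just a toy example) computes $\phi$ by brute force over boolean vectors as in (1), and $d-\lambda_2$ from the spectrum of the adjacency matrix:

```python
import itertools
import numpy as np

# 4-cycle 0-1-2-3-0: a 2-regular graph on 4 vertices
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
n, d = 4, 2

def ratio(x):
    # the ratio in (1); for boolean x, |x(i)-x(j)| = |x(i)-x(j)|^2
    num = sum(A[i, j] * abs(x[i] - x[j]) for i in range(n) for j in range(n))
    den = sum(abs(x[i] - x[j]) for i in range(n) for j in range(n)) / n
    return num / den

# sparsest cut: minimize over all non-constant boolean vectors
phi = min(ratio(x) for x in itertools.product([0, 1], repeat=n)
          if 0 < sum(x) < n)

# eigenvalue gap d - lambda_2 (eigenvalues sorted in decreasing order)
eigs = np.sort(np.linalg.eigvalsh(A))[::-1]
gap = d - eigs[1]

print(phi, gap)  # the relaxation guarantees gap <= phi
```

On this particular graph the relaxation happens to be tight, but in general one only gets $d-\lambda_2 \leq \phi$.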

Let us now take a broader look at efficiently computable continuous relaxations of sparsest cut: we shall see that the eigenvalue gap is a semidefinite programming relaxation of sparsest cut, that adding the triangle inequalities to it yields the Arora-Rao-Vazirani relaxation, and that removing the semidefinite constraint from the ARV relaxation yields the Leighton-Rao relaxation.

We first prove that (2) is equivalent to

$(3) \ \ \ d-\lambda_2 = \min_{m, x_1,\ldots ,x_n \in {\mathbb R}^m} \frac{\sum_{i,j} A(i,j) || x_i - x_j ||^2}{\frac 1n \sum_{i,j} || x_i -x_j ||^2}$

The difference is that instead of optimizing over all possible ways to assign a real number $x(i)$ to each vertex $i$, we are now optimizing over all possible ways to assign a vector $x_i$ to each vertex $i$. The absolute value $|x(i)-x(j)|$ is then replaced by the Euclidean distance $|| x_i - x_j ||$. The right-hand side of $(3)$ is clearly a relaxation of the right-hand side of $(2)$, so we just need to establish that, in the right-hand side of $(3)$, it is optimal to take 1-dimensional vectors. For every vector solution $x_1,\ldots,x_n$, we see that

$\frac{\sum_{i,j} A(i,j) || x_i - x_j ||^2}{\frac 1n \sum_{i,j} || x_i -x_j ||^2} = \frac{\sum_k \sum_{i,j} A(i,j) | x_i(k) - x_j(k) |^2}{\frac 1n \sum_k \sum_{i,j} | x_i(k) -x_j(k) |^2} \geq \min_k \frac{\sum_{i,j} A(i,j) | x_i(k) - x_j(k) |^2}{\frac 1n \sum_{i,j} | x_i(k) -x_j(k) |^2}$

where the last inequality follows from the fact that, for positive $b_i$,

$(4) \ \ \ \frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i} \geq \min_i \frac {a_i}{b_i}$

which, in turn, has a one-line proof:

$\sum_i a_i = \sum_i b_i \cdot \frac {a_i}{b_i} \geq \left( \min_i \frac {a_i}{b_i} \right) \cdot \sum_i b_i.$
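A quick numeric sanity check of (4), with random non-negative $a_i$ and strictly positive $b_i$ (a sketch assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(5)          # numerators a_i >= 0
b = rng.random(5) + 0.1    # denominators b_i > 0

# (4): the "mediant" inequality sum a_i / sum b_i >= min_i a_i/b_i
lhs = a.sum() / b.sum()
rhs = (a / b).min()
assert lhs >= rhs
```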

The formulation (3) is equivalent to the following semidefinite program, in which the vectors $x_i$ are allowed to have arbitrary dimension:

$(5) \ \ \ \begin{array}{l} \min \sum_{i,j} A(i,j) ||x_i -x_j ||^2\\ subject\ to\\ \sum_{i,j} ||x_i -x_j ||^2 = n\\ x_i \in {\mathbb R}^m \end{array}$

An optimization problem is a semidefinite program if every variable is allowed to be an arbitrary vector, and both the objective function and the constraints are linear functions of inner products of pairs of variables. In our case,

$||x_i - x_j||^2 = \langle x_i , x_i \rangle + \langle x_j , x_j \rangle - 2 \langle x_i , x_j \rangle$

is a linear combination of inner products, and so the formulation (5) is a semidefinite program. Another way to define semidefinite programming (indeed, the definition that is usually given first, deriving the one in terms of inner products as an equivalent characterization) is that a semidefinite program is an optimization problem in which the variables are the entries of a positive semidefinite matrix $X$, and the objective function and the constraints are linear in the entries of $X$. A real matrix is positive semidefinite if it is symmetric and all its eigenvalues are non-negative. We can see that an $n\times n$ matrix $X$ is positive semidefinite if and only if there are vectors $x_1,\ldots,x_n$ such that $X(i,j) = \langle x_i,x_j \rangle$.

(The proof is simple: if $X$ is real and symmetric, then all its eigenvalues $\lambda_1,\ldots,\lambda_n$ are real, and there is an orthonormal system of eigenvectors $v_1,\ldots,v_n$ such that

$X=\sum_k \lambda_k v_k \cdot v_k^T$

so that

$X(i,j) = \sum_k \lambda_k v_k (i) v_k(j)$

If the eigenvalues are non-negative, we can let $x_i(k) := \sqrt{\lambda_k} v_k(i)$ and have $X(i,j) = \langle x_i ,x_j\rangle$.

On the other hand, if there are $x_i$ such that $X(i,j) = \langle x_i,x_j\rangle$, let $\lambda$ be any eigenvalue of $X$ and $v$ be a length-1 eigenvector of $\lambda$. Then:

$\lambda = v^T X v = \sum_{i,j} v(i)v(j) \sum_k x_i(k) x_j(k) = \sum_k \left( \sum_i v(i) x_i(k) \right)^2 \geq 0$

which completes the characterization.)
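Both directions of this characterization can be checked numerically. The sketch below (assuming numpy) builds a Gram matrix from random vectors and confirms it is positive semidefinite, then recovers Gram vectors from an eigendecomposition exactly as in the proof:

```python
import numpy as np

rng = np.random.default_rng(1)

# one direction: a Gram matrix X(i,j) = <x_i, x_j> is positive semidefinite
V = rng.standard_normal((4, 3))      # rows are the vectors x_1, ..., x_4
X = V @ V.T
assert np.linalg.eigvalsh(X).min() >= -1e-9

# other direction: eigendecompose a PSD X and set x_i(k) = sqrt(lambda_k) v_k(i)
lam, vecs = np.linalg.eigh(X)        # columns of vecs are the eigenvectors v_k
Y = vecs * np.sqrt(np.clip(lam, 0, None))   # row i is the recovered vector x_i
assert np.allclose(Y @ Y.T, X)
```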

So we have gotten to the point where we can see $d-\lambda_2$ as a semidefinite programming relaxation of sparsest cut. An interesting property of semidefinite programs is that, like linear programs, they admit a dual. From a minimization (primal) semidefinite program, we can write a maximization (dual) semidefinite program such that the cost of any feasible solution for the dual is a lower bound to the cost of any feasible solution for the primal; furthermore, under conditions that are met in our setting, the two programs have the same optimum cost. Returning to our motivating question of certifying the expansion of a given graph, any feasible solution to the dual of (5) gives us a lower bound to the expansion of the graph.

We shall have to suffer a bit longer before we can write the dual of (5). It is best to first move to a formulation in terms of entries of semidefinite matrices. A few calculations show that (5) can be written as

$(6) \ \ \ \begin{array}{l} \min L_G \bullet X\\ subject\ to\\L_n \bullet X =n\\ X \succeq 0 \end{array}$

where we write $X \succeq 0$ to mean that the matrix $X$ is positive semidefinite (we shall also write $X\succeq Y$ when $X-Y$ is positive semidefinite), we write $A \bullet B$ to mean $\sum_{i,j} A(i,j)B(i,j)$, $L_G$ is the Laplacian of $G$, that is, the matrix $dI - A$, and $L_n$ is the Laplacian of the clique on $n$ vertices.
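The "few calculations" come down to the identities $\sum_{i,j} A(i,j)||x_i-x_j||^2 = 2\, L_G \bullet X$ and $\sum_{i,j} ||x_i-x_j||^2 = 2\, L_n \bullet X$, where $X$ is the Gram matrix of the $x_i$ and the sums range over ordered pairs (the factor of 2 appears in both the objective and the constraint, so it does not affect the optimum). A numeric check, assuming numpy and using the 4-cycle as a toy graph:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 4, 2
A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]], float)  # 4-cycle
L_G = d * np.eye(n) - A          # Laplacian of G
J = np.ones((n, n))
L_n = n * np.eye(n) - J          # Laplacian of the n-clique

V = rng.standard_normal((n, 3))  # arbitrary vectors x_1, ..., x_n (rows)
X = V @ V.T                      # their Gram matrix

def pair_sq(M):
    # sum over ordered pairs of M(i,j) * ||x_i - x_j||^2
    return sum(M[i, j] * np.sum((V[i] - V[j])**2)
               for i in range(n) for j in range(n))

# ordered-pair sums pick up a factor of 2 relative to the Frobenius products
assert np.isclose(pair_sq(A), 2 * np.sum(L_G * X))
assert np.isclose(pair_sq(J), 2 * np.sum(L_n * X))
```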

From this, we look up in Wikipedia the rules to form a semidefinite dual, and we get

$(7) \ \ \ \begin{array}{l} \max y\\ subject\ to\\L_G \succeq y\frac 1n L_n \end{array}$

It is possible to make sense of (7) independently of all the above discussion, that is, to see from first principles that the cost of any feasible solution to (7) is a lower bound to the sparsest cut of $G$.

A feasible solution is a value $y$ and a positive semidefinite matrix $Y$ such that $L_G = y\frac 1n L_n + Y$. Equivalently, it is a value $y$ and vectors $y_i$ such that

$L_G(i,j) = y\frac 1n L_n (i,j) + \langle y_i,y_j\rangle$

Take any cut $(S,V-S)$ in $G$. Then

$|S|\cdot |V-S| = 1_S^T L_n 1_S$

and

$edges(S,V-S) = 1_S^T L_G 1_S$

but

$1_S^T L_G 1_S = 1_S^T (y \frac 1n L_n + Y) 1_S \geq 1_S^T y \frac 1n L_n 1_S$

and so the cost of the cut is at least $y$. To justify the last inequality above, we need to prove that if $Y$ is positive semidefinite then

$1_S^T Y 1_S \geq 0$

Indeed

$1_S^T Y 1_S = \sum_{i,j\in S} \langle y_i,y_j \rangle = \langle \sum_{i\in S} y_i,\sum_{i\in S} y_i \rangle = || \sum_{i\in S} y_i ||^2 \geq 0$
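The two Laplacian identities used in this argument can be verified on a small instance. The sketch below (assuming numpy; the 4-cycle is a toy example) checks $1_S^T L_n 1_S = |S|\cdot|V-S|$ and $1_S^T L_G 1_S = edges(S,V-S)$ for every cut:

```python
import itertools
import numpy as np

n, d = 4, 2
A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]], float)  # 4-cycle
L_G = d * np.eye(n) - A
L_n = n * np.eye(n) - np.ones((n, n))

for bits in itertools.product([0, 1], repeat=n):
    if 0 < sum(bits) < n:
        s = np.array(bits, float)          # the indicator vector 1_S
        S = int(s.sum())
        # 1_S^T L_n 1_S counts pairs separated by the cut
        assert np.isclose(s @ L_n @ s, S * (n - S))
        # 1_S^T L_G 1_S counts edges crossing the cut (each edge once)
        cut_edges = sum(A[i, j] for i in range(n) for j in range(n)
                        if bits[i] != bits[j]) / 2
        assert np.isclose(s @ L_G @ s, cut_edges)
```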

We can also see, from first principles, that if $\lambda_1=d,\lambda_2,\ldots,\lambda_n$ are the eigenvalues of $A$ and $v_1,\ldots,v_n$ is a corresponding system of eigenvectors, we can find a feasible solution to (7) of cost $d-\lambda_2$.

We defined $L_G = dI - A$, where we assume that $G$ is $d$-regular. We can write

$dI - A$
$= dI - \sum_i \lambda_i v_i v_i^T$
$= (d-\lambda_2) \cdot\left( I - v_1v_1^T \right) +\lambda_2 I - \left( \lambda_2 v_1v_1^T+ \sum_{i=2}^n \lambda_i v_i v_i^T \right)$

And we are almost there, because $I - v_1v_1^T = \frac 1n L_n$ (recall that $v_1 = \frac 1{\sqrt n} (1,\ldots,1)$), and it remains to prove that the matrix

$Y:= \lambda_2 I - \left( \lambda_2 v_1v_1^T+ \sum_{i=2}^n \lambda_i v_i v_i^T \right)$

is positive semidefinite. This we can see by noting that $v_1,\ldots,v_n$ are eigenvectors of $Y$ with eigenvalues $0,0,\lambda_2-\lambda_3,\ldots,\lambda_2-\lambda_n$. Alternatively, we could construct vectors $y_i$ such that $Y(i,j) = \langle y_i,y_j\rangle$.
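The construction can be carried out numerically. The sketch below (assuming numpy, and using the 4-cycle as a toy example) builds $Y$ from the eigendecomposition of $A$, checks that it is positive semidefinite, and checks that $y = d-\lambda_2$ together with $Y$ is feasible for (7):

```python
import numpy as np

n, d = 4, 2
A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]], float)  # 4-cycle
L_G = d * np.eye(n) - A
L_n = n * np.eye(n) - np.ones((n, n))

lam, V = np.linalg.eigh(A)
lam, V = lam[::-1], V[:, ::-1]        # sort so that lam[0] = d, lam[1] = lambda_2
lam2 = lam[1]

# Y := lambda_2 I - (lambda_2 v_1 v_1^T + sum_{i>=2} lambda_i v_i v_i^T)
Y = lam2 * np.eye(n) - (lam2 * np.outer(V[:, 0], V[:, 0])
                        + sum(lam[i] * np.outer(V[:, i], V[:, i])
                              for i in range(1, n)))

assert np.linalg.eigvalsh(Y).min() >= -1e-9           # Y is PSD
assert np.allclose(L_G, (d - lam2) * L_n / n + Y)     # feasible for (7), cost d - lambda_2
```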

Essentially, what is going on is that we are “embedding” a scaled version of the complete graph into our graph. The edge expansion and sparsest cut of the complete graph are known, and the semidefinite constraint in (7) is telling us that the difference between our graph and the scaled complete graph can only add to the sparsest cut, so the sparsest cut of $G$ is at least the known sparsest cut of the scaled complete graph.

Why did we go through all this trouble? Now that we understand the spectral gap inside out, we can derive the Arora-Rao-Vazirani relaxation (ARV) of sparsest cut by just adding the “triangle inequalities” to (5), and we can derive the Leighton-Rao relaxation by removing the semidefinite constraints from ARV. In the dual of Leighton-Rao, we again “embed” a complete graph in our graph, by realizing every edge as a flow. In the dual of ARV, we will find both flows and spectral considerations.

[to be continued]