Relaxations of Edge Expansion and Their Duals

In the previous post, we saw that the sparsest cut of a graph $G=(V,E)$ (which is within a factor of 2 of the edge expansion of the graph) is defined as

$\phi := \min_{S\subseteq V} \frac{edges(S,V-S)}{\frac 1n |S|\cdot |V-S|}$

and can also be written as

$(1) \ \ \ \phi = \min_{x\in \{0,1\}^V} \frac{\sum_{i,j} A(i,j) | x(i)-x(j)|}{\frac 1n \sum_{i,j} |x(i)-x(j)|}$
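To make the definition concrete, here is a quick brute-force computation of $\phi$ on a small example (my own choice of graph, not one from the post): the 4-cycle, where the minimum is achieved by cutting it into two paths.

```python
# Brute-force computation of the sparsest-cut ratio phi from the
# definition above, on the 4-cycle (an illustrative example).
from itertools import combinations

n = 4
edge_list = [(0, 1), (1, 2), (2, 3), (3, 0)]  # the cycle C4

def cut_ratio(S):
    """edges(S, V-S) divided by (1/n)|S||V-S|."""
    S = set(S)
    crossing = sum(1 for i, j in edge_list if (i in S) != (j in S))
    return crossing / ((1 / n) * len(S) * (n - len(S)))

# Minimize over all nonempty proper subsets S of V.
phi = min(cut_ratio(S)
          for k in range(1, n)
          for S in combinations(range(n), k))
print(phi)  # 2.0: achieved by S = {0, 1}, which cuts 2 edges
```

Formulation (1) computes the same quantity, since a boolean vector $x$ is exactly the indicator of a set $S$, and the sums over ordered pairs double the numerator and the denominator alike.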

while the eigenvalue gap of $G$ can be written as

$(2) \ \ \ d-\lambda_2 = \min_{x\in {\mathbb R}^V} \frac{\sum_{i,j} A(i,j) | x(i)-x(j)|^2}{\frac 1n \sum_{i,j} |x(i)-x(j)|^2}$

and so it can be seen as a relaxation of $\phi$, considering that
$|x(i)-x(j)|=|x(i)-x(j)|^2$ when $x(i),x(j)$ are boolean.

Let us now take a broader look at efficiently computable continuous relaxations of sparsest cut, and see that the eigenvalue gap is a semidefinite programming relaxation of sparsest cut, that by adding the triangle inequality to it we derive the Arora-Rao-Vazirani relaxation, and that by removing the semidefinite constraint from the ARV relaxation we get the Leighton-Rao relaxation.

We first prove that (2) is equivalent to

$(3) \ \ \ d-\lambda_2 = \min_{m, x_1,\ldots ,x_n \in {\mathbb R}^m} \frac{\sum_{i,j} A(i,j) || x_i - x_j ||^2}{\frac 1n \sum_{i,j} || x_i -x_j ||^2}$

The difference is that instead of optimizing over all possible ways to assign a real number $x(i)$ to each vertex $i$, we are now optimizing over all possible ways to assign a vector $x_i$ to each vertex $i$. The absolute value $|x(i)-x(j)|$ is then replaced by the Euclidean distance $|| x_i - x_j ||$. The right-hand side of $(3)$ is clearly a relaxation of the right-hand side of $(2)$, so we just need to establish that, in the right-hand side of $(3)$, it is optimal to take 1-dimensional vectors. For every vector solution $x_1,\ldots,x_n$, we see that

$\frac{\sum_{i,j} A(i,j) || x_i - x_j ||^2}{\frac 1n \sum_{i,j} || x_i -x_j ||^2} = \frac{\sum_k \sum_{i,j} A(i,j) | x_i(k) - x_j(k) |^2}{\frac 1n \sum_k \sum_{i,j} | x_i(k) -x_j(k) |^2} \geq \min_k \frac{\sum_{i,j} A(i,j) | x_i(k) - x_j(k) |^2}{\frac 1n \sum_{i,j} | x_i(k) -x_j(k) |^2}$

where the last inequality follows from

$(4) \ \ \ \frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i} \geq \min_i \frac {a_i}{b_i}$

which, in turn, holds whenever the $b_i$ are positive and has a one-line proof:

$\sum_i a_i = \sum_i b_i \cdot \frac {a_i}{b_i} \geq \left( \min_i \frac {a_i}{b_i} \right) \cdot \sum_i b_i.$
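The inequality is also easy to exercise numerically; the following sketch (with arbitrary random test data, assuming positive $b_i$) checks it on many random instances:

```python
# Numerical sanity check of inequality (4):
# (sum a_i) / (sum b_i) >= min_i a_i / b_i, for positive b_i.
# The values are arbitrary random test data.
import random

random.seed(0)
for _ in range(1000):
    k = random.randint(1, 10)
    a = [random.uniform(0, 10) for _ in range(k)]
    b = [random.uniform(0.1, 10) for _ in range(k)]
    lhs = sum(a) / sum(b)
    rhs = min(ai / bi for ai, bi in zip(a, b))
    assert lhs >= rhs - 1e-12
print("inequality (4) held on all samples")
```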

The formulation (3) is equivalent to the following semidefinite program:

$(5) \ \ \ \begin{array}{l} \min \sum_{i,j} A(i,j) ||x_i -x_j ||^2\\ subject\ to\\\sum_{i,j} ||x_i -x_j ||^2 = n\\x_i \in {\mathbb R}^{*} \end{array}$

An optimization problem is a semidefinite program if every variable is allowed to be an arbitrary vector, and both the objective function and the constraints are linear functions of inner products of pairs of variables. In our case,

$||x_i - x_j||^2 = \langle x_i , x_i \rangle + \langle x_j , x_j \rangle - 2 \langle x_i , x_j \rangle$

is a linear combination of inner products, and so the formulation (5) is a semidefinite program. Another way to define semidefinite programming (indeed, the definition that is usually given first, deriving the one in terms of inner products as an equivalent characterization) is that a semidefinite program is an optimization problem in which the variables are the entries of a positive semidefinite matrix $X$, and the objective function and the constraints are linear in the entries of $X$. A real matrix is positive semidefinite if it is symmetric and all its eigenvalues are non-negative. We can see that an $n\times n$ matrix $X$ is positive semidefinite if and only if there are vectors $x_1,\ldots,x_n$ such that $X(i,j) = \langle x_i,x_j \rangle$.

(The proof is simple: if $X$ is real and symmetric, then all its eigenvalues $\lambda_1,\ldots,\lambda_n$ are real, and there is an orthonormal system of eigenvectors $v_1,\ldots,v_n$ such that

$X=\sum_k \lambda_k v_k \cdot v_k^T$

so that

$X(i,j) = \sum_k \lambda_k v_k(i) v_k(j)$

If the eigenvalues are non-negative, we can let $x_i(k) := \sqrt{\lambda_k} v_k(i)$ and have $X(i,j) = \langle x_i ,x_j\rangle$.

On the other hand, if there are $x_i$ such that $X(i,j) = \langle x_i,x_j\rangle$, let $\lambda$ be any eigenvalue of $X$ and $v$ be a length-1 eigenvector of $\lambda$. Then:

$\lambda = v^T X v = \sum_{i,j} v(i)v(j) \sum_k x_i(k) x_j(k) = \sum_k \left( \sum_i v(i) x_i(k) \right)^2 \geq 0$

which completes the characterization.)
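The Gram-matrix direction of this characterization can be sanity-checked numerically: starting from arbitrary vectors $x_i$, the quadratic form of the matrix $X(i,j)=\langle x_i,x_j\rangle$ is a squared norm, hence non-negative. The vectors below are random illustrative data.

```python
# Check that for a Gram matrix X(i,j) = <x_i, x_j> we have
# v^T X v = || sum_i v(i) x_i ||^2 >= 0 for every v.
import random

random.seed(1)
n, m = 5, 3
xs = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
X = [[sum(xs[i][k] * xs[j][k] for k in range(m)) for j in range(n)]
     for i in range(n)]

for _ in range(200):
    v = [random.uniform(-1, 1) for _ in range(n)]
    quad = sum(v[i] * X[i][j] * v[j] for i in range(n) for j in range(n))
    # The same quantity written as a squared norm:
    s = [sum(v[i] * xs[i][k] for i in range(n)) for k in range(m)]
    sq = sum(c * c for c in s)
    assert abs(quad - sq) < 1e-9 and quad >= -1e-9
print("v^T X v matched the squared norm and was nonnegative on all samples")
```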

So we have gotten to the point where we can see $d-\lambda_2$ as a semidefinite programming relaxation of sparsest cut. An interesting property of semidefinite programs is that, like linear programs, they admit a dual. From a minimization (primal) semidefinite program, we can write a maximization (dual) semidefinite program such that the cost of any feasible solution for the dual is a lower bound to the cost of any feasible solution for the primal; furthermore, the two programs have the same optimum cost. Returning to our motivating question of certifying the expansion of a given graph, any feasible solution to the dual of (5) gives us a lower bound to the expansion of the graph.

We shall have to suffer a bit longer before we can write the dual of (5). It is best to first move to a formulation in terms of entries of semidefinite matrices. A few calculations show that (5) can be written as

$(6) \ \ \ \begin{array}{l} \min L_G \bullet X\\ subject\ to\\L_n \bullet X =n\\ X \succeq 0 \end{array}$

where we write $X \succeq 0$ to mean that the matrix $X$ is positive semidefinite (we shall also write $X\succeq Y$ when $X-Y$ is positive semidefinite), we write $A \bullet B$ to mean $\sum_{i,j} A(i,j)B(i,j)$, $L_G$ is the Laplacian of $G$, that is the matrix $dI - A$, and $L_n$ is the Laplacian of the clique on $n$ vertices.

From this, we look up in Wikipedia the rules to form a semidefinite dual, and we get

$(7) \ \ \ \begin{array}{l} \max y\\ subject\ to\\L_G \succeq y\frac 1n L_n \end{array}$

It is possible to make sense of (7) independently of all the above discussion, that is, to see from first principles that the cost of any feasible solution to (7) is a lower bound to the sparsest cut of $G$.

A feasible solution is a value $y$ and a positive semidefinite matrix $Y$ such that $L_G = y\frac 1n L_n + Y$. Equivalently, it is a value $y$ and vectors $y_i$ such that

$L_G(i,j) = y\frac 1n L_n (i,j) + \langle y_i,y_j\rangle$

Take any cut $(S,V-S)$ in $G$. Then

$|S|\cdot |V-S| = 1_S^T L_n 1_S$

and

$edges(S,V-S) = 1_S^T L_G 1_S$

but

$1_S^T L_G 1_S = 1_S^T (y \frac 1n L_n + Y) 1_S \geq 1_S^T y \frac 1n L_n 1_S$

and so the cost of the cut is at least $y$. To justify the last inequality above, we need to prove that if $Y$ is positive semidefinite then

$1_S^T Y 1_S \geq 0$

Indeed

$1_S^T Y 1_S = \sum_{i,j\in S} \langle y_i,y_j \rangle = \langle \sum_{i\in S} y_i,\sum_{i\in S} y_i \rangle = || \sum_{i\in S} y_i ||^2$

We can also see, from first principles, that if $\lambda_1=d,\lambda_2,\ldots,\lambda_n$ are the eigenvalues of $A$ and $v_1,\ldots,v_n$ is a corresponding system of eigenvectors, we can find a feasible solution to (7) of cost $d-\lambda_2$.

We defined $L_G = dI - A$, where we assume that $G$ is $d$-regular. We can write

$dI - A$
$= dI - \sum_i \lambda_i v_i v_i^T$
$= (d-\lambda_2) \cdot\left( I - v_1v_1^T \right) +\lambda_2 I - \left( \lambda_2 v_1v_1^T+ \sum_{i=2}^n \lambda_i v_i v_i^T \right)$

And we are almost there because $\left( I - v_1v_1^T \right) = \frac 1n L_n$, and it remains to prove that the matrix

$Y:= \lambda_2 I - \left( \lambda_2 v_1v_1^T+ \sum_{i=2}^n \lambda_i v_i v_i^T \right)$

is positive semidefinite. This we can see by noting that $v_1,\ldots,v_n$ are eigenvectors of $Y$ with eigenvalues $0,0,\lambda_2-\lambda_3,\ldots,\lambda_2-\lambda_n$. Alternatively, we could construct vectors $y_i$ such that $Y(i,j) = \langle y_i,y_j\rangle$.

Essentially, what is going on is that we are “embedding” a scaled version of the complete graph into our graph. The edge expansion and sparsest cut of the complete graph are known, and the semidefinite constraint in (7) is telling us that the difference between our graph and the scaled complete graph can only add to the sparsest cut, so the sparsest cut of $G$ is at least the known sparsest cut of the scaled complete graph.
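As a concrete illustration (my own example, not from the post): for the 4-cycle $C_4$ we have $d=2$ and $\lambda_2=0$, so $y = d-\lambda_2 = 2$ should be feasible in (7), and indeed every cut of $C_4$ has ratio at least 2. The sketch below samples the quadratic form of $L_G - y\frac 1n L_n$ as numerical evidence of feasibility (not a proof), and verifies the cut bound exhaustively.

```python
# Dual certificate y = d - lambda_2 = 2 for the 4-cycle C4 (toy example).
from itertools import combinations
import random

random.seed(2)
n, y = 4, 2.0
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
allpairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

def form(x):
    """x^T (L_G - (y/n) L_n) x, via sums of w_ij (x_i - x_j)^2."""
    g = sum((x[i] - x[j]) ** 2 for i, j in cycle)      # L_G part
    k = sum((x[i] - x[j]) ** 2 for i, j in allpairs)   # L_n part
    return g - (y / n) * k

for _ in range(500):  # numerical evidence that the form is PSD
    x = [random.uniform(-1, 1) for _ in range(n)]
    assert form(x) >= -1e-9

for k in range(1, n):  # and every cut ratio is at least y
    for T in combinations(range(n), k):
        T = set(T)
        crossing = sum(1 for i, j in cycle if (i in T) != (j in T))
        ratio = crossing / ((1 / n) * len(T) * (n - len(T)))
        assert ratio >= y - 1e-9
print("dual certificate y = 2 checked for C4")
```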

Why did we go through all this trouble? Now that we understand the spectral gap inside out, we can derive the Arora-Rao-Vazirani relaxation (ARV) of sparsest cut by just adding the “triangle inequalities” to (5), and we can derive the Leighton-Rao relaxation by removing the semidefinite constraints from ARV. In the dual of Leighton-Rao, we again “embed” a complete graph in our graph, by realizing every edge as a flow. In the dual of ARV, we will find both flows and spectral considerations.

[to be continued]

6 thoughts on “Relaxations of Edge Expansion and Their Duals”

1. Luca, nice post. I anxiously await your interpretation of the dual of the ARV relaxation, which I still find a bit confounding. In particular, it involves the following kind of question: Consider a graph with positive and negative edge weights, and let L(G) be the Laplacian of G. When is L(G) a PSD matrix? (Certainly if there are no negative edge weights.)

To see how this relates to your post, consider e.g. the complete graph with edge weights all -eps for some small eps > 0. Now take an expander graph on the same set of nodes and add 1 to all the expander edges. This graph has lots of negative edge weights, but the Laplacian still manages to be PSD as long as eps > 0 is chosen to be a small enough constant. (This way of generating dual solutions is what ARV calls “expander flows.”)

2. James, it seems to me that you are describing the feasible solutions to (7), the dual of the vanilla SDP without triangle inequalities, rather than solutions to the dual of ARV.

The point of (7) is that if G and H are (adjacency matrices of) graphs such that L(G)-L(H), or, which is the same, L(G-H), has no negative eigenvalue, then $\phi(G) \geq \phi(H)$. So we can prove that the sparsest cut of G is at least $\epsilon$ by showing that if we subtract $\epsilon/n$ from every entry of the adjacency matrix of G, then all the eigenvalues of the Laplacian are still non-negative. Then the sparsest cut of G is at least the sparsest cut of the graph with $\epsilon/n$ in each entry, and this is $\epsilon$ by my normalization.

In Leighton-Rao, the dual exploits the fact that if G and H are graphs such that H(i,j) units of flow can be routed in G from i to j for each (i,j), then $\phi(G) \geq \phi(H)$.

As I understand it, the dual of ARV combines the two facts: to prove that $\phi(G)\geq \epsilon$, we construct a matrix H such that H(i,j) flow can be routed from i to j in G, hence $\phi(G) \geq \phi(H)$, and such that if we subtract $\epsilon/n$ from each entry of H the Laplacian of the result has non-negative eigenvalues, hence $\phi(H) \geq \epsilon$.

I am finding it quite difficult, however, to reverse-engineer the ARV primal as the dual of what I just described. (I spent a day convincing myself that the dual of the Leighton-Rao relaxation is indeed multi-commodity flow, which is supposed to be obvious, so I am a bit slow with this kind of thing.)

3. Ah, never mind, you were talking about subtracting the complete graph only as an example.

About the emoticon: in an earlier version of the post there was an equation, number 8, and “$(8)$,” if not formatted inside a latex environment, looked like this: (8)

4. If you go to Settings->Writing->Formatting, you can turn off the auto-emoticon conversion.

In the dual of ARV, you basically look at a sort of residual flow graph. In a standard (Leighton-Rao style) multi-flow, this will always have positive edge weights because, by the constraints, you cannot over-congest an edge, and you cannot undersatisfy a demand. In the dual of ARV, you can do both: The weight of a pair (i,j) is

w(i,j) = Flow(i,j) - D + Cap(i,j) - Congestion(i,j)

here D is the demand, and it’s what the dual objective is trying to maximize (trying to send D units of flow from every i to every j). Cap(i,j) is the capacity of edge (i,j) (in the unweighted case, this is just 1 for edges, and 0 for other pairs). Congestion(i,j) is the amount of flow going through the edge (i,j). In a standard all-pairs flow, you would require that Flow(i,j) \geq D, and Cap(i,j) \geq Congestion(i,j), but in the ARV dual, you have a weaker notion of flow, which is that Laplacian(W) is PSD, where W = (w(i,j)) is the matrix of weights. Here, the degree of a node i is

deg(i) = \sum_{j} w(i,j)

and then Laplacian(W) = deg(W) - W, where deg(W) is the diagonal degree matrix. I think of W as the "error" matrix of the flow. In a standard flow, it would have all positive entries, while in a dual-ARV-flow, you only require Laplacian(W) to be PSD.

What you describe in your comment is more like an expander flow, where you actually have a feasible flow on some subset of the demands. The dual seems to be more powerful than this, however, able to send a flow which undersatisfies demands and overcongests edges as long as the “total error” in being a flow is PSD (rather, its Laplacian is PSD). (Maybe it’s not more powerful–that would be fantastically interesting, and I think it would reduce the Khot-Vishnoi paper to about 3 lines!)

About reverse-engineering the primal, do you just want to prove weak duality like you did for the 2nd eigenvalue (i.e. show that a solution to the ARV SDP would yield an upper bound on the amount of "PSD flow" you can send?)

To do this, it helps to look at the dual-ARV program as a flow program (where the flow variables f_{ijk} are multipliers of triangle inequalities in the primal). Since triangle inequalities imply path inequalities, the only flow variables you need are f_{ijk}, i.e. flow from i to k can only be sent along two-hop paths i-j-k. (See how ARV writes the dual in their paper, but note that the factor 2 written there in lines (30) and (35) is wrong, and should be deleted.)

Now let {x_1, …, x_n} be a primal solution, let x = (x_1, …, x_n) (a vector of vectors) and consider the dual constraint:

x^T L(W) x \geq 0. (**)

This is some inequality involving the values f_{ijk}, Con(i,j) (which can be defined in terms of the flow variables), Cap(i,j), and D (the objective value). In order to see that it gives an upper bound on D, you want to make *all* the flow variables equal to 0. If you could do this, and then read off the above inequality (**), you would get an upper bound on D which is like (sum_{i,j} Cap(i,j) |x_i-x_j|^2)/(sum_{i,j} |x_i-x_j|^2), i.e. exactly the primal objective.

How can you make the flow variables all equal to 0, while maintaining the inequality (**)? This is where the triangle inequalities on the vectors {x_1, …, x_n} come in. Basically, you check that when you reduce f_{ijk} to 0, you lose in some terms, but you gain in others where Con(i,j) and Con(j,k) are reduced. By the squared-triangle inequality, you make a positive net gain, hence you can make all the flow variables 0.