# CS294 Lecture 9: The Sparsest Cut Problem

In which we introduce the sparsest cut problem and the Leighton-Rao relaxation.

1. The Uniform Sparsest Cut Problem, Edge Expansion and ${\lambda_2}$

Let ${G=(V,E)}$ be an undirected graph with ${n:= | V |}$ vertices.

We define the uniform sparsity of a cut ${(S,V-S)}$ as

$\displaystyle {\sf usc}_G(S) := \frac {E(S,V-S)}{ |S| \cdot |V-S| }$

(we will omit the subscript when clear from the context) and the uniform sparsest cut of a graph is

$\displaystyle {\sf usc}(G):= \min_{S} {\sf usc}_G(S)$
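
As a sanity check, this definition can be evaluated by brute force on small graphs. A minimal Python sketch (the function name `usc` and the edge-list representation are our choices; the enumeration is exponential in ${n}$, for illustration only):

```python
from itertools import combinations

def usc(n, edges):
    """Brute-force uniform sparsest cut: min over cuts (S, V-S) of
    E(S, V-S) / (|S| * |V-S|). Exponential in n; illustration only."""
    best = float("inf")
    for k in range(1, n // 2 + 1):  # complements give the same value
        for S in combinations(range(n), k):
            S = set(S)
            cross = sum(1 for (u, v) in edges if (u in S) != (v in S))
            best = min(best, cross / (len(S) * (n - len(S))))
    return best

# 4-cycle: the cut {0,1} has 2 crossing edges and |S| * |V-S| = 4
print(usc(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # → 0.5
```

Since the sparsity of a cut is invariant under complementing ${S}$, it suffices to enumerate sets of size at most ${n/2}$.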

In ${d}$-regular graphs, approximating the uniform sparsest cut is equivalent (up to a factor of 2 in the approximation) to approximating the edge expansion, because, for every cut ${(S,V-S)}$, we have

$\displaystyle \phi(S,V-S) = \frac {E(S,V-S)}{d \cdot \min \{ |S|, |V-S| \} }$

and, noting that, for every ${S}$,

$\displaystyle \frac 1n |S| \cdot |V-S| \leq \min \{ |S|, |V-S| \} \leq \frac 2n |S| \cdot |V-S|$

we have, for every ${S}$,

$\displaystyle \phi (S,V-S) \leq \frac nd \cdot {\sf usc}(S) \leq 2 \phi(S,V-S)$

and so

$\displaystyle \phi(G) \leq \frac nd \cdot {\sf usc}(G) \leq 2 \phi(G)$
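
The sandwich above is easy to check numerically; a hedged brute-force sketch (helper name ours), using the 6-cycle as a small ${2}$-regular example:

```python
from itertools import combinations

def cut_quantities(n, d, edges):
    """Brute-force phi(G) and usc(G) for a d-regular graph on n vertices."""
    phi, usc = float("inf"), float("inf")
    for k in range(1, n // 2 + 1):
        for S in combinations(range(n), k):
            S = set(S)
            cross = sum(1 for (u, v) in edges if (u in S) != (v in S))
            phi = min(phi, cross / (d * len(S)))  # here |S| <= |V-S|
            usc = min(usc, cross / (len(S) * (n - len(S))))
    return phi, usc

# 6-cycle (2-regular): check phi <= (n/d) usc <= 2 phi
n, d = 6, 2
phi, usc = cut_quantities(n, d, [(i, (i + 1) % n) for i in range(n)])
assert phi <= (n / d) * usc + 1e-12
assert (n / d) * usc <= 2 * phi + 1e-12
```

On the cycle the right-hand inequality is tight: the best cut is a balanced one, for which ${\min\{|S|,|V-S|\} = \frac 2n |S|\cdot |V-S|}$.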

It will be instructive to see that, in ${d}$-regular graphs, ${\lambda_2}$ is a relaxation of ${\frac nd {\sf usc}(G)}$, a fact that gives an alternative proof of the easy direction ${\lambda_2 \leq 2 \phi(G)}$ of Cheeger’s inequalities.

If ${G}$ is ${d}$-regular, then ${\lambda_2}$ satisfies

$\displaystyle \lambda_2 = \min_{{\bf x} \in {\mathbb R}^n - \{ {\bf 0} \}, {\bf x} \perp {\bf 1} } \frac { \sum_{ \{ u,v \} \in E } (x_u - x_v)^2}{\sum_v dx_v^2 }$

$\displaystyle = \min_{{\bf x} \in {\mathbb R}^n - \{ {\bf 0} \}, {\bf x} \perp {\bf 1} }\ \ \frac {n}{d} \cdot \frac { \sum_{ \{ u,v \} \in E } (x_u - x_v)^2}{\sum_{ \{ u,v \} } (x_u - x_v)^2 }$

$\displaystyle = \min_{{\bf x} \in {\mathbb R}^n - \{ {\bf 0} \} } \ \frac {n}{d} \cdot \frac { \sum_{ \{ u,v \} \in E } (x_u - x_v)^2}{\sum_{ \{ u,v \} } (x_u - x_v)^2 }$

where the first identity above comes from the fact that

$\displaystyle \sum_{\{ u,v \}} (x_u - x_v)^2 = \frac 12 \sum_{(u,v) \in V^2 } (x_u - x_v)^2 = n \sum_v x_v^2 - \sum_{u,v} x_u x_v$

$\displaystyle = n \sum_v x_v^2 - \left ( \sum_v x_v \right )^2$

$\displaystyle = n \sum_v x_v^2 - \langle {\bf x}, {\bf 1} \rangle ^2$

$\displaystyle = n \sum_v x_v^2$

and the second identity follows by noticing that the cost function is invariant by addition of a multiple of ${{\bf 1}}$, and so optimizing over all non-zero vectors gives the same result as optimizing over all vectors orthogonal to ${{\bf 1}}$.

On the other hand, the uniform sparsest cut problem can be formulated as

$\displaystyle {\sf usc}(G) = \min_{{\bf x} \in \{ 0,1 \}^n - \{ {\bf 0}, {\bf 1} \} } \ \ \ \frac { \sum_{ \{ u,v \} \in E } (x_u - x_v)^2}{\sum_{ \{ u,v \} } (x_u - x_v)^2 }$

(because the square of a number in ${\{ -1,0,1\}}$ is the same as its absolute value) and we see that ${\lambda_2}$ can be considered a continuous relaxation of ${\frac nd {\sf usc}(G)}$.
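
To see the relaxation at work numerically without an eigensolver, one can use cycles, for which the eigenvalues of the normalized Laplacian have the standard closed form ${1 - \cos(2\pi k/n)}$ (a known fact, quoted here without derivation), together with the brute-force computation of ${{\sf usc}}$:

```python
import math
from itertools import combinations

def usc(n, edges):
    """Brute-force uniform sparsest cut, as in the definition."""
    best = float("inf")
    for k in range(1, n // 2 + 1):
        for S in combinations(range(n), k):
            S = set(S)
            cross = sum(1 for (u, v) in edges if (u in S) != (v in S))
            best = min(best, cross / (len(S) * (n - len(S))))
    return best

# for the n-cycle (d = 2), the normalized Laplacian eigenvalues are
# 1 - cos(2*pi*k/n), so lambda_2 = 1 - cos(2*pi/n)
for n in (4, 6, 8, 10):
    lam2 = 1 - math.cos(2 * math.pi / n)
    relaxed = (n / 2) * usc(n, [(i, (i + 1) % n) for i in range(n)])
    assert lam2 <= relaxed + 1e-12  # lambda_2 is a relaxation of (n/d) usc
```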

2. The Non-Uniform Sparsest Cut Problem

In the non-uniform sparsest cut problem, we are given two graphs ${G=(V, E_G)}$ and ${H= (V, E_H)}$, over the same set of vertices; the non-uniform sparsity of a cut ${(S,V-S)}$ is defined as

$\displaystyle {\sf sc}_{G,H}(S) := \frac { E_G(S,V-S)}{E_H(S,V-S)}$

and the non-uniform sparsest cut problem is the optimization problem

$\displaystyle {\sf sc}(G,H) := \min_S \ \ {\sf sc}_{G,H} (S)$

Note that the non-uniform sparsest cut problem generalizes the sparsest cut problem (consider the case in which ${H}$ is a clique).

If ${H}$ is the graph that contains only the single edge ${\{ s,t \}}$, then computing ${{\sf sc}(G,H)}$ is the undirected min-st-cut problem, in which we want to find the cut that separates two given vertices ${s}$ and ${t}$ while minimizing the number of crossing edges.
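
Both special cases can be verified by brute force; a sketch (the helper name `sc` is ours):

```python
from itertools import combinations

def sc(n, edges_G, edges_H):
    """Brute-force non-uniform sparsest cut sc(G, H)."""
    best = float("inf")
    for k in range(1, n // 2 + 1):
        for S in combinations(range(n), k):
            S = set(S)
            cross = lambda E: sum(1 for (u, v) in E if (u in S) != (v in S))
            if cross(edges_H) > 0:  # skip cuts crossing no H-edge
                best = min(best, cross(edges_G) / cross(edges_H))
    return best

n = 4
G = [(0, 1), (1, 2), (2, 3), (3, 0)]                   # 4-cycle
K4 = [(u, v) for u in range(n) for v in range(u + 1, n)]
assert sc(n, G, K4) == 0.5     # H = clique: the uniform case
assert sc(n, G, [(0, 2)]) == 2  # H = single edge {s,t}: min s-t cut
```

With ${H = K_n}$ the denominator counts exactly ${|S| \cdot |V-S|}$, recovering the uniform sparsity; with ${H = \{\{s,t\}\}}$ the denominator is ${1}$ precisely on the cuts separating ${s}$ from ${t}$.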

3. The Leighton-Rao Relaxation

We can write the non-uniform sparsity of a set as

$\displaystyle {\sf sc}_{G,H} (S) = \frac{\sum_{ \{ u,v \} \in E_G}|1_S(u) - 1_S(v)| } {\sum_{\{ u,v \} \in E_H} |1_S(u) - 1_S(v) | }$

The observation that led us to see ${\lambda_2}$ as the optimum of a continuous relaxation of ${\frac nd {\sf sc}_{G,K_n}}$ was to observe that ${|1_S(u)-1_S(v)| = |1_S(u) - 1_S(v)|^2}$, and then relax the problem by allowing arbitrary functions ${x: V \rightarrow {\mathbb R}}$ instead of indicator functions ${1_S : V \rightarrow \{ 0,1 \}}$.

The Leighton-Rao relaxation of sparsest cut is obtained using, instead, the following observation: if, for a set ${S}$, we define ${d_S(u,v) := |1_S(u) - 1_S(v)|}$, then ${d_S(\cdot,\cdot)}$ defines a semimetric over the set ${V}$, because ${d_S}$ is symmetric, ${d_S(v,v)=0}$, and the triangle inequality holds. So we could think of allowing arbitrary semimetrics in the expression for ${{\sf sc}}$, and define

$\displaystyle LR(G,H):= \min_{\begin{array}{l} d:V\times V \rightarrow {\mathbb R}\\ d \ \mbox{semi-metric} \end{array}} \frac{\sum_{\{ u,v \} \in E_G} d(u,v) }{\sum_{\{ u,v \} \in E_H} d(u,v )} \ \ \ \ \ (1)$

This might seem like such a broad relaxation that there could be graphs on which ${LR(G,H)}$ bears no connection to ${{\sf sc}(G,H)}$. Instead, we will prove the fairly good estimate

$\displaystyle LR(G,H) \leq {\sf sc}(G,H) \leq O(\log |V|) \cdot LR(G,H) \ \ \ \ \ (2)$

The value ${LR(G,H)}$ and an optimal ${d(\cdot,\cdot)}$ can be computed in polynomial time by solving the following linear program

$\displaystyle \begin{array}{lll} {\rm minimize} & \sum_{\{ u,v \} \in E_G} d_{u,v} \\ {\rm subject\ to} \\ & \sum_{\{ u,v \} \in E_H} d_{u,v} = 1 \\ & d_{u,v} \leq d_{u,z} + d_{z,v} & \forall u,v,z \in V\\ & d_{u,v} \geq 0 & \forall u,v \in V \end{array} \ \ \ \ \ (3)$

that has a variable ${d_{u,v}}$ for every unordered pair of distinct vertices ${\{ u,v \}}$. Clearly, every solution to the linear program (3) is also a solution to the right-hand side of the definition (1) of the Leighton-Rao parameter, with the same cost. Also, every semimetric can be normalized so that ${\sum_{\{ u,v \} \in E_H} d(u,v) = 1}$ by multiplying every distance by a fixed constant, and the normalization does not change the value of the right-hand side of (1); after the normalization, the semimetric is a feasible solution to the linear program (3), with the same cost.
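
As a sketch, the linear program (3) can be handed to an off-the-shelf LP solver (this assumes SciPy is available; the helper name `leighton_rao` is ours). On the 4-cycle with ${H = K_4}$ the LP optimum coincides with the true sparsest cut value ${1/2}$, since every metric on at most four points embeds isometrically into ${\ell_1}$:

```python
from itertools import combinations
from scipy.optimize import linprog  # assumes SciPy is installed

def leighton_rao(n, edges_G, edges_H):
    """Solve the LP (3): minimize sum_{E_G} d_uv subject to
    sum_{E_H} d_uv = 1, triangle inequalities, and d >= 0."""
    pairs = list(combinations(range(n), 2))
    idx = {p: i for i, p in enumerate(pairs)}
    key = lambda u, v: idx[(min(u, v), max(u, v))]

    c = [0.0] * len(pairs)
    for (u, v) in edges_G:
        c[key(u, v)] = 1.0
    A_eq = [[0.0] * len(pairs)]
    for (u, v) in edges_H:
        A_eq[0][key(u, v)] = 1.0

    A_ub, b_ub = [], []
    for u, v, z in combinations(range(n), 3):
        # each unordered triple yields three triangle inequalities
        for a, b, m in ((u, v, z), (u, z, v), (v, z, u)):
            row = [0.0] * len(pairs)
            row[key(a, b)] = 1.0
            row[key(a, m)] -= 1.0
            row[key(m, b)] -= 1.0
            A_ub.append(row)
            b_ub.append(0.0)

    # linprog's default bounds (0, None) enforce d >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0])
    return res.fun

G = [(0, 1), (1, 2), (2, 3), (3, 0)]
K4 = list(combinations(range(4), 2))
print(round(leighton_rao(4, G, K4), 3))  # → 0.5
```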

4. An L1 Relaxation of Sparsest Cut

In the Leighton-Rao relaxation, we relax distance functions of the form ${d(u,v) = |1_S(u) - 1_S(v)|}$ to completely arbitrary distance functions. Let us consider an intermediate relaxation, in which we allow only distance functions that can be realized by an embedding of the vertices into an ${\ell_1}$ space.

Recall that, for a vector ${{\bf x} \in {\mathbb R}^n}$, its ${\ell_1}$ norm is defined as ${|| {\bf x}||_1 := \sum_i |x_i |}$, and that this norm makes ${{\mathbb R}^n}$ into a metric space with the ${\ell_1}$ distance function

$\displaystyle || {\bf x} - {\bf y}||_1 = \sum_i |x_i - y_i |$

The distance function ${d(u,v) = |1_S(u) - 1_S(v)|}$ is an example of a distance function that can be realized by mapping each vertex to a real vector, and then defining the distance between two vertices as the ${\ell_1}$ norm of the difference of the respective vectors. Of course, it is an extremely restrictive special case, in which the dimension of the vectors is one, and in which every vertex is actually mapped to either zero or one. Let us consider the relaxation of sparsest cut to arbitrary ${\ell_1}$ mappings, and define

$\displaystyle {\sf L1sc} (G,H) := \inf_{m, f: V\rightarrow {\mathbb R}^m} \frac {\sum_{\{ u,v \} \in E_G} || f(u) - f(v) ||_1 } {\sum_{\{ u,v \} \in E_H} || f(u) - f(v) ||_1 }$

This may seem like another very broad relaxation of sparsest cut, whose optimum might be much smaller than the sparsest cut optimum. The following theorem shows that this is not the case.

Theorem 1 For every pair of graphs ${G,H}$, ${{\sf sc}(G,H) = {\sf L1sc}(G,H)}$.

Furthermore, there is a polynomial time algorithm that, given a mapping ${f: V\rightarrow {\mathbb R}^m}$, finds a cut ${S}$ such that

$\displaystyle \frac {\sum_{\{ u,v \} \in E_G} |1_S(u) - 1_S(v) |} {\sum_{\{ u,v \} \in E_H} |1_S(u) - 1_S(v)| }\leq \frac {\sum_{\{ u,v \} \in E_G} || f(u) - f(v) ||_1 } {\sum_{\{ u,v \} \in E_H} || f(u) - f(v) ||_1 } \ \ \ \ \ (4)$

Proof: We use ideas that have already come up in the proof of the difficult direction of Cheeger's inequality. First, recall that for all nonnegative reals ${a_1,\ldots,a_m}$ and positive reals ${b_1,\ldots,b_m}$ we have

$\displaystyle \frac {a_1 + \cdots + a_m}{b_1 + \cdots + b_m} \geq \min_i \frac{a_i}{b_i} \ \ \ \ \ (5)$

as can be seen by noting that

$\displaystyle \sum_j a_j = \sum_j b_j \cdot \frac{a_j}{b_j} \geq \left( \min_i \frac{a_i}{b_i} \right) \cdot \sum_j b_j$

Let ${f_i(v)}$ be the ${i}$-th coordinate of the vector ${f(v)}$, thus ${f(v) = (f_1(v),\ldots,f_m(v))}$. Then we can decompose the right-hand side of (4) by coordinates, and write

$\displaystyle \frac {\sum_{\{ u,v \} \in E_G} || f(u) - f(v) ||_1 } {\sum_{\{ u,v \} \in E_H} || f(u) - f(v) ||_1 }$

$\displaystyle = \frac {\sum_i \sum_{\{ u,v \} \in E_G} | f_i(u) - f_i(v) | } {\sum_i \sum_{\{ u,v \} \in E_H} | f_i(u) - f_i(v) | }$

$\displaystyle \geq \min_i \frac {\sum_{\{ u,v \} \in E_G} | f_i(u) - f_i(v) | } { \sum_{\{ u,v \} \in E_H} | f_i(u) - f_i(v) | }$

This already shows that, in the definition of ${{\sf L1sc}}$, we can restrict ourselves, with no loss of generality, to mappings into 1-dimensional ${\ell_1}$ spaces.

Let ${i^*}$ be the coordinate that achieves the minimum above. Because the cost function is invariant under shifts and scalings (that is, the cost of a function ${x\rightarrow f(x)}$ is the same as the cost of ${x\rightarrow af(x) + b}$ for every two constants ${a\neq 0}$ and ${b}$), there is a function ${g:V\rightarrow {\mathbb R}}$ such that ${g}$ has the same cost as ${f_{i^*}}$ and has unit-length range ${\max_v g(v) -\min_v g(v) = 1}$.

Let us now pick a threshold ${t}$ uniformly at random from the interval ${[\min_v g(v) , \max_v g(v)]}$, and define the random variables

$\displaystyle S_t:= \{ v: g(v) \leq t \}$

We observe that for every pair of vertices ${u,v}$ we have

$\displaystyle \mathop{\mathbb E} |1_{S_t}( u) - 1_{S_t}(v) | = |g(u) - g(v)|$

and so we get

$\displaystyle \frac {\sum_{\{ u,v \} \in E_G} || f(u) - f(v) ||_1 } {\sum_{\{ u,v \} \in E_H} || f(u) - f(v) ||_1 }$

$\displaystyle \geq \frac {\sum_{\{ u,v \} \in E_G} | g(u) - g(v) | } { \sum_{\{ u,v \} \in E_H} | g(u) - g(v) | }$

$\displaystyle = \frac {\mathop{\mathbb E} \sum_{\{ u,v \} \in E_G} |1_{S_t}(u) - 1_{S_t}(v) |} {\mathop{\mathbb E}\sum_{\{ u,v \} \in E_H} |1_{S_t} (u)- 1_{S_t}(v)| }$

Finally, by an application of (5), we see that there must be a set ${S}$ among the possible values of ${S_t }$ such that (4) holds. Notice that the proof was completely constructive: we simply took the coordinate ${f_{i^*}}$ of ${f}$ with the lowest cost, and then the “threshold cut” given by ${f_{i^*}}$ with the smallest sparsity. $\Box$
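
The constructive proof translates directly into a rounding procedure; a sketch (helper names ours) that picks the best coordinate and then the best threshold cut, exactly as in the argument above:

```python
def round_l1(n, edges_G, edges_H, f):
    """Round an l1 embedding f (a list of m-dimensional points, one per
    vertex) to a threshold cut, following the proof of Theorem 1."""
    def ratio(dist):
        den = sum(dist(u, v) for (u, v) in edges_H)
        if den == 0:
            return float("inf")
        return sum(dist(u, v) for (u, v) in edges_G) / den

    m = len(f[0])
    # coordinate achieving the minimum in the per-coordinate decomposition
    i_star = min(range(m),
                 key=lambda i: ratio(lambda u, v: abs(f[u][i] - f[v][i])))
    g = [f[v][i_star] for v in range(n)]
    best_S, best = None, float("inf")
    for t in sorted(set(g))[:-1]:  # candidate thresholds
        S = {v for v in range(n) if g[v] <= t}
        r = ratio(lambda u, v: abs((u in S) - (v in S)))
        if r < best:
            best_S, best = S, r
    return best_S, best

f = [(0, 0), (1, 0), (1, 1), (0, 1)]   # an l1 embedding of the 4-cycle
G = [(0, 1), (1, 2), (2, 3), (3, 0)]
K4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
S, val = round_l1(4, G, K4, f)
# the cut found is no worse than the embedding's ratio (here 4/8 = 1/2)
assert val <= 0.5 + 1e-12
```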

5. A Theorem of Bourgain

We will derive our main result (2) from the L1 “rounding” process of the previous section, and from the following theorem of Bourgain (the efficiency considerations are due to Linial, London and Rabinovich).

Theorem 2 (Bourgain) Let ${d: V\times V \rightarrow {\mathbb R}}$ be a semimetric defined over a finite set ${V}$. Then there exists a mapping ${f: V \rightarrow {\mathbb R}^m}$ such that, for every two elements ${u,v \in V}$,

$\displaystyle || f(u) - f(v)||_1 \leq d(u,v) \leq ||f(u)-f(v)||_1 \cdot c\cdot \log |V|$

where ${c}$ is an absolute constant. Given ${d}$, the mapping ${f}$ can be found with high probability in randomized polynomial time in ${|V|}$.
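
The randomized construction behind the theorem picks random subsets of geometrically varying density and uses distances to them as coordinates. The sketch below uses illustrative constants that do not give the ${O(\log |V|)}$ distortion guarantee; it only checks the easy direction, that each coordinate ${v \mapsto d(v,A)}$ is 1-Lipschitz, so the scaled map is non-expanding:

```python
import math
import random

def bourgain_embedding(points, d, seed=0):
    """Sketch of Bourgain's randomized l1 embedding: coordinates are
    (scaled) distances d(v, A) to random sets A of density 2^{-i}."""
    rng = random.Random(seed)
    n = len(points)
    L = max(1, math.ceil(math.log2(n)))
    reps = 4 * L                      # O(log n) repetitions per level
    coords = {v: [] for v in points}
    for i in range(1, L + 1):
        for _ in range(reps):
            A = [v for v in points if rng.random() < 2.0 ** -i]
            for v in points:
                dv = min((d(v, a) for a in A), default=0.0)
                # v -> d(v, A) is 1-Lipschitz; dividing by the number of
                # coordinates keeps the overall l1 map non-expanding
                coords[v].append(dv / (L * reps))
    return coords

pts = list(range(5))
dist = lambda u, v: float(abs(u - v))       # path metric on five points
emb = bourgain_embedding(pts, dist)
l1 = lambda u, v: sum(abs(a - b) for a, b in zip(emb[u], emb[v]))
assert all(l1(u, v) <= dist(u, v) + 1e-9 for u in pts for v in pts)
```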

To see that the above theorem of Bourgain implies (2), consider a pair of graphs ${G,H}$, let ${d}$ be the optimal solution of the Leighton-Rao relaxation of the sparsest cut problem on ${G,H}$, and let ${f:V\rightarrow {\mathbb R}^m}$ be a mapping as in Bourgain's theorem applied to ${d}$. Then

$\displaystyle LR(G,H) = \frac{\sum_{\{ u,v \} \in E_G} d(u,v) }{\sum_{\{ u,v \} \in E_H} d(u,v) }$

$\displaystyle \geq \frac{\sum_{\{ u,v \} \in E_G} ||f(u)-f(v) ||_1 }{c\cdot \log |V| \cdot \sum_{\{ u,v \} \in E_H} ||f(u)-f(v)||_1 }$

$\displaystyle \geq \frac 1 {c\cdot \log |V|} \cdot {\sf sc} (G,H)$

where the last inequality holds because the ratio in the previous line is at least ${{\sf L1sc}(G,H)}$, which equals ${{\sf sc}(G,H)}$ by Theorem 1.
