In which we begin to discuss the Arora-Rao-Vazirani rounding procedure.

Recall that, in a graph {G=(V,E)} with adjacency matrix {A}, then ARV relaxation of the sparsest cut problem is the following semidefinite program.

\displaystyle  \begin{array}{lll} {\rm minimize} & \frac 1 {2|E|} \sum_{u,v} A_{u,v} || {\bf x}_u - {\bf x}_v ||^2\\ {\rm subject\ to}\\ & \sum_{u,v} || {\bf x}_u - {\bf x}_v ||^2 = {|V|^2}\\ & || {\bf x}_u - {\bf x}_v||^2 \leq || {\bf x}_u - {\bf x}_w||^2 + || {\bf x}_w - {\bf x}_v||^2 & \forall u,v,w \in V\\ & \mbox{} {\bf x}_u \in {\mathbb R}^d & \forall u\in V \end{array}

If we denote by {ARV(G)} the optimum of the relaxation, then we claimed that

\displaystyle  ARV(G) \leq \phi(G) \leq O(\sqrt{\log |V|}) \cdot ARV(G)

where the first inequality follows from the fact that {ARV(G)} is a relaxation of {\phi(G)}, and the second inequality is the result whose proof we begin to discuss today.

1. Rounding the Arora-Rao-Vazirani Relaxation

Given the equivalence between the sparsest cut problem and the “{\ell_1} relaxation” of sparsest cut, it will be enough to prove the following result.

Theorem 1 (Rounding of ARV) Let {G} be a graph, {A} its adjacency matrix, and {\{ {\bf x}_v \}_{v\in V}} be a feasible solution to the ARV relaxation.

Then there is a mapping {f:V\rightarrow {\mathbb R}} such that

\displaystyle  \frac {\sum_{u,v} A_{u,v} |f(u) - f(v)|} { \sum_{u,v} |f(u)-f(v)|} \leq O(\sqrt {\log |V|} ) \cdot \frac{\sum_{u,v} A_{u,v} ||{\bf x}_u - {\bf x}_v ||^2}{\sum_{u,v} ||{\bf x}_u - {\bf x}_v||^2}

As in the rounding of the Leighton-Rao relaxation via Bourgain’s theorem, we will identify a set {S\subseteq V}, and define

\displaystyle   f_S(v) := \min_{s\in S} || {\bf x}_s - {\bf x}_v||^2 \ \ \ \ \ (1)

Recall that, as we saw in the proof of Bourgain’s embedding theorem, no matter how we choose the set {S} we have

\displaystyle  | f_S(u) - f_S(v)| \leq ||{\bf x}_u - {\bf x}_v ||^2 \ \ \ \ \ (2)

where we are not using any facts about {|| \cdot - \cdot||^2} other than the fact that, for solutions of the ARV relaxation, it is a distance function that obeys the triangle inequality.

This means that, in order to prove the theorem, we just have to find a set {S\subseteq V} such that

\displaystyle   \sum_{u,v} |f_S(u) - f_S(v)| \geq \frac 1 {O(\sqrt{\log |V|})} \cdot \sum_{u,v} ||{\bf x}_u - {\bf x}_v||^2 \ \ \ \ \ (3)

and this is a considerable simplification because the above expression is completely independent of the graph! The remaining problem is purely one about geometry.

Recall that if we have a set of vectors {\{ {\bf x}_v \}_{v\in V}} such that the distance function {d(u,v):= ||{\bf x}_u - {\bf x}_v||^2} satisfies the triangle inequality, then we say that {d(\cdot,\cdot)} is a (semi-)metric of negative type.

After these preliminaries observations, our goal is to prove the following theorem.

Theorem 2 (Rounding of ARV — Revisited) If {d(\cdot,\cdot)} is a semimetric of negative type over a set {V}, then there is a set {S} such that if we define

\displaystyle  f_S(v) := \min_{s\in S} \{ d(s,v) \}

we have

\displaystyle  \sum_{u,v} |f_S(u) - f_S(v)| \geq \frac 1 {O(\sqrt{\log |V|})} \cdot \sum_{u,v} d(u,v)

Furthermore, the set {S} can be found in randomized polynomial time with high probability given a set of vector {\{ {\bf x}_v \}_{v\in V}} such that {d(u,v) = || {\bf x}_u - {\bf x}_v||^2}.

Since the statement is scale-invariant, we can restrict ourselves, with no loss of generality, to the case {\sum_{u,v} d(u,v) = |V|^2}.

Remark 1 Let us discuss some intuition before continuing with the proof.

As our experience proving Bourgain’s embedding theorem shows us, it is rather difficult to pick sets such that {|f_S(u)-f_S(v)|} is not much smaller than {d(u,v)}. Here we have a somewhat simpler case to solve because we are not trying to preserve all distances, but only the average pairwise distance. A simple observation is that if we find a set {S} which contains {\Omega(|V|)} elements and such that {\Omega(|V|)} elements of {V} are at distance {\Omega(\delta)} from {S}, then we immediately get {\sum_{u,v} |f_S(u) - f_S(v)| \geq \Omega(\delta |V|^2)}, because there will be {\Omega(|V|^2)} pairs {u,v} such that {f_S(u)=0} and {f_S(v) \geq \delta}. In particular, if we could find such a set with {\delta = 1/O(\sqrt{\log |V|})} then we would be done. Unfortunately this is too much to ask for in general, because we always have {|f_S(u) - f_S(v)|\leq d(u,v)}, which means that if we want {\sum_{u,v} |f_S(u) - f_S(v)|} to have {\Omega(V^2)} noticeably large terms we must also have that {d(u,v)} is noticeably large for {\Omega(|V|^2)} pairs of points, which is not always true.

There is, however, the following argument, which goes back to Leighton and Rao: either there are {\Omega(|V|)} points concentrated in a ball whose radius is a quarter (say) of the average pairwise distance, and then we can use that ball to get an {\ell_1} mapping with only constant error; or there are {\Omega(|V|)} points in a ball of radius twice the average pairwise distance, such that the pairwise distances of the points in the ball account for a constant fraction of all pairwise distances. In particular, the sum of pairwise distances includes {\Omega(|V|^2)} terms which are {\Omega(1)}.

After we do this reduction and some scaling, we are left with the task of proving the following theorem: suppose we are given an {n}-point negative type metric in which the points are contained in a ball of radius 1 and are such that the sum of pairwise distances is {\Omega(n^2)}; then there is a subset {S} of size {\Omega(n)} such that there are {\Omega(n)} points whose distance from the set is {1/O(\sqrt{\log n})}. This theorem is the main result of the Arora-Rao-Vazirani paper. (Strictly speaking, this form of the theorem was proved later by Lee — Arora, Rao and Vazirani had a slightly weaker formulation.)

We begin by considering the case in which a constant fraction of the points are concentrated in a small ball.

Definition 3 (Ball) For a point {z\in V} and a radius {r>0}, the ball of radius {r} and center {z} is the set

\displaystyle  B(z,r):= \left\{ v: d(z,v) \leq r \right \}

Lemma 4 For every vertex {z}, if we define {S:= B(z,1/4)}, then

\displaystyle  \sum_{u,v} |f_S(u) - f_S(v)| \geq \frac {|S|}{2|V|} \sum_{u,v} d(u,v)

Proof: Our first calculation is to show that the typical value of {f_S(u)} is rather large. We note that for every two vertices {u} and {v}, if we call {a} a closest vertex in {S} to {u}, and {b} a closest vertex to {v} in {S}, we have

\displaystyle  d(u,v) \leq d(u,a) + d(a,z) + d(z,b) + d(b,v)

\displaystyle  \leq f_S(u) + f_S(v) + \frac 12

and so

\displaystyle  |V|^2 = \sum_{u,v} d(u,v) \leq 2 |V|\cdot \sum_v f_S(v) + \frac {|V|^2}{2}

that is,

\displaystyle  \sum_v f_S(v) \geq \frac{|V|}{2}

Now we can get a lower bound to the sum of {\ell_1} distances given by the embedding {f_S(\cdot)}.

\displaystyle  \sum_{u,v} |f_S(u)-f_S(v)|

\displaystyle \geq \sum_{u\in S,v\in V} |f_S(v)|

\displaystyle  = |S| \sum_v f_S(v)

\displaystyle  \geq \frac 12 |S| \cdot |V|

\Box

This means that if there is a vertex {z} such that {|B(z,1/4)| = \Omega(|V|)}, or even {|B(z,1/4)| = \Omega(|V|/\sqrt{\log |V|})}, then we are done.

Otherwise, we will find a set of {\Omega(|V|)} vertices such that their average pairwise distances are within a constant factor of their maximum pairwise distances, and then we will work on finding an embedding for such a set of points. (The condition that the average distance is a constant fraction of the maximal distance will be very helpful in subsequent calculations.)

Lemma 5 Suppose that for every vertex {z} we have {|B(z,1/4)| \leq |V|/4}. Then there is a vertex {w} such that, if we set {S=B(w,2)}, we have

  • {|S|\geq \frac 12 \cdot |V|}
  • {\sum_{u,v\in S} d(u,v) \geq \frac 1{8} |S|^2 }

Proof: Let {w} be a vertex that maximizes {|B(w,2)|}; then {|B(w,2)| \geq |V|/2}, because if we had {|B(u,2)| < |V|/2} for every vertex {u}, then we would have

\displaystyle  \sum_{u,v} d(u,v) > \sum_u 2 \cdot (|V- B(u,2)|) > |V|^2

Regarding the sum of pairwise distances of elements of {S}, we have

\displaystyle  \sum_{u,v \in S} d(u,v) > \sum_{u\in S} \frac 14 (|S - B(u,1/4)|) \geq |S| \cdot \frac 14 \cdot \frac {|S|}{2}

\Box

The proof of the main theorem now reduces to proving the following geometric fact.

Theorem 6 Let {d} be a negative-type metric over a set {V} such that the points are contained in a unit ball and have constant average distance, that is,

  • there is a vertex {z} such that {d(v,z)\leq 1} for every {v\in V}
  • {\sum_{u,v\in V} d(u,v) \geq c\cdot |V|^2}

Then there are sets {S,T \subseteq V} such that

  • {|S|, |T| \geq \Omega(|V|)};
  • for every {u\in S} and every {v\in S}, {d(u,v) \geq 1/{O(\sqrt {\log |V|})}}

where the multiplicative factors hidden in the {O(\cdot)} and {\Omega(\cdot)} notations depend only on {c}.

About these ads