*In which we begin the analysis of the ARV rounding algorithm*

We want to prove

**Lemma 1 (ARV Main Lemma)** Let $d$ be a negative-type metric over a set $V$ of size $n$ such that the points are contained in a unit ball and have constant average distance, that is,

- there is a vertex $z$ such that $d(i,z) \leq 1$ for every $i \in V$;
- $\frac 1{n^2} \sum_{i,j \in V} d(i,j) \geq c$.

Then there are sets $S,T \subseteq V$ such that

- $|S|, |T| \geq \Omega(n)$;
- for every $i \in S$ and every $j \in T$, $d(i,j) \geq \Omega\left( \frac 1{\sqrt{\log n}} \right)$,

where the multiplicative factors hidden in the $O(\cdot)$ and $\Omega(\cdot)$ notations depend only on $c$.

In this lecture, we will show how to reduce the ARV Main Lemma to a statement of the following form: if $\{ x_i \}_{i \in V}$ is a set of vectors such that the metric $d$ in the ARV Main Lemma can be written as $d(i,j) = || x_i - x_j ||^2$, and $g$ is a random Gaussian vector, and if $\ell$ is such that, with $\Omega(1)$ probability, there are $\Omega(n)$ disjoint pairs $(i,j)$ such that $d(i,j) \leq \frac 1\ell$ and $|\langle g, x_i - x_j \rangle| \geq \Omega(1)$, then $\ell \leq O(\sqrt{\log n})$. We will then prove such a statement in the next lecture.

**1. Bottlenecks **

Before beginning with the proof, it will be useful to see that certain variations of the ARV Main Lemma are false, and that we must use the assumptions of the lemma in a certain way in order to be able to prove it.

For example, consider the variation of the lemma in which $d(\cdot,\cdot)$ is an arbitrary semi-metric, rather than being of negative type. We have the following counterexample.

**Fact 2** For every $n$, there is a metric $d(\cdot,\cdot)$ over a set $V$ of size $n$ such that

- $d(i,j) \leq 1$ for all $i,j$, and $\frac 1{n^2} \sum_{i,j} d(i,j) \geq \Omega(1)$;
- For every subsets $S,T \subseteq V$ of size $\Omega(n)$ we have $\min_{i \in S, j \in T} d(i,j) \leq O\left( \frac 1{\log n} \right)$.

We will not provide a full proof, but here is a sketch: consider a family of constant-degree graphs $G_n = (V_n, E_n)$ of constant edge expansion. (We will see later in the course that such a family exists.) Consider the shortest-path distance $d(\cdot,\cdot)$ on $G_n$. We have:

- For every pair $i,j$, $d(i,j) \leq O(\log n)$, because graphs of constant expansion have logarithmic diameter (another fact that we will prove later in the course).
- $\frac 1{n^2} \sum_{i,j} d(i,j) \geq \Omega(\log n)$, because, if $k$ is the degree of the graph, then every vertex has at most $k^t$ other vertices at distance at most $t$ from it, and so every vertex has at least $n/2$ other vertices at distance $\geq \log_k \frac n2$ from itself.
- For every subsets $S,T$ of size $\geq \delta n$ we have $\min_{i \in S, j \in T} d(i,j) \leq O_\delta(1)$. Because, if the edge expansion is $h$ and the degree is $k$, then for every set $S$ there are $\geq \frac hk \cdot |S|$ vertices outside $S$ with neighbors in $S$, and so the number of vertices at distance at most $t$ from $S$ is at least $\min \left\{ \frac n2, \left(1 + \frac hk\right)^t \cdot |S| \right\}$. If $|S| \geq \delta n$, then there is a $t = O_\delta(1)$ such that more than $n/2$ vertices are at distance at most $t$ from $S$, and the same is true for $T$, meaning that $S$ and $T$ are at distance at most $2t$ from each other.

If we divide $d(\cdot,\cdot)$ by the diameter of $G_n$, which is $O(\log n)$, we obtain a metric that satisfies the conditions of the Fact above.

This means that we cannot use only the property of $d(\cdot,\cdot)$ being a semi-metric: we have to use the fact that it is of negative type, and we need to use in the proof the vectors $\{ x_i \}_{i \in V}$ such that $d(i,j) = || x_i - x_j ||^2$.

Fact 2 is tight: using Bourgain's theorem, or an earlier technique of Leighton and Rao, if $d(\cdot,\cdot)$ is a semi-metric over a set $V$ of size $n$ such that $d(i,j) \leq 1$ for all $i,j$ and $\frac 1{n^2} \sum_{i,j} d(i,j) \geq \Omega(1)$, then we can find sets $S,T$ of size $\Omega(n)$ such that $d(i,j) \geq \Omega\left( \frac 1{\log n} \right)$ for every $i \in S$ and $j \in T$.

**Fact 3** For every $n$, there are vectors $x_1,\ldots,x_n$ such that

- $|| x_i - x_j ||^2 \leq 1$ for all $i,j$, and $\frac 1{n^2} \sum_{i,j} || x_i - x_j ||^2 \geq \Omega(1)$;
- For every subsets $S,T$ of size $\Omega(n)$ we have $\min_{i \in S, j \in T} || x_i - x_j ||^2 \leq O\left( \frac 1{\log n} \right)$.

Here we will not even provide a sketch, but the idea is to use an $\epsilon$-net of the sphere of radius $\frac 12$ in dimension $m$, with $m = O(\log n)$ and $\epsilon$ a small constant, and the isoperimetric inequality for the sphere.

This means that we need to use the fact that our vectors satisfy the triangle inequalities

$\forall i,j,k: \ \ || x_i - x_j ||^2 \leq || x_i - x_k ||^2 + || x_k - x_j ||^2$

It is also worth noting that for all vectors, including those of Fact 3, we have

$\forall i,j,k: \ \ || x_i - x_j ||^2 \leq 2 || x_i - x_k ||^2 + 2 || x_k - x_j ||^2$

so any argument that proves the ARV Main Lemma will need to use the triangle inequalities in a way that breaks down if we substitute them with the above “factor-of-2-triangle-inequalities”.
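The contrast between the two inequalities is easy to check numerically. Below is a minimal Python sanity check (the helper `sqdist` is ours, not part of the notes): squared Euclidean distances can violate the plain triangle inequality, while the factor-of-2 version always holds.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

# The plain triangle inequality can fail for squared distances:
# three collinear points 0, 1, 2 on the real line give 4 > 1 + 1.
x, y, z = [0.0], [1.0], [2.0]
assert sqdist(x, z) > sqdist(x, y) + sqdist(y, z)

# The factor-of-2 version always holds, since by Cauchy-Schwarz
# ||a + b||^2 <= 2 ||a||^2 + 2 ||b||^2.
random.seed(0)
for _ in range(10000):
    x, y, z = ([random.gauss(0, 1) for _ in range(5)] for _ in range(3))
    assert sqdist(x, z) <= 2 * sqdist(x, y) + 2 * sqdist(y, z) + 1e-9
print("factor-of-2 triangle inequality holds on all samples")
```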

Fact 3 is also tight, up to lower-order factors, as we will see later in this lecture.

Finally, we note that the ARV Main Lemma is tight, which means that every step of its proof will have to involve statements that are tight up to constant factors.

**Fact 4** For every $n$ that is a power of two, there is a negative-type metric $d(\cdot,\cdot)$ over a set $V$ of size $n$ such that

- $d(i,j) \leq 1$ for all $i,j$, and $\frac 1{n^2} \sum_{i,j} d(i,j) \geq \Omega(1)$;
- For every subsets $S,T$ of size $\Omega(n)$ we have $\min_{i \in S, j \in T} d(i,j) \leq O\left( \frac 1{\sqrt{\log n}} \right)$.

Let $m := \log_2 n$ and $V := \{0,1\}^m$. The Hamming distance $d_H(\cdot,\cdot)$ is a negative-type metric over $V$ (let the vector $x_v$ associated to $v$ be $v$ itself, and notice that $d_H(u,v) = || x_u - x_v ||^2$), and it satisfies

- $d_H(u,v) \leq m$ for all $u,v$, and $\frac 1{n^2} \sum_{u,v} d_H(u,v) = \frac m2$;
- For every subsets $S,T$ of size $\Omega(n)$ we have $\min_{u \in S, v \in T} d_H(u,v) \leq O(\sqrt m)$,

which follows from isoperimetric results on the hypercube that we will not prove.

Fact 4 follows by dividing the above metric by $m$.
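The identification of the Hamming distance with a squared Euclidean distance, and the exact value of the average distance, can be checked by brute force on a small cube; the following Python sketch uses the illustrative choice $m = 4$.

```python
import itertools

m = 4
V = list(itertools.product([0, 1], repeat=m))   # the hypercube {0,1}^m
n = len(V)

def hamming(u, v):
    return sum(ui != vi for ui, vi in zip(u, v))

def sqdist(u, v):
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

# Hamming distance coincides with squared Euclidean distance on {0,1}^m,
# which is what makes it a negative-type metric (take x_v = v itself).
assert all(hamming(u, v) == sqdist(u, v) for u in V for v in V)

# The average over all ordered pairs is exactly m/2.
avg = sum(hamming(u, v) for u in V for v in V) / n ** 2
print("average Hamming distance:", avg)
```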

**2. Gaussian Projections **

The tool of *Gaussian projections* is widely used to analyze semidefinite programs. Given vectors $x_1,\ldots,x_n \in {\mathbb R}^m$ which are solutions to a semidefinite program of interest, we pick a random Gaussian vector $g \in {\mathbb R}^m$, and we consider the projections $Y_i := \langle g, x_i \rangle$, where $\langle a,b \rangle = \sum_k a(k) \cdot b(k)$ is the standard inner product. The vector $g$ is sampled so that the coordinates $g(1),\ldots,g(m)$ are independent standard normal distributions.

We see that each $Y_i$ has a Gaussian distribution with expectation $0$ and variance $||x_i||^2$, and each difference $Y_i - Y_j$ has a Gaussian distribution with expectation $0$ and variance $||x_i - x_j||^2$.
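These distributional facts are easy to confirm by simulation; in the following sketch the two vectors are arbitrary illustrative choices.

```python
import random, statistics

random.seed(1)
x_i = [0.5, -0.2, 0.1]
x_j = [-0.3, 0.4, 0.0]
true_var = sum((a - b) ** 2 for a, b in zip(x_i, x_j))  # ||x_i - x_j||^2

# Sample many Gaussian vectors g and record Y_i - Y_j = <g, x_i - x_j>.
diffs = []
for _ in range(50000):
    g = [random.gauss(0, 1) for _ in range(3)]
    Yi = sum(gk * xk for gk, xk in zip(g, x_i))
    Yj = sum(gk * xk for gk, xk in zip(g, x_j))
    diffs.append(Yi - Yj)

# The empirical mean should be near 0 and the empirical variance
# near ||x_i - x_j||^2 = 1.01.
print(round(statistics.mean(diffs), 2), round(statistics.pvariance(diffs), 2))
```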

From standard bounds on Gaussian random variables, if $Z$ is a Gaussian random variable with expectation $0$ and variance $\sigma^2$, then

$\mathbb{P} [ \ |Z| \leq t \cdot \sigma \ ] \leq t \ \ \ \ (1)$

$\mathbb{P} [ \ |Z| \geq t \cdot \sigma \ ] \leq 2 e^{-t^2/2} \ \ \ \ (2)$

And, setting $t := 3\sqrt{\ln n}$ in (2), we get

$\mathbb{P} [ \ |Z| \geq 3 \sigma \sqrt{\ln n} \ ] \leq \frac 2{n^4} \ \ \ \ (3)$

Our first result is that, with $\Omega(1)$ probability, there are $\Omega(n^2)$ pairs $i,j$ such that $|Y_i - Y_j| \geq \Omega(1)$.

**Lemma 5** There are constants $c_1,c_2,c_3$ that depend only on $c$ such that with probability at least $c_1$, if we let $S$ be the $c_2 \cdot n$ indices $i$ with smallest $Y_i$, and $T$ be the $c_2 \cdot n$ indices $i$ with largest $Y_i$, we have

$\forall i \in S. \ \forall j \in T. \ \ Y_j - Y_i \geq c_3$

*Proof:* A standard Markov argument shows that if $d(i,j) \leq 2$ for all pairs $i,j$, and $\frac 1{n^2} \sum_{i,j} d(i,j) \geq c$, then there are at least $\frac c4 \cdot n^2$ pairs at distance at least $\frac c2$. We argue that, with probability at least $\frac 12$, half of those pairs $(i,j)$ are such that $|Y_i - Y_j| \geq \frac 14 \sqrt{c/2}$, which implies the conclusion.

Let $F$ be the set of “far” pairs $(i,j)$ such that $d(i,j) \geq \frac c2$, so that, by the above argument, $|F| \geq \frac c4 \cdot n^2$.

By setting $t := \frac 14$ in (1), we have for each pair $(i,j) \in F$

$\mathbb{P} \left[ \ |Y_i - Y_j| \leq \frac 14 \sqrt{c/2} \ \right] \leq \frac 14$

so, by linearity of expectation,

$\mathbb{E} \left[ \ \# \left\{ (i,j) \in F : |Y_i - Y_j| \leq \frac 14 \sqrt{c/2} \right\} \ \right] \leq \frac {|F|}4$

and by Markov inequality

$\mathbb{P} \left[ \ \# \left\{ (i,j) \in F : |Y_i - Y_j| \leq \frac 14 \sqrt{c/2} \right\} \geq \frac {|F|}2 \ \right] \leq \frac 12$

so, with probability $\geq \frac 12$, there are at least $\frac{|F|}2 \geq \frac c8 \cdot n^2$ pairs $(i,j) \in F$ such that $|Y_i - Y_j| \geq \frac 14 \sqrt{c/2}$.

If $S$ and $T$ are defined as above, and $\min_{j \in T} Y_j - \max_{i \in S} Y_i < \frac 14 \sqrt{c/2}$, then every pair $(i,j)$ with $|Y_i - Y_j| \geq \frac 14 \sqrt{c/2}$ must have an endpoint among the $c_2 \cdot n$ indices with smallest $Y_i$ or among the $c_2 \cdot n$ indices with largest $Y_i$, and so the number of pairs at projected distance $\geq \frac 14 \sqrt{c/2}$ is at most

$2 \cdot c_2 \cdot n^2$

and the lemma follows if we set $c_1 := \frac 12$, $c_2 := \frac c{32}$ and $c_3 := \frac 14 \sqrt{c/2}$.

Note that, with $\Omega(1)$ probability, we have sets $S$, $T$, both of size $\Omega(n)$, such that

$\forall i \in S. \ \forall j \in T. \ \ Y_j - Y_i \geq \Omega(1)$

and, by (3) and a union bound over all pairs, with probability at least $1 - \frac 2{n^2}$ we also have $|Y_i - Y_j| \leq 3 || x_i - x_j || \cdot \sqrt{\ln n}$ for every $i,j$, so that

$\forall i \in S. \ \forall j \in T. \ \ || x_i - x_j ||^2 \geq \Omega \left( \frac 1{\log n} \right)$

Since we have not used the triangle inequality, the above bound is almost best possible, given Fact 3.
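A small simulation illustrates this behavior. The instance below (random points on a sphere of radius $1/2$, which lie in a unit ball and have constant average distance) and the choice of a one-quarter fraction for $S$ and $T$ are illustrative assumptions, not the constants of Lemma 5.

```python
import random

random.seed(3)
n, m = 400, 50

def rand_point():
    """A random point on the sphere of radius 1/2 in R^m."""
    v = [random.gauss(0, 1) for _ in range(m)]
    norm = sum(vi ** 2 for vi in v) ** 0.5
    return [0.5 * vi / norm for vi in v]

xs = [rand_point() for _ in range(n)]

# Project on a random Gaussian direction: Y_i = <g, x_i>.
g = [random.gauss(0, 1) for _ in range(m)]
Y = sorted(sum(gk * xk for gk, xk in zip(g, x)) for x in xs)

# S = quarter of indices with smallest Y_i, T = quarter with largest.
# The gap between min over T and max over S is typically a constant.
gap = Y[-n // 4] - Y[n // 4 - 1]
print("projection gap between S and T:", round(gap, 3))
assert gap > 0
```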

**3. The Algorithm to Refine $S$ and $T$**

Consider the following algorithm, given a set of vectors $\{ x_i \}_{i \in V}$ such that the metric $d(i,j) = || x_i - x_j ||^2$ satisfies the assumptions of the Main Lemma, and a parameter $\delta$,

- Pick a random Gaussian vector $g$
- Define $Y_i := \langle g, x_i \rangle$ for every $i \in V$
- Let $S$ be the $c_2 \cdot n$ indices for which $Y_i$ is smallest
- Let $T$ be the $c_2 \cdot n$ indices for which $Y_i$ is largest
- while there is an $i \in S$ and $j \in T$ such that $d(i,j) \leq \delta$
  - remove $i$ from $S$ and $j$ from $T$

- return $(S,T)$

where $c_2$ is the constant (that depends only on $c$) of Lemma 5. We will prove

**Lemma 6** There is a constant $c_4 > 0$ (dependent only on $c$) such that, if we set $\delta := \frac{c_4}{\sqrt{\log n}}$, there is at least a probability $\frac{c_1}2$ that the algorithm removes at most $\frac{c_2}2 \cdot n$ pairs in the `while' loop.

Once we establish the above lemma, we have completed our proof of the ARV Main Lemma, because, with probability $\geq \frac{c_1}2$, the output of the algorithm is a pair of sets $S,T$ of size $\geq \frac{c_2}2 \cdot n$ such that for each $i \in S$ and $j \in T$ we have $d(i,j) > \delta = \Omega\left( \frac 1{\sqrt{\log n}} \right)$.
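For concreteness, here is a Python sketch of one run of the algorithm on a synthetic instance; the point set, the value of $\delta$, and the one-quarter fraction standing in for $c_2$ are all illustrative assumptions.

```python
import random

random.seed(4)

def refine(xs, d, delta, frac=0.25):
    """One run of the refinement algorithm: project on a random Gaussian
    direction, take the extreme slabs S and T, then repeatedly remove
    cross pairs that are closer than delta. (frac stands in for the
    constant c2 of Lemma 5; all concrete values here are illustrative.)"""
    n, m = len(xs), len(xs[0])
    g = [random.gauss(0, 1) for _ in range(m)]
    Y = [sum(gk * xk for gk, xk in zip(g, x)) for x in xs]
    order = sorted(range(n), key=lambda i: Y[i])
    k = int(frac * n)
    S, T = set(order[:k]), set(order[-k:])
    removed = 0
    # while there are i in S, j in T with d(i,j) <= delta, remove both
    while True:
        pair = next(((i, j) for i in S for j in T if d(i, j) <= delta), None)
        if pair is None:
            break
        S.discard(pair[0]); T.discard(pair[1]); removed += 1
    return S, T, removed

# Toy input: random points on a sphere of radius 1/2 in dimension 20.
def rand_point(m=20):
    v = [random.gauss(0, 1) for _ in range(m)]
    norm = sum(vi ** 2 for vi in v) ** 0.5
    return [0.5 * vi / norm for vi in v]

xs = [rand_point() for _ in range(200)]
d = lambda i, j: sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
S, T, removed = refine(xs, d, delta=0.05)
# After the loop, every surviving cross pair is delta-separated.
assert all(d(i, j) > 0.05 for i in S for j in T)
print(len(S), len(T), removed)
```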

We will prove the contrapositive, that is, we will show that if the algorithm has probability at least $1 - \frac{c_1}2$ of removing at least $\frac{c_2}2 \cdot n$ pairs in the `while' loop, then $\delta \geq \Omega\left( \frac 1{\sqrt{\log n}} \right)$.

Call $R$ the set of pairs removed by the algorithm (like $S$, $T$ and the $Y_i$, $R$ is a random variable determined by the choice of $g$). If the algorithm has probability at least $1 - \frac{c_1}2$ of removing at least $\frac{c_2}2 \cdot n$ pairs in the `while' loop, then there is a probability at least $\frac{c_1}2$ that the above happens, and that the conclusion of Lemma 5 also holds. This means that with probability at least $\frac{c_1}2$ there are $\frac{c_2}2 \cdot n$ disjoint pairs $(i,j)$ such that $d(i,j) \leq \delta$ and $|Y_i - Y_j| \geq c_3$.

By the above observation, the following lemma implies Lemma 6 and hence the ARV Main Lemma.

**Lemma 7** Let $d$ be a negative-type metric over a set $V$ of size $n$, let $\{ x_i \}_{i \in V}$ be vectors such that $d(i,j) = || x_i - x_j ||^2$, let $g$ be a random vector with a Gaussian distribution, and let $Y_i := \langle g, x_i \rangle$. Suppose that, for constants $p, \epsilon, \eta > 0$ and a parameter $\ell$, we have that there is a probability $\geq p$ that there are at least $\epsilon \cdot n$ disjoint pairs $(i,j)$ such that $d(i,j) \leq \frac 1\ell$ and $|Y_i - Y_j| \geq \eta$.

Then there is a constant $c_5$, that depends only on $p$, $\epsilon$ and $\eta$, such that

$\ell \leq c_5 \cdot \sqrt{\log n}$

We will prove Lemma 7 in the next lecture.