# Bourgain’s Embedding of Any Metric into L1

In which we spend more than sixteen hundred words to explain a three-line proof.

In the last post, we left off with the following problem. We have a set $V$ of $n$ “vertices,” a semi-metric $d: V\times V \rightarrow {\mathbb R}$, and we want to find a distribution over sets $A\subseteq V$ such that for every two vertices $i,j$

$(1) \ \ \ \displaystyle {\mathbb E} | d(A,i) - d(A,j) | \geq \frac 1 {O(\log n)} \cdot d(i,j)$

where

$\displaystyle d(A,i) := \min_{a\in A} d(a,i)$

This will give us a way to round a solution of the Leighton-Rao relaxation to an actual cut with only an $O(\log n)$ loss in the approximation.

Before getting to the distribution which will do the trick, it is helpful to consider a few examples.

• Example 1: all points are at distance 1 from each other.

Then $|d(A,i)-d(A,j)|$ is equal to either 0 or 1, and it is 1 if and only if $A$ contains exactly one of $i$ or $j$. If $A$ is a uniformly chosen random set, then the above condition is satisfied with probability $1/2$, so we have the stronger bound

$\displaystyle {\mathbb E} | d(A,i) - d(A,j) | \geq \frac 12 \cdot d(i,j)$

[Indeed, even better, we have ${\mathbb E} | d(A,i) - d(A,j) | = \frac 12 \cdot d(i,j)$, which is an isometric embedding.]

• Example 2: all points are at distance either 1 or 2 from each other.

If $A$ contains exactly one of the vertices $i,j$, then $|d(A,i)-d(A,j)| \geq 1$, and so, if we choose $A$ uniformly at random we have

$\displaystyle {\mathbb E} | d(A,i) - d(A,j) | \geq \frac 14 \cdot d(i,j)$

These examples may trick us into thinking that a uniformly chosen random set always work, but this unfortunately is not the case.

• Example 3: Within a set $S$ of size $n/2$, all distances are 1, and the same is true within $V-S$; the distance between elements of $S$ and elements of $V-S$ is $n$.

If we consider $i \in S$ and $j \in V-S$, then we are in trouble whenever $A$ contains elements from both sets, because then $|d(A,i)-d(A,j)| \leq 1$, while $d(i,j)=n$. If we pick $A$ uniformly at random, then $A$ will essentially always, except with exponentially small probability, contain elements from both $S$ and $V-S$. If, however, we pick $A$ to be a random set of size 1, then we are going to get $|d(A,i)-d(A,j)| \geq n-2$ with probability at last $1/2$, which is great.

Choosing a set of size 1, however, is a disaster inside $S$ and inside $V-S$, where almost all distances $|d(A,i)-d(A,j)|$ collapse to zero. For those pairs, however, we know that choosing $A$ uniformly at random works well.

The solution is thus: with probability 1/2, pick $A$ uniformly at random; with probability 1/2, pick $A$ of size 1.

So far we are actually getting away with ${\mathbb E} |X_i - X_j|$ being a constant fraction of $d(i,j)$. Here is a slightly trickier case.

• Example 4: The shortest path metric in a $\sqrt n \times \sqrt n$ grid.

Take two vertices $i,j$ at distance $d$. We can get, say, $|d(A,i) - d(A,j)| \geq d/3$, provided that $A$ avoids all vertices at distance $\leq 2d/3$ from $i$, and it includes some vertex at distance $\leq d/3$ from $j$. In a grid, the number of vertices at distance $\leq d$ from a given vertex is $\Theta(d^2)$, so our goal is to pick $A$ so that it avoids a certain set of size $\Theta((2d/3)^2) = \Theta(d^2)$ and it hits another set of size $\Theta( (d/3)^2) = \Theta(d^2)$. If we pick $A$ to be a random set of size about $n/d^2$, both events hold with constant probability.

Now, what works for a certain distance won’t work for a different distance, so it seems we have to do something like picking $t$ from $1$ to $n$, and then pick a random set $A$ of size $n/t$. This is however too bad, because our chance of getting a set of the right size would only be $1/n$, while we can only lose a factor $1/\log n$. The solution is to pick $t$ at random from $\{ 1,2,4,\cdots,n\}$, and then pick $A$ of size $A/t$. With probability $1/\log n$ we get the right size of $A$, up to a factor of two.

It turns out that the last example gives a distribution that works in all cases:

• Pick $t$ at random in $1 \cdots \log n$
• Pick a random set $A$ so that each $i\in V$ is selected to be in $A$ independently and with probability $1/2^t$

Now, it would be nice to show that (as in the examples we have seen so far) for every semi-metric $d$ and two vertices $i,j$, there is a size parameter $t$ such that when $A$ is chosen to be a random set of size $n/2^t$ we have ${\mathbb E} |d(A,i)-d(A,j)| \geq \Omega(d(i,j))$.

This would mean that, after we lose a factor of $\log n$ to “guess” the right density, we have the desired bound (3). Unfortunately this is too much to ask for; we shall instead work out an argument that uses contributions from all densities.

It is good to see one more example.

• Example 5: A 3-regular graph of logarithmic girth.

Let $i,j$ be two vertices whose distance $d$ is less than the girth. In the example of the grid, we considered all vertices at distance $\leq 2d/3$ from $i$ and all vertices at distance $\leq d/3$ from $j$; in this case, there are $\approx 2^{2d/3}$ of the former, and $\approx 2^{d/3}$ of the latter, and it is hopeless to expect that $A$, no matter its density, can avoid all of the former, but hit some of the latter.

If, however, I consider the points at distance $\leq \ell+1$ from $i$ and the points at distance $\leq \ell$ from $j$, they are off only by a constant factor, and there is a constant probability of avoiding the former and hitting the latter when $t = \ell$. So conditioned on each $t$ between $1$ and $d/2$, the expectation of $|d(A,i) - d(A,j)|$ is at least $\Omega(1)$, and, overall, the expectation is at least $\Omega(d/\log n)$.

We are now more or less ready to tackle the general case.

We look at two vertices $i,j$ at distance $d(i,j)$ and we want to estimate ${\mathbb E} |d(A,i)-d(A,j)|$.

Let us estimate the contribution to the expectation coming from the case in which we choose a particular value of $t$. In such a case, there is a constant probability that

1. $A$ contains none of the $2^{t+1}-1$ vertices closest to $i$, but at least one of the of $2^t$ vertices closest to $j$. [Assuming the two sets are disjoint]
2. $A$ contains none of the $2^{t+1}-1$ vertices closest to $j$, but at least one of the of $2^t$ vertices closest to $i$. [Assuming the two sets are disjoint]

Notice that events (1) and (2) are disjoint, so we are allowed to sum their contributions to the expectation, without doing any double-counting.

Call $ri_k$ the distance of the the $k$-th closest vertex from $i$, and similarly $rj_k$. Then, if event (1) happens, $d(A,i) \geq ri_{2^{t+1}}$ and $d(A,j) \leq rj_{2^t}$, in which case

$|d(A,i) - d(A,j)| \geq d(A,i) - d(A,j) \geq ri_{2^{t+1}} - rj_{2^t}$

we can similarly argue that if (2) happens, then

$|d(A,i) - d(A,j)| \geq d(A,j) - d(A,i) \geq rj_{2^{t+1}} - ri_{2^t}$

Call $r'i_{k} := \min \{ d(i,j)/3, ri_k\}$ and $r'j_{k} := \min \{ d(i,j)/3, rj_k\}$.

Let $t^*$ be the smallest $t$ such that either $ri_{2^{t+1}}\geq d(i,j)/3$
or $rj_{2^{t+1}}\geq d(i,j)/3$. Then for $t\leq t^*-1$ the events described in (1) and (2) are well-defined, and the contributions to the expectation is at least

$\frac 1 {\log n} \cdot ( ri_{2^{t+1}} - rj_{2^t} + rj_{2^{t+1}} - ri_{2^t} ) =$
$= \frac 1 {\log n} \cdot ( r'i_{2^{t+1}} - r'j_{2^t} + r'j_{2^{t+1}} - r'i_{2^t} )$

When $t=t^*$, we can verify that the contribution to the expectation is at least

$= \frac 1 {\log n} \cdot ( r'i_{2^{t^*+1}} - r'j_{2^{t^*}} + r'j_{2^{t^*+1}} - r'i_{2^{t^*}} )$
$\geq \frac 1 {\log n} \cdot ( d(i,j)/3 - r'j_{2^{t^*}} - r'i_{2^{t^*}}$

And if we sum the contributions for $t=0,\cdots t^*$, the sum telescopes and we are left with

$\displaystyle {\mathbb E} |d(A,i) - d(A,j) | \geq \Omega\left( \frac 1 {\log n} \right) \cdot \left( \frac{d(i,j)}{3} - r'i_{1} - r'j_1 \right)$

where $r'i_1 = r'j_1 = 0$.

At long last, we have completed the proof.

Notice that the $O(\log n)$ factor we have lost is best possible in light of the expander example we saw in the previous post. In many examples, however, we lost only a constant factor. It is a great open question whether it is possible to lose only a constant factor whenever the metric is a shortest-path metric on a planar graph.