CS359G Lecture 9: Bourgain’s Embedding

In which we prove that every metric can be embedded into L1 with logarithmic distortion.

Today we prove the following theorem.

Theorem 1 (Bourgain) Let {d: V\times V \rightarrow {\mathbb R}} be a semimetric defined over a finite set {V}. Then there exists a mapping {F: V \rightarrow {\mathbb R}^m} such that, for every two elements {u,v \in R},

\displaystyle  || F(u) - F(v)||_1 \leq d(u,v) \leq ||F(u)-F(v)||_1 \cdot c\cdot \log |V|

where {c} is an absolute constant. Given {d}, the mapping {F} can be found with high probability in randomized polynomial time in {|V|}.

Together with the results that we proved in the last lecture, this implies that an optimal solution to the Leighton-Rao relaxation can be rounded to an {O(\log n)}-approximate solution to the sparsest cut problem. This was the best known approximation algorithm for sparsest cut for 15 years, until the Arora-Rao-Vazirani algorithm, which will be our next topic.

Continue reading

CS359G Lecture 8: the Leighton-Rao Relaxation

In which we introduce the Leighton-Rao relaxation of sparsest cut.

Let {G=(V,E)} be an undirected graph. Unlike past lectures, we will not need to assume that {G} is regular. We are interested in finding a sparsest cut in {G}, where the sparsity of a non-trivial bipartition {(S,V-S)} of the vertices is

\displaystyle  \phi_G(S) := \frac{\frac{1}{|E|} \cdot Edges(S,V-S) } {\frac {2}{V^2} \cdot |S| \cdot |V-S| }

which is the ratio between the fraction of edges that are cut by {(S,V-S)} and the fraction of pairs of vertices that are disconnected by the removal of those edges.

Another way to write the sparsity of a cut is as

\displaystyle  \phi_G(S) := \frac {|V|^2}{2|E|} \cdot \frac{\sum_{i,j} A_{i,j} |1_S(i) - 1_S(j)| } {\sum_{i,j} |1_S(i) - 1_S(j) | }

where {A} is the adjacency matrix of {G} and {1_S(\cdot)} is the indicator function of the set {S}.

The observation that led us to see {1-\lambda_2} as the optimum of a continuous relaxation of {\phi} was to observe that {|1_S(i)-1_S(j)| = |1_S(i) - 1_S(j)|^2}, and then relax the problem by allowing arbitrary functions {x: V \rightarrow {\mathbb R}} instead of indicator functions {1_S : V \rightarrow \{ 0,1 \}}.

The Leighton-Rao relaxation of sparsest cut is obtained using, instead, the following observation: if, for a set {S}, we define {d_S(i,j) := |1_S(i) - 1_S(j)|}, then {d_S(\cdot,\cdot)} defines a semi-metric over the set {V}, because {d_S} is symmetric, {d_S(i,i)=0}, and the triangle inequality holds. So we could think about allowing arbitrary semi-metrics in the expression for {\phi}, and define

\displaystyle  LR(G):= \min_{\begin{array}{l} d:V\times V \rightarrow {\mathbb R}\\ d \ \mbox{semi-metric} \end{array}} \frac {|V|^2}{2|E|} \cdot \frac{\sum_{i,j} A_{i,j} d(i,j) }{\sum_{i,j} d(i,j)} \ \ \ \ \ (1)

This might seem like such a broad relaxation that there could be graphs on which {LR(G)} bears no connection to {\phi(G)}. Instead, we will prove the fairly good estimate

\displaystyle   LR(G) \leq \phi(G) \leq O(\log |V|) \cdot LR(G) \ \ \ \ \ (2)

Furthermore, we will show that {LR(G)}, and an optimal solution {d(\cdot,\cdot)} can be computed in polynomial time, and the second inequality above has a constructive proof, from which we derive a polynomial time {O(\log |V|)}-approximate algorithm for sparsest cut.

1. Formulating the Leighton-Rao Relaxation as a Linear Program

The value {LR(G)} and an optimal {d(\cdot,\cdot)} can be computed in polynomial time by solving the following linear program

\begin{array}{lll}{\rm minimize} &  \sum_{i,j} A_{i,j} d_{i,j}  \\ {\rm subject\ to} \\ & \sum_{i,j} d_{i,j} = \frac {|V|^2}{2|E|}  \\ & d_{i,k} \leq d_{i,j} + d_{j,k} & \forall i,j,k \in V\\ & d_{i,j} \geq 0 & \forall i \in V\end{array} \ \ \ \ \ (3)

that has a variable {d_{i,j}} for every unordered pair of distinct vertices {i,j}. Clearly, every solution to the linear program (3) is also a solution to the right-hand side of the definition (1) of the Leighton-Rao parameter, with the same cost. Also every semi-metric can be normalized so that {\sum_{i,j} d(i,j) = |V|^2/2|E|} by multiplying every distance by a fixed constant, and the normalization does not change the value of the right-hand side of (1); after the normalization, the semimetric is a feasible solution to the linear program (3), with the same cost.

In the rest of this lecture and the next, we will show how to round a solution to (3) into a cut, achieving the logarithmic approximation promised in (2).

2. An L1 Relaxation of Sparsest Cut

In the Leighton-Rao relaxation, we relax distance functions of the form {d(i,j) = |1_S(i) - 1_S(j)|} to completely arbitrary distance functions. Let us consider an intermediate relaxation, in which we allow distance functions that can be realized by an embedding of the vertices in an {\ell_1} space.

Recall that, for a vector {{\bf x} \in {\mathbb R}^n}, its {\ell_1} norm is defined as {|| {\bf x}||_1 := \sum_i |x_i|}, and that this norm makes {{\mathbb R}^n} into a metric space with the {\ell_1} distance function

\displaystyle  || {\bf x} - {\bf y}||_1 = \sum_i |x_i - y_i |

The distance function {d(i,j) = |1_S(i) - 1_S(j)|} is an example of a distance function that can be realized by mapping each vertex to a real vector, and then defining the distance between two vertices as the {\ell_1} norm of the respective vectors. Of course it is an extremely restrictive special case, in which the dimension of the vectors is one, and in which every vertex is actually mapping to either zero or one. Let us consider the relaxation of sparsest cut to arbitrary {\ell_1} mappings, and define

\displaystyle  \phi'(G) := \inf_{m, f: V\rightarrow {\mathbb R}^m} \frac{|V|^2}{2|E|} \cdot \frac {\sum_{i,j} A_{i,j} || f(i) - f(j) ||_1 } {\sum_{i,j} || f(i) - f(j) ||_1 }

This may seem like another very broad relaxation of sparsest cut, whose optimum might bear no correlation with the sparsest cut optimum. The following theorem shows that this is not the case.

Theorem 1 For every graph {G}, {\phi(G) = \phi'(G)}.

Furthermore, there is a polynomial time algorithm that, given a mapping {f: V\rightarrow {\mathbb R}^m}, finds a cut {S} such that

\displaystyle  \frac {\sum_{u,v} A_{u,v} |1_S(u) - 1_S(v) |} {\sum_{u,v} |1_S(u) - 1_S(v)| }\leq \frac {\sum_{u,v} A_{u,v} || f(u) - f(v) ||_1 } {\sum_{u,v} || f(u) - f(v) ||_1 } \ \ \ \ \ (4)

Proof: We use ideas that have already come up in the proof the difficult direction of Cheeger’s inequality. First, we note that for every nonnegative reals {a_1,\ldots,a_m} and positive reals {b_1,\ldots,b_m} we have

\displaystyle   \frac {a_1 + \cdots a_m}{b_1 + \cdots b_m} \geq \min_i \frac{a_i}{b_i} \ \ \ \ \ (5)

as can be seen by noting that

\displaystyle  \sum_j a_j = \sum_j b_j \cdot \frac{a_j}{b_j} \geq \left( \min_i \frac{a_i}{b_i} \right) \cdot \sum_j b_j

Let {f_i(v)} be the {i}-th coordinate of the vector {f(v)}, thus {f(v) = (f_1(v),\ldots,f_m(v))}. Then we can decompose the right-hand side of (4) by coordinates, and write

\displaystyle  \frac {\sum_{u,v} A_{u,v} || f(u) - f(v) ||_1 } {\sum_{u,v} || f(u) - f(v) ||_1 }

\displaystyle  = \frac {\sum_i \sum_{u,v} A_{u,v} | f_i(u) - f_i(v) | } {\sum_i \sum_{u,v} | f_i(u) - f_i(v) | }

\displaystyle  \geq \min_i \frac {\sum_{u,v} A_{u,v} | f_i(u) - f_i(v) | } { \sum_{u,v} | f_i(u) - f_i(v) | }

This already shows that, in the definition of {\phi'}, we can map, with no loss of generality, to 1-dimensional {\ell_1} spaces.

Let {i^*} be the coordinate that achieves the minimum above. Because the cost function is invariant under the shifts and scalings (that is, the cost of a function {x\rightarrow f(x)} is the same as the cost of {x\rightarrow af(x) + b} for every two constants {a\neq 0} and {b}) there is a function {g:V\rightarrow {\mathbb R}} such that {g} has the same cost function as {f_{i*}} and it has a unit-length range {\max_v g(v) -\min_v g(v) = 1}.

Let us now pick a threshold {t} uniformly at random from the interval {[\min_v g(v) , \max_v g(v)]}, and define the random variables

\displaystyle  S_t:= \{ v: g(v) \leq t

We observe that for every pairs of vertices {u,v} we have

\displaystyle  \mathop{\mathbb E} |1_{S_t(}u) - 1_{S_t}(v) | = |g(u) - g(v)|

and so we get

\displaystyle  \frac {\sum_{u,v} A_{u,v} || f(u) - f(v) ||_1 } {\sum_{u,v} || f(u) - f(v) ||_1 }

\displaystyle  \geq \frac {\sum_{u,v} A_{u,v} | g(u) - g(v) | } { \sum_{u,v} | g(u) - g(v) | }

\displaystyle  = \frac {\mathop{\mathbb E} \sum_{u,v} A_{u,v} |1_{S_t}(u) - 1_{S_t}(v) |} {\mathop{\mathbb E}\sum_{u,v} |1_{S_t} (u)- 1_{S_t}(v)| }

Finally, by an application of (5), we see that there must be a set {S} among the possible values of {S_t } such that (4) holds. Notice that the proof was completely constructive: we simply took the coordinate {f_{i^*}} of {f} with the lowest cost function, and then the “threshold cut” given by {f_{i^*}} with the smallest sparsity. \Box

3. A Theorem of Bourgain

We will derive our main result (2) from the L1 “rounding” process of the previous section, and from the following theorem of Bourgain (the efficiency considerations are due to Linial, London and Rabinovich).

Theorem 2 (Bourgain) Let {d: V\times V \rightarrow {\mathbb R}} be a semimetric defined over a finite set {V}. Then there exists a mapping {f: V \rightarrow {\mathbb R}^m} such that, for every two elements {u,v \in R},

\displaystyle  || f(u) - f(v)||_1 \leq d(u,v) \leq ||f(u)-f(v)||_1 \cdot c\cdot \log |V|

where {c} is an absolute constant. Given {d}, the mapping {f} can be found with high probability in randomized polynomial time in {|V|}.

To see that the above theorem of Bourgain implies (2), consider a graph {G}, and let {d} be the optimal solution of the Leighton-Rao relaxation of the sparsest cut problem on {G}, and let {f:V\rightarrow {\mathbb R}} be a mapping as in Bourgain’s theorem applied to {d}. Then

\displaystyle  LR(G) = \frac{|V|^2}{|E|} \cdot \frac{\sum_{u,v} A_{u,v} d(u,v) }{\sum_{u,v} d(u,v) }

\displaystyle  \geq \frac{|V|^2}{|E|} \cdot \frac{\sum_{u,v} A_{u,v} ||f(u)-f(v) ||_1 }{c\cdot \log |V| \cdot \sum_{u,v} ||f(u)-f(v)||_1 }

\displaystyle  \geq \frac 1 {c\cdot \log |U|} \cdot \phi(G)

Bourgain’s Embedding of Any Metric into L1

In which we spend more than sixteen hundred words to explain a three-line proof.

In the last post, we left off with the following problem. We have a set V of n “vertices,” a semi-metric d: V\times V \rightarrow {\mathbb R}, and we want to find a distribution over sets A\subseteq V such that for every two vertices i,j

(1) \ \ \ \displaystyle {\mathbb E} | d(A,i) - d(A,j) | \geq \frac 1 {O(\log n)}  \cdot d(i,j)


\displaystyle d(A,i) := \min_{a\in A} d(a,i)

This will give us a way to round a solution of the Leighton-Rao relaxation to an actual cut with only an O(\log n) loss in the approximation.

Before getting to the distribution which will do the trick, it is helpful to consider a few examples.

  • Example 1: all points are at distance 1 from each other.

    Then |d(A,i)-d(A,j)| is equal to either 0 or 1, and it is 1 if and only if A contains exactly one of i or j. If A is a uniformly chosen random set, then the above condition is satisfied with probability 1/2, so we have the stronger bound

    \displaystyle {\mathbb E} | d(A,i) - d(A,j) | \geq \frac 12 \cdot d(i,j)

    [Indeed, even better, we have {\mathbb E} | d(A,i) - d(A,j) | = \frac 12 \cdot d(i,j), which is an isometric embedding.]

  • Example 2: all points are at distance either 1 or 2 from each other.

    If A contains exactly one of the vertices i,j, then |d(A,i)-d(A,j)| \geq 1, and so, if we choose A uniformly at random we have

    \displaystyle {\mathbb E} | d(A,i) - d(A,j) | \geq \frac 14 \cdot d(i,j)

These examples may trick us into thinking that a uniformly chosen random set always work, but this unfortunately is not the case.

  • Example 3: Within a set S of size n/2, all distances are 1, and the same is true within V-S; the distance between elements of S and elements of V-S is n.

    If we consider i \in S and j \in V-S, then we are in trouble whenever A contains elements from both sets, because then |d(A,i)-d(A,j)| \leq 1, while d(i,j)=n. If we pick A uniformly at random, then A will essentially always, except with exponentially small probability, contain elements from both S and V-S. If, however, we pick A to be a random set of size 1, then we are going to get |d(A,i)-d(A,j)| \geq n-2 with probability at last 1/2, which is great.

    Choosing a set of size 1, however, is a disaster inside S and inside V-S, where almost all distances |d(A,i)-d(A,j)| collapse to zero. For those pairs, however, we know that choosing A uniformly at random works well.

    The solution is thus: with probability 1/2, pick A uniformly at random; with probability 1/2, pick A of size 1.

So far we are actually getting away with {\mathbb E} |X_i - X_j| being a constant fraction of d(i,j). Here is a slightly trickier case.

  • Example 4: The shortest path metric in a \sqrt n \times \sqrt n grid.

    Take two vertices i,j at distance d. We can get, say, |d(A,i) - d(A,j)| \geq d/3, provided that A avoids all vertices at distance \leq 2d/3 from i, and it includes some vertex at distance \leq d/3 from j. In a grid, the number of vertices at distance \leq d from a given vertex is \Theta(d^2), so our goal is to pick A so that it avoids a certain set of size \Theta((2d/3)^2) = \Theta(d^2) and it hits another set of size \Theta( (d/3)^2) = \Theta(d^2). If we pick A to be a random set of size about n/d^2, both events hold with constant probability.

    Now, what works for a certain distance won’t work for a different distance, so it seems we have to do something like picking t from 1 to n, and then pick a random set A of size n/t. This is however too bad, because our chance of getting a set of the right size would only be 1/n, while we can only lose a factor 1/\log n. The solution is to pick t at random from \{ 1,2,4,\cdots,n\}, and then pick A of size A/t. With probability 1/\log n we get the right size of A, up to a factor of two.

It turns out that the last example gives a distribution that works in all cases:

  • Pick t at random in 1 \cdots \log n
  • Pick a random set A so that each i\in V is selected to be in A independently and with probability 1/2^t

Now, it would be nice to show that (as in the examples we have seen so far) for every semi-metric d and two vertices i,j, there is a size parameter t such that when A is chosen to be a random set of size n/2^t we have {\mathbb E} |d(A,i)-d(A,j)| \geq \Omega(d(i,j)).

This would mean that, after we lose a factor of \log n to “guess” the right density, we have the desired bound (3). Unfortunately this is too much to ask for; we shall instead work out an argument that uses contributions from all densities.

It is good to see one more example.

  • Example 5: A 3-regular graph of logarithmic girth.

    Let i,j be two vertices whose distance d is less than the girth. In the example of the grid, we considered all vertices at distance \leq 2d/3 from i and all vertices at distance \leq d/3 from j; in this case, there are \approx 2^{2d/3} of the former, and \approx 2^{d/3} of the latter, and it is hopeless to expect that A, no matter its density, can avoid all of the former, but hit some of the latter.

    If, however, I consider the points at distance \leq \ell+1 from i and the points at distance \leq \ell from j, they are off only by a constant factor, and there is a constant probability of avoiding the former and hitting the latter when t = \ell. So conditioned on each t between 1 and d/2, the expectation of |d(A,i) - d(A,j)| is at least \Omega(1), and, overall, the expectation is at least \Omega(d/\log n).

We are now more or less ready to tackle the general case.

We look at two vertices i,j at distance d(i,j) and we want to estimate {\mathbb E} |d(A,i)-d(A,j)|.

Let us estimate the contribution to the expectation coming from the case in which we choose a particular value of t. In such a case, there is a constant probability that

  1. A contains none of the 2^{t+1}-1 vertices closest to i, but at least one of the of 2^t vertices closest to j. [Assuming the two sets are disjoint]
  2. A contains none of the 2^{t+1}-1 vertices closest to j, but at least one of the of 2^t vertices closest to i. [Assuming the two sets are disjoint]

Notice that events (1) and (2) are disjoint, so we are allowed to sum their contributions to the expectation, without doing any double-counting.

Call ri_k the distance of the the k-th closest vertex from i, and similarly rj_k. Then, if event (1) happens, d(A,i) \geq ri_{2^{t+1}} and d(A,j) \leq rj_{2^t}, in which case

|d(A,i) - d(A,j)| \geq d(A,i) - d(A,j) \geq ri_{2^{t+1}} - rj_{2^t}

we can similarly argue that if (2) happens, then

|d(A,i) - d(A,j)| \geq d(A,j) - d(A,i) \geq rj_{2^{t+1}} - ri_{2^t}

Call r'i_{k} := \min \{ d(i,j)/3, ri_k\} and r'j_{k} := \min \{ d(i,j)/3, rj_k\}.

Let t^* be the smallest t such that either ri_{2^{t+1}}\geq d(i,j)/3
or rj_{2^{t+1}}\geq d(i,j)/3. Then for t\leq t^*-1 the events described in (1) and (2) are well-defined, and the contributions to the expectation is at least

\frac 1 {\log n} \cdot ( ri_{2^{t+1}} - rj_{2^t}  + rj_{2^{t+1}} - ri_{2^t} ) =
= \frac 1 {\log n} \cdot ( r'i_{2^{t+1}} - r'j_{2^t}  + r'j_{2^{t+1}} - r'i_{2^t} )

When t=t^*, we can verify that the contribution to the expectation is at least

= \frac 1 {\log n} \cdot ( r'i_{2^{t^*+1}} - r'j_{2^{t^*}}  + r'j_{2^{t^*+1}} - r'i_{2^{t^*}} )
\geq \frac 1 {\log n} \cdot ( d(i,j)/3  - r'j_{2^{t^*}} - r'i_{2^{t^*}}

And if we sum the contributions for t=0,\cdots t^*, the sum telescopes and we are left with

\displaystyle {\mathbb E} |d(A,i) - d(A,j) | \geq \Omega\left( \frac 1 {\log n} \right) \cdot \left( \frac{d(i,j)}{3} - r'i_{1} - r'j_1 \right)

where r'i_1 = r'j_1 = 0.

At long last, we have completed the proof.

Notice that the O(\log n) factor we have lost is best possible in light of the expander example we saw in the previous post. In many examples, however, we lost only a constant factor. It is a great open question whether it is possible to lose only a constant factor whenever the metric is a shortest-path metric on a planar graph.

How Good is the Leighton-Rao Relaxation?

Continuing our seemingly never-ending series on approximations to edge expansion, we come to the question of how good is the Leighton-Rao relaxation.

Given a graph G=(V,E), we are trying to approximate the sparsest cut problem (which, in turn, approximates the edge expansion problem within a factor of two) defined as

\displaystyle \phi(G) := \min_{S\subseteq V} \frac {edges(S,V-S)} {\frac 1n |S| \cdot |V-S|}

And we have seen that an equivalent characterization of sparsest cut is:

(1) \ \ \ \displaystyle \phi(G) = \min_{x\in {\mathbb R}^V}  \frac{ \sum_{i,j} A(i,j) | x_i - x_j | } {\frac 1n \sum_{i,j} |x_i - x_j | }

In the Leighton-Rao relaxation, we have

(2) \ \ \ \displaystyle LR(G) := \min_{d\in {\cal M}(V)} \frac{ \sum_{i,j} A(i,j) d(i,j) }{\frac 1n \sum_{i,j} d(i,j)}

where {\cal M}(V) is the set of all semi-metrics d: V \times V \rightarrow {\mathbb R}, that is, of all functions such that d(i,j) = d(j,i) \geq 0, d(i,i)=0, and that obey the “triangle inequality”

d(i,j) \leq d(i,k) + d(k,j)

for all i,j,k \in V.

This is clearly a relaxation of the sparsest cut problem, because for every x \in {\mathbb R}^V the “distances” |x_i - x_j| define indeed a semi-metric over V.

How bad can the relaxation be? Take a family of regular constant-degree expander graphs, that is graphs where \phi is a constant and the degree k is a constant independent of n=|V|. Define d(i,j) to be the shortest path distance between i and j in the graph. This is clearly a semi-metric, and we have

\displaystyle \sum_{i,j} A(i,j) d(i,j) = kn = O(n)

because along every edge the distance is precisely one, and

\displaystyle \sum_{i,j} d(i,j) = \Omega (n^2 \log_k n)

because, for every vertex i, at most k^t other vertices can be within
distance t, and so at least half the vertices are at distance \Omega( \log_k n).

So we have

\displaystyle LR(G) = O \left( \frac 1 {\log n} \right)

even though \phi(G) = \Omega(1), so the error in the approximation can be as bad as a factor of \log n. We shall prove that this is as bad as it gets.

Our task is then: given a feasible solution to (2), that is, a distance function defined on the vertices of G, find a feasible solution to sparsest cut whose cost is as small as possible, and no more than a O(\log n) factor off from the cost of the solution to (2) that we started from. Instead of directly looking for a cut, we shall look for a solution to (1), which is a more flexible formulation.

The general strategy will be, given d, to find a distribution X over vectors x \in {\mathbb R}^V such that for every two vertices i,j we have

(3) \ \ \ \displaystyle \Omega\left( \frac {1}{\log n} \right) \cdot d(i,j) \leq {\mathbb E} | X_i - X_j | \leq d(i,j)

Suppose that we start from a solution to (2) of cost c, that is, we have a semimetric d(\cdot,\cdot) such that

\displaystyle \sum_{i,j} A_{i,j} d(i,j) = c \cdot \frac 1n \sum_{i,j} d(i,j)

Then our distribution X is such that

\displaystyle {\mathbb E}  \sum_{i,j} A_{i,j} |X_i - X_j| \leq \sum_{i,j} A_{i,j} d(i,j)


\displaystyle \sum_{i,j} d(i,j) \leq O \left(  {\log n} \right) {\mathbb E} \sum_{i,j}|X_i - X_j|


\displaystyle {\mathbb E}  \left[ \sum_{i,j} A_{i,j} |X_i - X_j| - O(c\log n) \cdot \frac 1n \sum_{i,j} |X_i - X_j| \right] \leq 0

and there must exist a vector x in the support of the distribution X such that

\displaystyle \sum_{i,j} A_{i,j} |x_i - x_j| - O(c\log n) \cdot \frac 1n \sum_{i,j} |x_i - x_j|  \leq 0

and now we are done because the cost of x in (1) is at most O(\log n) times the optimum of the Leighton-Rao relaxation.

Though this works, it seems an overkill to require condition (3) for all pairs of vertices. It seems sufficient to require

{\mathbb E} |X_i-X_j| \leq d(i,j)

only for edges, and to require

d(i,j) \leq O(\log n) \cdot  {\mathbb E} |X_i-X_j|

only “on average” over all pairs of vertices. It can be shown, however, that if one wants to round a generalization of the Leighton-Rao relaxation to sparsest cut with “non-uniform demands,” then any rounding algorithm can be turned into a rounding algorithm that satisfies (3).

So how do we start from an arbitrary metric and come up with a probabilistic linear order of the vertices so that their distances in the original metric are well approximated in the linear ordering? We will describe a method of Bourgain, whose applicability in this setting is due Linial, London and Rabinovich.

The starting point is to see that if A is a set of vertices, and we define

\displaystyle X_i := d(A,i) :=  \min_{a\in A} d(a,i)

then, no matter how we choose A, we always have

|X_i - X_j | = |d(A,i) - d(A,j) | \leq d(i,j)

[Hint: use the triangle inequality to see that d(A,j) \leq d(A,i) + d(i,j).]

This means that “all we have to do” is to find a distribution over sets A such that for every i,j we have

(3) \ \ \ \displaystyle {\mathbb E} | d(A,i) - d(A,j) | \geq \Omega \left( \frac 1 {\log n} \right) \cdot d(i,j)

If you want to make it sound more high-brow, you may call such a distribution a Frechet embedding of d into \ell_1. Constructing such a distribution will be the subject of the next post.