In which we spend more than sixteen hundred words to explain a three-line proof.

In the last post, we left off with the following problem. We have a set V of n “vertices,” a semi-metric d: V\times V \rightarrow {\mathbb R}, and we want to find a distribution over sets A\subseteq V such that for every two vertices i,j

(1) \ \ \ \displaystyle {\mathbb E} | d(A,i) - d(A,j) | \geq \frac 1 {O(\log n)}  \cdot d(i,j)


\displaystyle d(A,i) := \min_{a\in A} d(a,i)

This will give us a way to round a solution of the Leighton-Rao relaxation to an actual cut with only an O(\log n) loss in the approximation.

Before getting to the distribution which will do the trick, it is helpful to consider a few examples.

  • Example 1: all points are at distance 1 from each other.

    Then |d(A,i)-d(A,j)| is equal to either 0 or 1, and it is 1 if and only if A contains exactly one of i or j. If A is a uniformly chosen random set, then the above condition is satisfied with probability 1/2, so we have the stronger bound

    \displaystyle {\mathbb E} | d(A,i) - d(A,j) | \geq \frac 12 \cdot d(i,j)

    [Indeed, even better, we have {\mathbb E} | d(A,i) - d(A,j) | = \frac 12 \cdot d(i,j), which is an isometric embedding.]

  • Example 2: all points are at distance either 1 or 2 from each other.

    If A contains exactly one of the vertices i,j, then |d(A,i)-d(A,j)| \geq 1, and so, if we choose A uniformly at random we have

    \displaystyle {\mathbb E} | d(A,i) - d(A,j) | \geq \frac 14 \cdot d(i,j)

These examples may trick us into thinking that a uniformly chosen random set always work, but this unfortunately is not the case.

  • Example 3: Within a set S of size n/2, all distances are 1, and the same is true within V-S; the distance between elements of S and elements of V-S is n.

    If we consider i \in S and j \in V-S, then we are in trouble whenever A contains elements from both sets, because then |d(A,i)-d(A,j)| \leq 1, while d(i,j)=n. If we pick A uniformly at random, then A will essentially always, except with exponentially small probability, contain elements from both S and V-S. If, however, we pick A to be a random set of size 1, then we are going to get |d(A,i)-d(A,j)| \geq n-2 with probability at last 1/2, which is great.

    Choosing a set of size 1, however, is a disaster inside S and inside V-S, where almost all distances |d(A,i)-d(A,j)| collapse to zero. For those pairs, however, we know that choosing A uniformly at random works well.

    The solution is thus: with probability 1/2, pick A uniformly at random; with probability 1/2, pick A of size 1.

So far we are actually getting away with {\mathbb E} |X_i - X_j| being a constant fraction of d(i,j). Here is a slightly trickier case.

  • Example 4: The shortest path metric in a \sqrt n \times \sqrt n grid.

    Take two vertices i,j at distance d. We can get, say, |d(A,i) - d(A,j)| \geq d/3, provided that A avoids all vertices at distance \leq 2d/3 from i, and it includes some vertex at distance \leq d/3 from j. In a grid, the number of vertices at distance \leq d from a given vertex is \Theta(d^2), so our goal is to pick A so that it avoids a certain set of size \Theta((2d/3)^2) = \Theta(d^2) and it hits another set of size \Theta( (d/3)^2) = \Theta(d^2). If we pick A to be a random set of size about n/d^2, both events hold with constant probability.

    Now, what works for a certain distance won’t work for a different distance, so it seems we have to do something like picking t from 1 to n, and then pick a random set A of size n/t. This is however too bad, because our chance of getting a set of the right size would only be 1/n, while we can only lose a factor 1/\log n. The solution is to pick t at random from \{ 1,2,4,\cdots,n\}, and then pick A of size A/t. With probability 1/\log n we get the right size of A, up to a factor of two.

It turns out that the last example gives a distribution that works in all cases:

  • Pick t at random in 1 \cdots \log n
  • Pick a random set A so that each i\in V is selected to be in A independently and with probability 1/2^t

Now, it would be nice to show that (as in the examples we have seen so far) for every semi-metric d and two vertices i,j, there is a size parameter t such that when A is chosen to be a random set of size n/2^t we have {\mathbb E} |d(A,i)-d(A,j)| \geq \Omega(d(i,j)).

This would mean that, after we lose a factor of \log n to “guess” the right density, we have the desired bound (3). Unfortunately this is too much to ask for; we shall instead work out an argument that uses contributions from all densities.

It is good to see one more example.

  • Example 5: A 3-regular graph of logarithmic girth.

    Let i,j be two vertices whose distance d is less than the girth. In the example of the grid, we considered all vertices at distance \leq 2d/3 from i and all vertices at distance \leq d/3 from j; in this case, there are \approx 2^{2d/3} of the former, and \approx 2^{d/3} of the latter, and it is hopeless to expect that A, no matter its density, can avoid all of the former, but hit some of the latter.

    If, however, I consider the points at distance \leq \ell+1 from i and the points at distance \leq \ell from j, they are off only by a constant factor, and there is a constant probability of avoiding the former and hitting the latter when t = \ell. So conditioned on each t between 1 and d/2, the expectation of |d(A,i) - d(A,j)| is at least \Omega(1), and, overall, the expectation is at least \Omega(d/\log n).

We are now more or less ready to tackle the general case.

We look at two vertices i,j at distance d(i,j) and we want to estimate {\mathbb E} |d(A,i)-d(A,j)|.

Let us estimate the contribution to the expectation coming from the case in which we choose a particular value of t. In such a case, there is a constant probability that

  1. A contains none of the 2^{t+1}-1 vertices closest to i, but at least one of the of 2^t vertices closest to j. [Assuming the two sets are disjoint]
  2. A contains none of the 2^{t+1}-1 vertices closest to j, but at least one of the of 2^t vertices closest to i. [Assuming the two sets are disjoint]

Notice that events (1) and (2) are disjoint, so we are allowed to sum their contributions to the expectation, without doing any double-counting.

Call ri_k the distance of the the k-th closest vertex from i, and similarly rj_k. Then, if event (1) happens, d(A,i) \geq ri_{2^{t+1}} and d(A,j) \leq rj_{2^t}, in which case

|d(A,i) - d(A,j)| \geq d(A,i) - d(A,j) \geq ri_{2^{t+1}} - rj_{2^t}

we can similarly argue that if (2) happens, then

|d(A,i) - d(A,j)| \geq d(A,j) - d(A,i) \geq rj_{2^{t+1}} - ri_{2^t}

Call r'i_{k} := \min \{ d(i,j)/3, ri_k\} and r'j_{k} := \min \{ d(i,j)/3, rj_k\}.

Let t^* be the smallest t such that either ri_{2^{t+1}}\geq d(i,j)/3
or rj_{2^{t+1}}\geq d(i,j)/3. Then for t\leq t^*-1 the events described in (1) and (2) are well-defined, and the contributions to the expectation is at least

\frac 1 {\log n} \cdot ( ri_{2^{t+1}} - rj_{2^t}  + rj_{2^{t+1}} - ri_{2^t} ) =
= \frac 1 {\log n} \cdot ( r'i_{2^{t+1}} - r'j_{2^t}  + r'j_{2^{t+1}} - r'i_{2^t} )

When t=t^*, we can verify that the contribution to the expectation is at least

= \frac 1 {\log n} \cdot ( r'i_{2^{t^*+1}} - r'j_{2^{t^*}}  + r'j_{2^{t^*+1}} - r'i_{2^{t^*}} )
\geq \frac 1 {\log n} \cdot ( d(i,j)/3  - r'j_{2^{t^*}} - r'i_{2^{t^*}}

And if we sum the contributions for t=0,\cdots t^*, the sum telescopes and we are left with

\displaystyle {\mathbb E} |d(A,i) - d(A,j) | \geq \Omega\left( \frac 1 {\log n} \right) \cdot \left( \frac{d(i,j)}{3} - r'i_{1} - r'j_1 \right)

where r'i_1 = r'j_1 = 0.

At long last, we have completed the proof.

Notice that the O(\log n) factor we have lost is best possible in light of the expander example we saw in the previous post. In many examples, however, we lost only a constant factor. It is a great open question whether it is possible to lose only a constant factor whenever the metric is a shortest-path metric on a planar graph.

About these ads