In which we spend more than sixteen hundred words to explain a three-line proof.
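In the last post, we left off with the following problem. We have a set $V$ of $n$ “vertices,” a semi-metric $d: V \times V \rightarrow {\mathbb R}$, and we want to find a distribution over sets $A \subseteq V$ such that for every two vertices $i, j \in V$
$$ {\mathbb E} \left[ \, | d(A,i) - d(A,j) | \, \right] \geq \frac{1}{O(\log n)} \cdot d(i,j) $$
where
$$ d(A,i) := \min_{a \in A} d(a,i) $$
This will give us a way to round a solution of the Leighton-Rao relaxation to an actual cut with only an $O(\log n)$ loss in the approximation.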
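To make the quantity being bounded concrete, here is a minimal Python sketch. It assumes the usual definition of the distance from a vertex to a set, $d(A,i) = \min_{a \in A} d(a,i)$, and a semi-metric stored as a nested list; the helper names `dist_to_set` and `average_gap` and the toy path metric are illustrative choices, not part of the post.

```python
from itertools import combinations

def dist_to_set(d, A, i):
    """d(A, i): distance from vertex i to the closest element of the nonempty set A."""
    return min(d[a][i] for a in A)

def average_gap(d, i, j, sets):
    """Average of |d(A,i) - d(A,j)| over a list of equally likely nonempty sets A."""
    total = sum(abs(dist_to_set(d, A, i) - dist_to_set(d, A, j)) for A in sets)
    return total / len(sets)

# Toy metric (an assumption for illustration): a path 0 - 1 - 2 - 3 with unit edges.
n = 4
d = [[abs(u - v) for v in range(n)] for u in range(n)]

# Every nonempty subset of the vertices, each taken with equal probability.
subsets = [A for k in range(1, n + 1) for A in combinations(range(n), k)]

for i, j in combinations(range(n), 2):
    gap = average_gap(d, i, j, subsets)
    print(f"pair ({i},{j}): d(i,j) = {d[i][j]}, average gap = {gap:.3f}")
```

By the triangle inequality every individual gap $|d(A,i) - d(A,j)|$ is at most $d(i,j)$, so the only question is how much smaller the average can get.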
Before getting to the distribution which will do the trick, it is helpful to consider a few examples.
- Example 1: all points are at distance 1 from each other.
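Then $|d(A,i) - d(A,j)|$ is equal to either 0 or 1, and it is 1 if and only if $A$ contains exactly one of $i$ or $j$. If $A$ is a uniformly chosen random set, then the above condition is satisfied with probability $1/2$, so we have the stronger bound
$$ {\mathbb E} \left[ \, | d(A,i) - d(A,j) | \, \right] \geq \frac 12 \cdot d(i,j) $$
[Indeed, even better, we have ${\mathbb E} \left[ \, | d(A,i) - d(A,j) | \, \right] = \frac 12 \cdot d(i,j)$, which is, up to scaling, an isometric embedding.]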
- Example 2: all points are at distance either 1 or 2 from each other.
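If $A$ contains exactly one of the vertices $i,j$, then $|d(A,i) - d(A,j)| \geq 1 \geq \frac 12 \cdot d(i,j)$, and so, if we choose $A$ uniformly at random we have
$$ {\mathbb E} \left[ \, | d(A,i) - d(A,j) | \, \right] \geq \frac 14 \cdot d(i,j) $$
These examples may trick us into thinking that a uniformly chosen random set always works, but this unfortunately is not the case.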
- Example 3: Within a set $S$ of size $n/2$, all distances are 1, and the same is true within $V - S$; the distance between elements of $S$ and elements of $V - S$ is $n$.
If we consider $i \in S$ and $j \in V - S$, then we are in trouble whenever $A$ contains elements from both sets, because then $|d(A,i) - d(A,j)| \leq 1$, while $d(i,j) = n$. If we pick $A$ uniformly at random, then $A$ will essentially always, except with exponentially small probability, contain elements from both $S$ and $V - S$. If, however, we pick $A$ to be a random set of size 1, then we are going to get $|d(A,i) - d(A,j)| \geq n-1$ with probability 1, which is great.
Choosing a set of size 1, however, is a disaster inside $S$ and inside $V - S$, where almost all distances collapse to zero. For those pairs, however, we know that choosing $A$ uniformly at random works well.
The solution is thus: with probability 1/2, pick $A$ uniformly at random; with probability 1/2, pick $A$ of size 1.
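The trade-off in this example can be checked by brute force on a small instance. The sketch below assumes $n = 8$, two clusters of four vertices each, and a cross-cluster distance equal to $n$ (an assumed value, as are all the names); "uniformly at random" is taken to mean uniform over nonempty subsets.

```python
from itertools import combinations

def dist_to_set(d, A, i):
    """Distance from vertex i to the closest element of the nonempty set A."""
    return min(d[a][i] for a in A)

def average_gap(d, i, j, sets):
    """Average of |d(A,i) - d(A,j)| over a list of equally likely sets."""
    total = sum(abs(dist_to_set(d, A, i) - dist_to_set(d, A, j)) for A in sets)
    return total / len(sets)

n = 8            # small enough to enumerate all 2**n - 1 nonempty subsets
half = n // 2    # vertices 0..half-1 form one cluster, the rest the other
far = n          # assumed distance between the two clusters

d = [[0 if u == v else (1 if (u < half) == (v < half) else far)
      for v in range(n)] for u in range(n)]

uniform_sets = [A for k in range(1, n + 1) for A in combinations(range(n), k)]
singletons = [(a,) for a in range(n)]

results = {}
for i, j in [(0, n - 1), (0, 1)]:   # one cross-cluster pair, one pair inside a cluster
    uniform = average_gap(d, i, j, uniform_sets)
    single = average_gap(d, i, j, singletons)
    mixed = 0.5 * uniform + 0.5 * single   # the half-and-half mixture
    results[(i, j)] = (uniform, single, mixed)
    print(f"pair ({i},{j}), d = {d[i][j]}: uniform {uniform:.3f}, "
          f"singleton {single:.3f}, mixture {mixed:.3f}")
```

For the cross-cluster pair the uniform average is a small fraction of $d(i,j)$ while the singleton average is close to $d(i,j)$; inside a cluster the singleton average is only $2/n$, and the uniform choice is the one that helps. The mixture keeps a constant fraction of $d(i,j)$ in both cases.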
So far we are actually getting away with ${\mathbb E} \left[ \, | d(A,i) - d(A,j) | \, \right]$ being a constant fraction of $d(i,j)$. Here is a slightly trickier case.
- Example 4: The shortest path metric in a grid.
Take two vertices $i,j$ at distance $\delta := d(i,j)$. We can get, say, $|d(A,i) - d(A,j)| \geq \delta/3$, provided that $A$ avoids all vertices at distance $\leq 2\delta/3$ from $i$, and it includes some vertex at distance $\leq \delta/3$ from $j$. In a grid, the number of vertices at distance $\leq \delta$ from a given vertex is $\Theta(\delta^2)$, so our goal is to pick $A$ so that it avoids a certain set of size $\Theta(\delta^2)$ and it hits another set of size $\Theta(\delta^2)$. If we pick $A$ to be a random set of size about $n/\delta^2$, both events hold with constant probability.
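The constant-probability claim can be sanity-checked numerically. The sketch below makes several assumptions that are not in the text: the grid is $65 \times 65$, both points are far from the boundary, the random set includes each vertex independently with probability $1/\delta^2$ (so its expected size is $n/\delta^2$), and the ball around $i$ is taken with strict inequality so that the two balls are disjoint. All function names are illustrative.

```python
def ball_size(side, center, radius):
    """Number of points of a side x side grid within shortest-path distance `radius` of `center`."""
    cx, cy = center
    return sum(1 for x in range(side) for y in range(side)
               if abs(x - cx) + abs(y - cy) <= radius)

def success_probability(side, i, j):
    """Probability that A avoids the ball around i and hits the ball around j."""
    delta = abs(i[0] - j[0]) + abs(i[1] - j[1])   # grid distance between i and j
    avoid_radius = (2 * delta - 1) // 3           # largest integer strictly below 2*delta/3
    hit_radius = delta // 3                       # largest integer at most delta/3
    p = 1.0 / delta ** 2                          # inclusion probability (assumed model)
    p_avoid = (1.0 - p) ** ball_size(side, i, avoid_radius)
    p_hit = 1.0 - (1.0 - p) ** ball_size(side, j, hit_radius)
    # The radii add up to less than delta, so the balls are disjoint
    # and the two events are independent.
    return p_avoid * p_hit

side = 65
probabilities = {}
for delta in (3, 6, 12, 24):
    i, j = (20, 32), (20 + delta, 32)
    probabilities[delta] = success_probability(side, i, j)
    print(f"delta = {delta:2d}: success probability = {probabilities[delta]:.3f}")
```

When the event happens, $d(A,i) \geq 2\delta/3$ and $d(A,j) \leq \delta/3$, so the gap is at least $\delta/3$; the point of the experiment is that the probability does not decay as $\delta$ grows.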
Now, what works for a certain distance won’t work for a different distance, so it seems we have to do something like picking $k$ from $1$ to $n$, and then pick a random set $A$ of size $k$. This, however, is too wasteful, because our chance of getting a set of the right size would only be about $1/n$, while we can only lose a factor $O(\log n)$. The solution is to pick $t$ at random from $\{1, \ldots, \log_2 n\}$, and then pick $A$ of size $n/2^t$. With probability $1/\log_2 n$ we get the right size of $A$, up to a factor of two.
It turns out that the last example gives a distribution that works in all cases:
- Pick $t$ uniformly at random in $\{1, \ldots, \log_2 n\}$
- Pick a random set $A \subseteq V$ so that each $i \in V$ is selected to be in $A$ independently and with probability $2^{-t}$
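The two steps above can be sketched as a sampler. This is a minimal illustration under stated assumptions: $t$ ranges over $\{1, \ldots, \lceil \log_2 n \rceil\}$, each vertex is kept with probability $2^{-t}$, and an empty $A$ is treated as contributing a gap of 0 (a convention chosen here, since $d(A,i)$ is undefined for empty $A$). The cycle metric and all names are hypothetical.

```python
import math
import random

def sample_set(n, rng):
    """Pick t uniformly in {1, ..., ceil(log2 n)}, then keep each vertex
    independently with probability 2**-t. Returns (t, A)."""
    levels = max(1, math.ceil(math.log2(n)))
    t = rng.randint(1, levels)          # randint is inclusive on both ends
    p = 2.0 ** -t
    A = [v for v in range(n) if rng.random() < p]
    return t, A

def gap(d, A, i, j):
    """|d(A,i) - d(A,j)|; an empty A contributes 0 (assumed convention)."""
    if not A:
        return 0.0
    return abs(min(d[a][i] for a in A) - min(d[a][j] for a in A))

def estimate_expected_gap(d, i, j, trials, rng):
    """Monte Carlo estimate of E|d(A,i) - d(A,j)| under the sampler above."""
    n = len(d)
    return sum(gap(d, sample_set(n, rng)[1], i, j) for _ in range(trials)) / trials

# Toy metric: shortest-path distances on a cycle with 16 vertices.
n = 16
d = [[min(abs(u - v), n - abs(u - v)) for v in range(n)] for u in range(n)]

rng = random.Random(0)                  # fixed seed so runs are repeatable
estimate = estimate_expected_gap(d, 0, n // 2, trials=20000, rng=rng)
print(f"d(i,j) = {d[0][n // 2]}, estimated expected gap = {estimate:.3f}")
```

Sampling each vertex independently gives a set whose size is only concentrated around $n/2^t$ rather than exactly equal to it, which is all the argument needs.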
Now, it would be nice to show that (as in the examples we have seen so far) for every semi-metric $d$ and two vertices $i,j$, there is a size parameter $t$ such that when $A$ is chosen to be a random set of size $n/2^t$ we have ${\mathbb E} \left[ \, | d(A,i) - d(A,j) | \, \right] \geq \Omega(d(i,j))$.
This would mean that, after we lose a factor of $\log n$ to “guess” the right density, we have the desired bound (3). Unfortunately this is too much to ask for; we shall instead work out an argument that uses contributions from all densities.
It is good to see one more example.
- Example 5: A 3-regular graph of logarithmic girth.
Let $i,j$ be two vertices whose distance $\delta := d(i,j)$ is less than the girth. In the example of the grid, we considered all vertices at distance $\leq 2\delta/3$ from $i$ and all vertices at distance $\leq \delta/3$ from $j$; in this case, there are roughly $2^{2\delta/3}$ of the former, and roughly $2^{\delta/3}$ of the latter, and it is hopeless to expect that $A$, no matter its density, can avoid all of the former, but hit some of the latter.
If, however, I consider the points at distance $\leq \ell$ from $i$ and the points at distance $\leq \ell - 1$ from $j$, their numbers are off only by a constant factor, and there is a constant probability of avoiding the former and hitting the latter when $t = \ell$. So conditioned on each $t$ between $1$ and $\delta/2$, the expectation of $|d(A,i) - d(A,j)|$ is at least $\Omega(1)$, and, overall, the expectation is at least $\Omega(\delta / \log n)$.
We are now more or less ready to tackle the general case.
We look at two vertices $i,j$ at distance $d(i,j)$ and we want to estimate ${\mathbb E} \left[ \, | d(A,i) - d(A,j) | \, \right]$.
Let us estimate the contribution to the expectation coming from the case in which we choose a particular value of $t$. In such a case, each of the following events happens with constant probability:
- (1) $A$ contains none of the $2^t$ vertices closest to $i$, but at least one of the $2^{t-1}$ vertices closest to $j$. [Assuming the two sets are disjoint]
- (2) $A$ contains none of the $2^t$ vertices closest to $j$, but at least one of the $2^{t-1}$ vertices closest to $i$. [Assuming the two sets are disjoint]
Notice that events (1) and (2) are disjoint, so we are allowed to sum their contributions to the expectation, without doing any double-counting.
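Call $r^i_k$ the distance from $i$ of the $k$-th closest vertex to $i$, counting $i$ itself as the closest so that $r^i_1 = 0$, and similarly $r^j_k$. Then, if event (1) happens, $d(A,i) \geq r^i_{2^t}$ and $d(A,j) \leq r^j_{2^{t-1}}$, in which case
$$ | d(A,i) - d(A,j) | \geq d(A,i) - d(A,j) \geq r^i_{2^t} - r^j_{2^{t-1}} $$
We can similarly argue that if (2) happens, then
$$ | d(A,i) - d(A,j) | \geq d(A,j) - d(A,i) \geq r^j_{2^t} - r^i_{2^{t-1}} $$
Call $\bar r^i_k := \min \{ r^i_k , d(i,j)/3 \}$ and $\bar r^j_k := \min \{ r^j_k , d(i,j)/3 \}$.
Let $t^*$ be the smallest $t$ such that either
$$ r^i_{2^t} \geq d(i,j)/3 $$
or $r^j_{2^t} \geq d(i,j)/3$. Then for $t < t^*$ the events described in (1) and (2) are well-defined, because the two sets of closest vertices lie in balls of radius less than $d(i,j)/3$ around $i$ and $j$ and so are disjoint, and the contribution to the expectation is at least
$$ \frac{1}{\log_2 n} \cdot \Omega \left( r^i_{2^t} - r^j_{2^{t-1}} + r^j_{2^t} - r^i_{2^{t-1}} \right) $$
When $t = t^*$, we can verify that the contribution to the expectation is at least
$$ \frac{1}{\log_2 n} \cdot \Omega \left( \bar r^i_{2^{t^*}} - r^j_{2^{t^*-1}} + \bar r^j_{2^{t^*}} - r^i_{2^{t^*-1}} \right) $$
And if we sum the contributions for $t = 1, \ldots, t^*$, the sum telescopes and we are left with
$$ {\mathbb E} \left[ \, | d(A,i) - d(A,j) | \, \right] \geq \Omega \left( \frac{1}{\log n} \right) \cdot \left( \bar r^i_{2^{t^*}} + \bar r^j_{2^{t^*}} - r^i_1 - r^j_1 \right) \geq \Omega \left( \frac{1}{\log n} \right) \cdot d(i,j) $$
where $r^i_1 = r^j_1 = 0$, and at least one of $\bar r^i_{2^{t^*}}$, $\bar r^j_{2^{t^*}}$ equals $d(i,j)/3$ by the choice of $t^*$.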
At long last, we have completed the proof.
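As a sanity check on the statement just proved, the expectation can be computed exactly on a small instance by summing over every $t$ and every subset. The sketch below assumes $t$ uniform in $\{1, \ldots, \lceil \log_2 n \rceil\}$, inclusion probability $2^{-t}$, an empty $A$ contributing 0, and reuses the two-cluster metric of Example 3 with $n = 8$ and cross-cluster distance $n$; all of these choices and names are illustrative assumptions, and a check on one toy metric is of course no substitute for the proof.

```python
import math
from itertools import combinations

def exact_expected_gap(d, i, j):
    """E|d(A,i) - d(A,j)| obtained by summing over every t and every subset A."""
    n = len(d)
    levels = max(1, math.ceil(math.log2(n)))
    total = 0.0
    for t in range(1, levels + 1):
        p = 2.0 ** -t
        for mask in range(1, 2 ** n):    # nonempty subsets; the empty set contributes 0
            A = [v for v in range(n) if (mask >> v) & 1]
            prob = p ** len(A) * (1.0 - p) ** (n - len(A))
            total += prob * abs(min(d[a][i] for a in A) - min(d[a][j] for a in A))
    return total / levels                # t is uniform over the levels

# Two clusters of size n/2 at distance 1 inside and (assumed) distance n across.
n = 8
half = n // 2
d = [[0 if u == v else (1 if (u < half) == (v < half) else n)
      for v in range(n)] for u in range(n)]

worst_ratio = min(exact_expected_gap(d, i, j) / d[i][j]
                  for i, j in combinations(range(n), 2))
print(f"worst ratio E|d(A,i) - d(A,j)| / d(i,j) over all pairs: {worst_ratio:.3f}")
print(f"for comparison, 1 / log2(n) = {1 / math.log2(n):.3f}")
```

The ratio can never exceed 1, because each gap is at most $d(i,j)$; the theorem says that it cannot fall below a constant times $1/\log n$.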
Notice that the $O(\log n)$ factor we have lost is best possible in light of the expander example we saw in the previous post. In many examples, however, we lost only a constant factor. It is a great open question whether it is possible to lose only a constant factor whenever the metric is a shortest-path metric on a planar graph.
What’s the 3 line proof? Do you need footnotes?