You are currently browsing the tag archive for the 'Expanders' tag.

In June, I wrote about my work on using spectral partitioning to approximate Max Cut.

I have now posted a revised paper with a couple new things.

One is an improved analysis due to Moses Charikar, of the algorithm described in the June paper. Moses shows that if one starts from a graph in which the optimum cuts a 1-\epsilon fraction of edges, then the algorithm cuts at least a 1-4\sqrt \epsilon + 8\epsilon fraction of edges (and also at least half of the edges). Cutting more than a 1-\frac 2 \pi \sqrt \epsilon + o_\epsilon(1) fraction of edges is Unique-Games-hard. Optimizing the fraction

\displaystyle \frac{ \max \{ \ \frac 12  \ , \ (1-4\sqrt \epsilon + 8\epsilon) \ \} }{1-\epsilon}

we see that the approximation ratio of the algorithm is always at least .531.

The second new result answers a question raised by an anonymous commenter in June: what about Max Cut Gain? Read the rest of this entry »

In the Max Cut problem, we are given an undirected graph G=(V,E) and we want to find a partition (L,\bar L) of the set of vertices such that as many edges as possible have one endpoint in L and one endpoint in \bar L, and are hence cut by the partition.

It is easy, as recognized since the 1970s, to find a partition that cuts half of the edges and that, thus, is at least half as good as an optimal solution. No approximation better than 1/2 was known for this problem, until Goemans and Williamson famously proved that one could achieve a .878… approximation using semidefinite programming (SDP).

No other approximation algorithm achieving an approximation asymptotically better than 1/2 is known, and it seems that a fundamental difficulty is the following. Suppose we prove that a certain algorithm achieves approximation > 51\%. Then, given a graph in which the optimum is, say, < 50.4 \%, the algorithm and its analysis must provide a certificate that the optimum cut in the given graph is < 50.4/51 < 99\%, and there is no general technique to prove upper bounds to the Max Cut optimum of a general graph other than Semidefinite Programming. (And see here and here for negative results showing that large classes of linear programming relaxations are unable to give such certificates.)

Spectral techniques can prove upper bounds to Max Cut in certain cases (and can be seen as special cases of the upper bounds provided by the Goemans-Williamson relaxation).

In the simplified case in which G=(V,E) is a d-regular graph, let A be the adjacency matrix of G and \lambda_1 \geq \lambda_2 \geq \cdots \lambda_n be the eigenvalues of A; then it is easy to show that

(1) \ \ \ \displaystyle maxcut(G) \leq \frac 12  + \frac 12 \cdot \frac {|\lambda_n|}{d}

where maxcut(G) is the fraction of edges cut by an optimal solution. Unfortunately (1) does not quite have an approximate converse: there are graphs where |\lambda_n|=d but maxcut(G) = \frac 12 + o_d(1).

The following fact, however, is always true and well known:

  • |\lambda_n|=d if and only if G contains a bipartite connected component.

Is there an “approximate” version of the above statement characterizing the cases in which d-|\lambda_n| is small? Surprisingly, as far as I know the question had not been considered before.

For comparison, the starting point of the theory of edge expansion is the related fact

  • \lambda_2=d if and only if G is disconnected.

Which can be rephrased as:

  • \lambda_2=d if and only if there is a non-empty S\subseteq V, |S| \leq |V|/2 such that edges(S,V-S)=0.

Cheeger’s inequality characterizes the case in which d-\lambda_2 is small:

  • If there is a non-empty S\subseteq V, |S| \leq |V|/2 such that edges(S,V-S) \leq \epsilon \cdot d \cdot |S|, then \lambda_2 \geq d\cdot (1-2\epsilon);
  • If \lambda_2 \geq d\cdot (1-\epsilon) then there is a non-empty S\subseteq V, |S| \leq |V|/2 such that edges(S,V-S) \leq \sqrt{2 \epsilon} \cdot d \cdot |S|.

For a subset S\subseteq V, and a bipartition L,R=S-L of S, we say that an edge (i,j) fails [to be cut by] the bipartition if (i,j) is incident on S but it is not the case that one endpoint is in L and one endpoint is in R. (This means that either both endpoints are in L, or both endpoints are in R, or one endpoint is in S and one endpoint is not in S.) Then we can express the well-known fact about \lambda_n as

  • |\lambda_n|=d if and only if there is S\subseteq V and a bipartition of S with zero failed edges.

In this new paper I prove the following approximate version

  • If there is a non-empty S\subseteq V, and a bipartition of S with at most \epsilon \cdot d \cdot |S| failed edges, then |\lambda_n| \geq d\cdot (1-4\epsilon);
  • If |\lambda_n| \geq d\cdot (1-\epsilon), then there is a non-empty S\subseteq V, and a partition of S with at most \sqrt{2 \epsilon} \cdot d \cdot |S| failed edges.

The following notation makes the similarity with Cheeger’s inequality clearer. Define the edge expansion of a graph G as

\displaystyle h(G) = \min_{S\subseteq V. \ |S| \leq \frac {|V|}{2} } \frac {edges(S,V-S)} {d|S|}

Let us define the bipartiteness ratio of G as

\displaystyle \beta(G) = \min_{S, \ L \subseteq S, \ R=S-L} \frac{edges(L) + edges(R) + edges(S,V-S)}{d|S|}

that is, as the minimum ratio between failed edges of a partition of a set S over d|S|.

Then Cheeger’s inequality gives

\displaystyle \frac 12 \cdot \frac{d-\lambda_2}{d}  \leq h(G) \leq \sqrt{2 \cdot \frac{d-\lambda_2}{d} }

and our results give

\displaystyle \frac 14 \cdot \frac{d-|\lambda_n|}{d} \leq \beta(G) \leq \sqrt{2 \cdot \frac{d-|\lambda_n|}{d}}

This translates into an efficient algorithm that, given a graph G such that maxcut(G) \geq 1-\epsilon, finds a set S and a bipartition of S such that at least a 1- 4\sqrt{\epsilon} fraction of the edges incident on S are cut by the bipartition. Removing the vertices in S and continuing recursively on the residual graph yields a .50769… approximation algorithm for Max Cut. (The algorithm stops making recursive calls, and uses a random partition, when the partition of S found by the algorithm has too many failed edges.)

The paper is entirely a by-product of the ongoing series of posts on edge expansion: the question of relations between spectral techniques to max cut was asked by a commenter and the probabilistic view of the proof of Cheeger’s inequality that I wrote up in this post was very helpful in understanding the gap between \lambda_n and -d.

In which we spend more than sixteen hundred words to explain a three-line proof.

In the last post, we left off with the following problem. We have a set V of n “vertices,” a semi-metric d: V\times V \rightarrow {\mathbb R}, and we want to find a distribution over sets A\subseteq V such that for every two vertices i,j

(1) \ \ \ \displaystyle {\mathbb E} | d(A,i) - d(A,j) | \geq \frac 1 {O(\log n)}  \cdot d(i,j)

where

\displaystyle d(A,i) := \min_{a\in A} d(a,i)

This will give us a way to round a solution of the Leighton-Rao relaxation to an actual cut with only an O(\log n) loss in the approximation.

Before getting to the distribution which will do the trick, it is helpful to consider a few examples.

  • Example 1: all points are at distance 1 from each other.

    Then |d(A,i)-d(A,j)| is equal to either 0 or 1, and it is 1 if and only if A contains exactly one of i or j. If A is a uniformly chosen random set, then the above condition is satisfied with probability 1/2, so we have the stronger bound

    \displaystyle {\mathbb E} | d(A,i) - d(A,j) | \geq \frac 12 \cdot d(i,j)

    [Indeed, even better, we have {\mathbb E} | d(A,i) - d(A,j) | = \frac 12 \cdot d(i,j), which is an isometric embedding.]

  • Example 2: all points are at distance either 1 or 2 from each other.

    If A contains exactly one of the vertices i,j, then |d(A,i)-d(A,j)| \geq 1, and so, if we choose A uniformly at random we have

    \displaystyle {\mathbb E} | d(A,i) - d(A,j) | \geq \frac 14 \cdot d(i,j)

These examples may trick us into thinking that a uniformly chosen random set always work, but this unfortunately is not the case.

  • Example 3: Within a set S of size n/2, all distances are 1, and the same is true within V-S; the distance between elements of S and elements of V-S is n.

    If we consider i \in S and j \in V-S, then we are in trouble whenever A contains elements from both sets, because then |d(A,i)-d(A,j)| \leq 1, while d(i,j)=n. If we pick A uniformly at random, then A will essentially always, except with exponentially small probability, contain elements from both S and V-S. If, however, we pick A to be a random set of size 1, then we are going to get |d(A,i)-d(A,j)| \geq n-2 with probability at last 1/2, which is great.

    Choosing a set of size 1, however, is a disaster inside S and inside V-S, where almost all distances |d(A,i)-d(A,j)| collapse to zero. For those pairs, however, we know that choosing A uniformly at random works well.

    The solution is thus: with probability 1/2, pick A uniformly at random; with probability 1/2, pick A of size 1.

So far we are actually getting away with {\mathbb E} |X_i - X_j| being a constant fraction of d(i,j). Here is a slightly trickier case.

  • Example 4: The shortest path metric in a \sqrt n \times \sqrt n grid.

    Take two vertices i,j at distance d. We can get, say, |d(A,i) - d(A,j)| \geq d/3, provided that A avoids all vertices at distance \leq 2d/3 from i, and it includes some vertex at distance \leq d/3 from j. In a grid, the number of vertices at distance \leq d from a given vertex is \Theta(d^2), so our goal is to pick A so that it avoids a certain set of size \Theta((2d/3)^2) = \Theta(d^2) and it hits another set of size \Theta( (d/3)^2) = \Theta(d^2). If we pick A to be a random set of size about n/d^2, both events hold with constant probability.

    Now, what works for a certain distance won’t work for a different distance, so it seems we have to do something like picking t from 1 to n, and then pick a random set A of size n/t. This is however too bad, because our chance of getting a set of the right size would only be 1/n, while we can only lose a factor 1/\log n. The solution is to pick t at random from \{ 1,2,4,\cdots,n\}, and then pick A of size A/t. With probability 1/\log n we get the right size of A, up to a factor of two.

It turns out that the last example gives a distribution that works in all cases:

  • Pick t at random in 1 \cdots \log n
  • Pick a random set A so that each i\in V is selected to be in A independently and with probability 1/2^t

Now, it would be nice to show that (as in the examples we have seen so far) for every semi-metric d and two vertices i,j, there is a size parameter t such that when A is chosen to be a random set of size n/2^t we have {\mathbb E} |d(A,i)-d(A,j)| \geq \Omega(d(i,j)).

This would mean that, after we lose a factor of \log n to “guess” the right density, we have the desired bound (3). Unfortunately this is too much to ask for; we shall instead work out an argument that uses contributions from all densities.

It is good to see one more example.

  • Example 5: A 3-regular graph of logarithmic girth.

    Let i,j be two vertices whose distance d is less than the girth. In the example of the grid, we considered all vertices at distance \leq 2d/3 from i and all vertices at distance \leq d/3 from j; in this case, there are \approx 2^{2d/3} of the former, and \approx 2^{d/3} of the latter, and it is hopeless to expect that A, no matter its density, can avoid all of the former, but hit some of the latter.

    If, however, I consider the points at distance \leq \ell+1 from i and the points at distance \leq \ell from j, they are off only by a constant factor, and there is a constant probability of avoiding the former and hitting the latter when t = \ell. So conditioned on each t between 1 and d/2, the expectation of |d(A,i) - d(A,j)| is at least \Omega(1), and, overall, the expectation is at least \Omega(d/\log n).

We are now more or less ready to tackle the general case.

We look at two vertices i,j at distance d(i,j) and we want to estimate {\mathbb E} |d(A,i)-d(A,j)|.

Let us estimate the contribution to the expectation coming from the case in which we choose a particular value of t. In such a case, there is a constant probability that

  1. A contains none of the 2^{t+1}-1 vertices closest to i, but at least one of the of 2^t vertices closest to j. [Assuming the two sets are disjoint]
  2. A contains none of the 2^{t+1}-1 vertices closest to j, but at least one of the of 2^t vertices closest to i. [Assuming the two sets are disjoint]

Notice that events (1) and (2) are disjoint, so we are allowed to sum their contributions to the expectation, without doing any double-counting.

Call ri_k the distance of the the k-th closest vertex from i, and similarly rj_k. Then, if event (1) happens, d(A,i) \geq ri_{2^{t+1}} and d(A,j) \leq rj_{2^t}, in which case

|d(A,i) - d(A,j)| \geq d(A,i) - d(A,j) \geq ri_{2^{t+1}} - rj_{2^t}

we can similarly argue that if (2) happens, then

|d(A,i) - d(A,j)| \geq d(A,j) - d(A,i) \geq rj_{2^{t+1}} - ri_{2^t}

Call r'i_{k} := \min \{ d(i,j)/3, ri_k\} and r'j_{k} := \min \{ d(i,j)/3, rj_k\}.

Let t^* be the smallest t such that either ri_{2^{t+1}}\geq d(i,j)/3
or rj_{2^{t+1}}\geq d(i,j)/3. Then for t\leq t^*-1 the events described in (1) and (2) are well-defined, and the contributions to the expectation is at least

\frac 1 {\log n} \cdot ( ri_{2^{t+1}} - rj_{2^t}  + rj_{2^{t+1}} - ri_{2^t} ) =
= \frac 1 {\log n} \cdot ( r'i_{2^{t+1}} - r'j_{2^t}  + r'j_{2^{t+1}} - r'i_{2^t} )

When t=t^*, we can verify that the contribution to the expectation is at least

= \frac 1 {\log n} \cdot ( r'i_{2^{t^*+1}} - r'j_{2^{t^*}}  + r'j_{2^{t^*+1}} - r'i_{2^{t^*}} )
\geq \frac 1 {\log n} \cdot ( d(i,j)/3  - r'j_{2^{t^*}} - r'i_{2^{t^*}}

And if we sum the contributions for t=0,\cdots t^*, the sum telescopes and we are left with

\displaystyle {\mathbb E} |d(A,i) - d(A,j) | \geq \Omega\left( \frac 1 {\log n} \right) \cdot \left( \frac{d(i,j)}{3} - r'i_{1} - r'j_1 \right)

where r'i_1 = r'j_1 = 0.

At long last, we have completed the proof.

Notice that the O(\log n) factor we have lost is best possible in light of the expander example we saw in the previous post. In many examples, however, we lost only a constant factor. It is a great open question whether it is possible to lose only a constant factor whenever the metric is a shortest-path metric on a planar graph.

Continuing our seemingly never-ending series on approximations to edge expansion, we come to the question of how good is the Leighton-Rao relaxation.

Given a graph G=(V,E), we are trying to approximate the sparsest cut problem (which, in turn, approximates the edge expansion problem within a factor of two) defined as

\displaystyle \phi(G) := \min_{S\subseteq V} \frac {edges(S,V-S)} {\frac 1n |S| \cdot |V-S|}

And we have seen that an equivalent characterization of sparsest cut is:

(1) \ \ \ \displaystyle \phi(G) = \min_{x\in {\mathbb R}^V}  \frac{ \sum_{i,j} A(i,j) | x_i - x_j | } {\frac 1n \sum_{i,j} |x_i - x_j | }

In the Leighton-Rao relaxation, we have

(2) \ \ \ \displaystyle LR(G) := \min_{d\in {\cal M}(V)} \frac{ \sum_{i,j} A(i,j) d(i,j) }{\frac 1n \sum_{i,j} d(i,j)}

where {\cal M}(V) is the set of all semi-metrics d: V \times V \rightarrow {\mathbb R}, that is, of all functions such that d(i,j) = d(j,i) \geq 0, d(i,i)=0, and that obey the “triangle inequality”

d(i,j) \leq d(i,k) + d(k,j)

for all i,j,k \in V.

This is clearly a relaxation of the sparsest cut problem, because for every x \in {\mathbb R}^V the “distances” |x_i - x_j| define indeed a semi-metric over V.

How bad can the relaxation be? Take a family of regular constant-degree expander graphs, that is graphs where \phi is a constant and the degree k is a constant independent of n=|V|. Define d(i,j) to be the shortest path distance between i and j in the graph. This is clearly a semi-metric, and we have

\displaystyle \sum_{i,j} A(i,j) d(i,j) = kn = O(n)

because along every edge the distance is precisely one, and

\displaystyle \sum_{i,j} d(i,j) = \Omega (n^2 \log_k n)

because, for every vertex i, at most k^t other vertices can be within
distance t, and so at least half the vertices are at distance \Omega( \log_k n).

So we have

\displaystyle LR(G) = O \left( \frac 1 {\log n} \right)

even though \phi(G) = \Omega(1), so the error in the approximation can be as bad as a factor of \log n. We shall prove that this is as bad as it gets.

Our task is then: given a feasible solution to (2), that is, a distance function defined on the vertices of G, find a feasible solution to sparsest cut whose cost is as small as possible, and no more than a O(\log n) factor off from the cost of the solution to (2) that we started from. Instead of directly looking for a cut, we shall look for a solution to (1), which is a more flexible formulation.

The general strategy will be, given d, to find a distribution X over vectors x \in {\mathbb R}^V such that for every two vertices i,j we have

(3) \ \ \ \displaystyle \Omega\left( \frac {1}{\log n} \right) \cdot d(i,j) \leq {\mathbb E} | X_i - X_j | \leq d(i,j)

Suppose that we start from a solution to (2) of cost c, that is, we have a semimetric d(\cdot,\cdot) such that

\displaystyle \sum_{i,j} A_{i,j} d(i,j) = c \cdot \frac 1n \sum_{i,j} d(i,j)

Then our distribution X is such that

\displaystyle {\mathbb E}  \sum_{i,j} A_{i,j} |X_i - X_j| \leq \sum_{i,j} A_{i,j} d(i,j)

and

\displaystyle \sum_{i,j} d(i,j) \leq O \left(  {\log n} \right) {\mathbb E} \sum_{i,j}|X_i - X_j|

so

\displaystyle {\mathbb E}  \left[ \sum_{i,j} A_{i,j} |X_i - X_j| - O(c\log n) \cdot \frac 1n \sum_{i,j} |X_i - X_j| \right] \leq 0

and there must exist a vector x in the support of the distribution X such that

\displaystyle \sum_{i,j} A_{i,j} |x_i - x_j| - O(c\log n) \cdot \frac 1n \sum_{i,j} |x_i - x_j|  \leq 0

and now we are done because the cost of x in (1) is at most O(\log n) times the optimum of the Leighton-Rao relaxation.

Though this works, it seems an overkill to require condition (3) for all pairs of vertices. It seems sufficient to require

{\mathbb E} |X_i-X_j| \leq d(i,j)

only for edges, and to require

d(i,j) \leq O(\log n) \cdot  {\mathbb E} |X_i-X_j|

only “on average” over all pairs of vertices. It can be shown, however, that if one wants to round a generalization of the Leighton-Rao relaxation to sparsest cut with “non-uniform demands,” then any rounding algorithm can be turned into a rounding algorithm that satisfies (3).

So how do we start from an arbitrary metric and come up with a probabilistic linear order of the vertices so that their distances in the original metric are well approximated in the linear ordering? We will describe a method of Bourgain, whose applicability in this setting is due Linial, London and Rabinovich.

The starting point is to see that if A is a set of vertices, and we define

\displaystyle X_i := d(A,i) :=  \min_{a\in A} d(a,i)

then, no matter how we choose A, we always have

|X_i - X_j | = |d(A,i) - d(A,j) | \leq d(i,j)

[Hint: use the triangle inequality to see that d(A,j) \leq d(A,i) + d(i,j).]

This means that “all we have to do” is to find a distribution over sets A such that for every i,j we have

(3) \ \ \ \displaystyle {\mathbb E} | d(A,i) - d(A,j) | \geq \Omega \left( \frac 1 {\log n} \right) \cdot d(i,j)

If you want to make it sound more high-brow, you may call such a distribution a Frechet embedding of d into \ell_1. Constructing such a distribution will be the subject of the next post.

We continue to talk about the problem of estimating the expansion of a graph G=(V,E), focusing on the closely related sparsest cut, defined as

\displaystyle \phi(G):= \min_{S} \frac {edges(S,V-S) } { \frac 1n \cdot |S| \cdot |V-S| }

The spectral paritioning algorithm first finds a vector x \in {\mathbb R}^n minimizing

(1) \ \ \  \displaystyle \frac{ \sum_{i,j} A(i,j) |x(i)-x(j)|^2 } { \frac 1n \cdot \sum_{i,j} |x(i)-x(j)|^2}

(where A is the adjacency matrix of G) and then finds the best cut (S,V-S) where S is of the form \{ i \in V: \ x(i) \geq t \} for a threshold t.

We proved that if the quantity in (1) is \epsilon and G is d-regular, then the algorithm will find a cut of sparsity at most \sqrt{8 \epsilon d}, and that if x is the eigenvector of the second eigenvalue, then it is an optimal solution to (1), and the cost of an optimal solution to (1) is a lower bound to \phi(G). This means that the algorithm finds a cut of sparsity at most \sqrt{8 \phi(G) d}.

Can the analysis be improved? Read the rest of this entry »

[In which we prove the "difficult part" of Cheeger's inequality by analyzing a randomized rounding algorithm for a continuous relaxation of sparsest cut.]

We return to this month’s question: if G=(V,E) is a d-regular graph, how well can we approximate its edge expansion h(G) defined as

h(G) := \min_{S\subseteq V} \frac{edges(S,V-S)} {\min \{ |S|, \ |V-S| \} }

and its sparsest cut \phi(G) defined as

\phi(G) := \min_{S\subseteq V} \frac{edges(S,V-S)} { \frac 1n \cdot |S| \cdot |V-S|} =  \min_{x\in \{0,1\}^n } \frac{ \sum_{i,j} A(i,j) \cdot |x(i)-x(j)|}{\frac 1n  \sum_{i,j}  |x(i)-x(j)|},

where A is the adjacency matrix of G.

We have looked at three continuous relaxations of \phi(G), the spectral gap, the Leighton-Rao linear program, and the Arora-Rao-Vazirani semidefinite program.

As we saw, the spectral gap of G, defined as the difference between largest and second largest eigenvalue of A, can be seen as the solution to a continuous optimization problem:

d-\lambda_2 = \min_{x\in {\mathbb R}^n} \frac {\sum_{i,j} A(i,j) \cdot |x(i)-x(j)|^2}{\frac 1n  \sum_{i,j}  |x(i)-x(j)|^2}.

It follows from the definitions that

d-\lambda_2 \leq \phi \leq 2h

which is the “easy direction” of Cheeger’s inequality, and the interesting thing is that d-\lambda_2 is never much smaller, and it obeys

(1) \ \ \ d-\lambda_2 \geq \frac {h^2}{2d} \geq \frac {\phi^2}{8d},

which is the difficult part of Cheeger’s inequality. When we normalize all quantities by the degree, the inequality reads as

\frac 1 8 \cdot \left( \frac \phi d \right)^2 \leq \frac 1 2 \cdot \left( \frac h d \right)^2 \leq \frac{d-\lambda_2}{d} \leq \frac \phi d  \leq 2 \frac h d .

I have taught (1) in three courses and used it in two papers, but I had never really understood it, where I consider a mathematical proof to be understood if one can see it as a series of inevitable steps. Many steps in the proofs of (1) I had read, however, looked like magic tricks with no explanation. Finally, however, I have found a way to describe the proof that makes sense to me. (I note that everything I will say in this post will be completely obvious to the experts, but I hope some non-expert will read it and find it helpful.)

We prove (1), as usual, by showing that given any x\in {\mathbb R}^n such that

(2) \ \ \ \frac {\sum_{i,j} A(i,j) \cdot |x(i)-x(j)|^2}{\frac 1n  \sum_{i,j}  |x(i)-x(j)|^2} = \epsilon

we can find a threshold t such that the cut (S,V-S) defined by S:= \{ i: x_i \geq t \} satisfies

  (3)  \ \ \ \frac{edges(S,V-S)} { \min \{ |S|,\ |V-S| \} } \leq \sqrt{2 d \epsilon}

and

  \frac{edges(S,V-S)} { \frac 1n \cdot |S| \cdot |V-S| } \leq \sqrt{8 d \epsilon}.

This not only gives us a proof of (1), but also an algorithm for finding sparse cuts when they exist: take a vector x which is an eigenvector of the second eigenvalue (or simply a vector for which the Rayleigh quotient in (2) is small), sort the vertices i according to the value of x(i), and find the best cut among the “threshold cuts.” This is the “spectral partitioning” algorithm.

This means that proving (1) amounts to studying an algorithm that “rounds” the solution of a continuous relaxation to a combinatorial solution, and there is a standard pattern to such arguments in computer science: we describe a randomized rounding algorithm, study its average performance, and then argue that there is a fixed choice that is at least as good as an average one. Here, in particular, we would like to find a distribution T over threshold, such that if we define S:= \{ i: x(i) \geq T\} as a random variable in terms of T we have

  {\mathbb E} [ edges(S,V-S) ] \leq {\mathbb E}  [\min \{ |S|,\ |V-S| \} ] \cdot  \sqrt{2 d \epsilon}

and so, using linearity of expectation,

  {\mathbb E} [ edges(S,V-S)  - \min \{ |S|,\ |V-S| \}  \cdot  \sqrt{2 d \epsilon}] \leq 0

from which we see that there must be a threshold in our sample space such that (3) holds. I shall present a proof that explicitly follows this pattern. It would be nice if we could choose T uniformly at random in the interval \left[\min_i x(i), \max_i x(i)\right], but I don’t think it would work. (Any reader can see a counterexample? I couldn’t, but we’ll come back to it.) Instead, the following works: assuming with no loss of generality that the median of \{ x(1),\ldots,x(n) \} is zero, T can be chosen so that |T|\cdot T is distributed uniformly in the interval \left[\min_i x(i), \max_i x(i)\right]. (This means that thresholds near the median are less likely to be picked than thresholds far from the median.) This choice seems to be a magic trick in itself, voiding the point I made above, but I hope it will become natural as we unfold our argument. Read the rest of this entry »

[In which we finally get to Leighton-Rao and ARV.]

Continuing our edge expansion marathon, we are going to see more ways in which the sparsest cut problem can be relaxed to polynomial-time solvable continuous problems. As before, our goal is to understand how this implies certificates of expansion.

We are talking about the sparsest cut \phi(G) of a graph G(V,E), which is

\phi(G) := \min_{S\subseteq V} \frac{edges(S,V-S)}{\frac 1n |S|\cdot V-S|} = \min_{x\in \{0,1\}^n} \frac { \sum_{i,j} A(i,j) |x(i)-x(j)|} { \frac 1n \sum_{i,j} |x(i)-x(j)|}

We have looked for a while at the eigenvalues gap of G. If G is d-regular and the second largest eigenvalue of G is \lambda_2, then the difference between largest and second largest eigenvalue obeys

d-\lambda_2 = \min_{x\in {\mathbb R}^n} \frac { \sum_{i,j} A(i,j) |x(i)-x(j)|^2} {\frac 1n \sum_{i,j} |x(i)-x(j)|^2} = \min_{m,x_1,\ldots,x_n\in {\mathbb R}^m} \frac {  \sum_{i,j} A(i,j) ||x_-x_j||^2} {\frac 1n \sum_{i,j} ||x_i-x_j||^2}

and, noting that |x(i)-x(j)|^2 = |x(i)-x(j)| when x\in \{0,1\}^n, we saw that d-\lambda_2 is a relaxation of \phi, and \phi \geq d-\lambda_2.

Leighton and Rao derive a continuous relaxation of \phi in quite a different way. They note that when x\in \{ 0,1\}^n, then the “distance”

d(i,j) := |x(i)-x(j)|

induces a semi-metric on V, that is, d(i,i)=0, d(i,j) = d(j,i), and the triangle inequality holds

d(i,j) \leq d(i,k) + d(k,j) \ \ \ \forall i,j,k

One can then consider the relaxation

 (1) \ \ \ min_d \frac { \sum_{i,j} A(i,j) d(i,j)}  {\frac 1n \sum_{i,j} d(i,j)}

where the minimum is taken over all distance functions d: V\times V \rightarrow {\mathbb R} that define a semi-metric on V.

We can rewrite (1) as the linear program

(2) \ \ \ \begin{array}{l} \min \sum_{i,j} A(i,j) d(i,j)\\ subject\ to \\ \sum_{i,j} d(i,j)=n\\ d(i,j) \leq d(i,k) + d(k,j) \ \ \ \forall i,j,k\\ d(i,j) \geq 0 \ \ \ \forall i,j \end{array}

where we only write the triangle inequality constraints by having one variable d(i,j) for every unordered pair i,j.

The minimum of (2) gives a lower bound to the sparsest cut of G, and so every feasible solution for the dual of (2) gives a lower bound to \phi(G).

Before trying to write down the dual of (2), it is convenient to write (2) in a less efficient way. (This is akin to the fact that sometimes, when using induction, it is easier to prove a more general statement.) Instead of writing the triangle inequality for every three vertices, we write it for every arbitrary sequence of vertices.

(3) \ \ \ \begin{array}{l} \min \sum_{i,j} A(i,j) d(i,j)\\ subject\ to \\ \sum_{i,j} d(i,j)=n\\ d(i,j) \leq \sum_{(u,v) \in p} d(u,v) \ \ \ \forall i,j, \forall p \in P_{i,j} \\  d(i,j) \geq 0 \ \ \ \forall i,j\end{array}

where we denote by P_{i,j} the set of “paths” in the complete graph between i and j. (That is, P_{i,j} is the set of all sequences of vertices that start at i and end at j.)

Clearly (3) is the same as (2), in the sense that the extra inequalities present in (3) are implied by the triangle inequalities in (2). The dual of (3), however, is cleaner, and is given below

(4) \ \ \ \begin{array}{l} \max n\cdot y\\ \ y - \sum_{p\in P_{i,j}} y_p + \sum_{p: (i,j) \in p} y_p \leq A(i,j)\ \ \ \forall i,j\\ y,y_p \geq 0\end{array}

A feasible solution to (4) is thus a weighted set of paths and a parameter y. Before reasoning about the meaning of a solution, suppose that y,y_p satisfy the stronger constraints

(5) \ \ \ \begin{array}{l} \sum_{p\in P_{i,j}} y_p \geq y\\\sum_{p: (i,j) \in p} y_p \leq A(i,j) \end{array}

Then we can view the y_p : p \in P_{i,j} as defining a flow between i,j that carries at least y units of flow; furthermore, the union of all such flows uses at most a total capacity A(i,j) on each edge (i,j). We should think of it as an embedding of the complete graph scaled by y into G. I want to claim that this means that the sparsest cut of G is at least y times the sparsest cut of the complete graph, that is, at least y\cdot n.

To verify the claim, take any cut S,V-S. We know that such a cut separates |S| \cdot |V-S| pairs of vertices i,j, and that at least y units of flow are routed between each such pair, so that at least y\cdot |S| \cdot |V-S| units of flow cross the cut. Since every edge has capacity 1, we see that at least y\cdot  |S|\cdot  |V-S| edges of G must cross the cut.

It remains to bridge the difference between (4) or (5). One possibility is to show that a solution to (4) can be modified to a solution satisfying (5) with no loss in y. Another approach is to directly use (4) and a general weak duality argument to show directly that a solution satisfying (4) implies that G has at least the edge expansion of a complete graph with edge weights y.

There is, in fact, a broader principle at work, which we can verify by a similar argument: if \{ y_p\} is an assignment of weights to all possible paths (which we think of as defining a multicommodity flow), G is a graph with adjacency matrix A, H is a graph with adjacency matrix B, and we have

B(i,j) - \sum_{p\in P_{i,j}} y_p + \sum_{p: (i,j) \in p} y_p \leq A(i,j)\ \ \ \forall i,j

then we must have \phi(G) \geq \phi(H).

The fact that a feasible solution to (4) gives a lower bound to \phi(G) is a special case that applies to the case in which H is a scaled version of the complete graph.

This is a good point to pause, look back again at the bound coming from spectral considerations, and compare it with what we have here. Read the rest of this entry »

In the previous post, we saw that the sparsest cut of a graph G=(V,E) (which is within a factor of 2 of the edge expansion of the graph), is defined as

\phi := \min_{S\subseteq V} \frac{edges(S,V-S)}{\frac 1n |S|\cdot |V-S|}

and can also be written as

(1) \ \ \ \phi = \min_{x\in \{0,1\}^V} \frac{\sum_{i,j} A(i,j) | x(i)-x(j)|}{\frac 1n \sum_{i,j} |x(i)-x(j)|}

while the eigenvalue gap of G, can be written as

(2) \ \ \ d-\lambda_2 = \min_{x\in {\mathbb R}^V} \frac{\sum_{i,j} A(i,j) | x(i)-x(j)|^2}{\frac 1n \sum_{i,j} |x(i)-x(j)|^2}

and so it can be seen as a relaxation of \phi, considering that
|x(i)-x(j)|=|x(i)-x(j)|^2 when x(i),x(j) are boolean.

Let us now take a broader look at efficiently computable continuous relaxations of sparsest cut, and see that the eigenvalue gap is a semidefinite programming relaxation of sparsest cut, that by adding the triangle inequality to it we derive the Arora-Rao-Vazirani relaxation, and that by removing the semidefinite constraint from the ARV relaxation we get the Leighton-Rao relaxation. Read the rest of this entry »

In which we dwell at great length on the “easy” part of Cheeger’s inequality.

The edge expansion of a graph G=(V,E), defined as

\displaystyle  h(G):= \min_{S: |S| \leq \frac {|V|}{2} } \frac{edges(S,V-S)}{|S|}

is a measure of how “well connected” is a graph. A graph has edge expansion at least h if and only if, in order to disconnect a set of k vertices from the rest of the graph, one must remove at least hk edges. In other words, the removal of t edges from a graph of edge expansion \geq h must leave a connected component of size \geq n - t/h, where n is the number of vertices.

As readers of in theory know, graphs of large edge expansion (expander graphs) have many applications, and in this post we are going to return to the question of how one certifies that a given graph has large edge expansion.

For simplicity, we shall only talk about regular graphs. In a previous post we mentioned that if A is the adjacency matrix of a graph, there are numbers \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n (the eigenvalues of A), not necessarily distinct, and an orthonormal basis x_1,\cdots,x_n of {\mathbf R}^n, such that for all  i

x_iA = \lambda_i x_i

And if the graph is d-regular, then \lambda_1=d,  x_1 = \frac {1}{\sqrt n} (1,\cdots,1), and for all i we have  |\lambda_i | \leq d. Finally, the numbers \lambda_i and the vectors x_i are computable in polynomial time. (This is all the linear algebra we will need.)

Given this setup, we have the lower bound

h(G) \geq \frac 12 \cdot (d-\lambda_2)  \ \ \   (1)

That is, the difference between largest and second largest eigenvalue gives a lower bound to the edge expansion of the graph.

Although (1) is very easy to prove, it is a remarkable result, in the sense that it gives a polynomial-time computable certificate of a “coNP” statement, namely, that for all cuts, at least a certain number of edges cross the cut. Since computing edge expansion is NP-complete, the inequality cannot always be tight. Eventually shall compare it with other methods to lower bound the edge expansion, and see that they are all special cases of the same general approach. To begin with, in this post we shall show that d-\lambda_2 is an efficiently computable relaxation of the edge expansion problem. (Or, rather, of the closely related sparsest cut problem.) Read the rest of this entry »

Cheegers inequality relates the expansion of a graph to its eigenvalue gap.

There are at least three contexts where this result arises and is important:
Read the rest of this entry »

a