CS359G Lecture 8: the Leighton-Rao Relaxation

In which we introduce the Leighton-Rao relaxation of sparsest cut.

Let {G=(V,E)} be an undirected graph. Unlike past lectures, we will not need to assume that {G} is regular. We are interested in finding a sparsest cut in {G}, where the sparsity of a non-trivial bipartition {(S,V-S)} of the vertices is

\displaystyle  \phi_G(S) := \frac{\frac{1}{|E|} \cdot Edges(S,V-S) } {\frac {2}{V^2} \cdot |S| \cdot |V-S| }

which is the ratio between the fraction of edges that are cut by {(S,V-S)} and the fraction of pairs of vertices that are disconnected by the removal of those edges.

Another way to write the sparsity of a cut is as

\displaystyle  \phi_G(S) := \frac {|V|^2}{2|E|} \cdot \frac{\sum_{i,j} A_{i,j} |1_S(i) - 1_S(j)| } {\sum_{i,j} |1_S(i) - 1_S(j) | }

where {A} is the adjacency matrix of {G} and {1_S(\cdot)} is the indicator function of the set {S}.

The observation that led us to see {1-\lambda_2} as the optimum of a continuous relaxation of {\phi} was to observe that {|1_S(i)-1_S(j)| = |1_S(i) - 1_S(j)|^2}, and then relax the problem by allowing arbitrary functions {x: V \rightarrow {\mathbb R}} instead of indicator functions {1_S : V \rightarrow \{ 0,1 \}}.

The Leighton-Rao relaxation of sparsest cut is obtained using, instead, the following observation: if, for a set {S}, we define {d_S(i,j) := |1_S(i) - 1_S(j)|}, then {d_S(\cdot,\cdot)} defines a semi-metric over the set {V}, because {d_S} is symmetric, {d_S(i,i)=0}, and the triangle inequality holds. So we could think about allowing arbitrary semi-metrics in the expression for {\phi}, and define

\displaystyle  LR(G):= \min_{\begin{array}{l} d:V\times V \rightarrow {\mathbb R}\\ d \ \mbox{semi-metric} \end{array}} \frac {|V|^2}{2|E|} \cdot \frac{\sum_{i,j} A_{i,j} d(i,j) }{\sum_{i,j} d(i,j)} \ \ \ \ \ (1)

This might seem like such a broad relaxation that there could be graphs on which {LR(G)} bears no connection to {\phi(G)}. Instead, we will prove the fairly good estimate

\displaystyle   LR(G) \leq \phi(G) \leq O(\log |V|) \cdot LR(G) \ \ \ \ \ (2)

Furthermore, we will show that {LR(G)}, and an optimal solution {d(\cdot,\cdot)} can be computed in polynomial time, and the second inequality above has a constructive proof, from which we derive a polynomial time {O(\log |V|)}-approximate algorithm for sparsest cut.

1. Formulating the Leighton-Rao Relaxation as a Linear Program

The value {LR(G)} and an optimal {d(\cdot,\cdot)} can be computed in polynomial time by solving the following linear program

\begin{array}{lll}{\rm minimize} &  \sum_{i,j} A_{i,j} d_{i,j}  \\ {\rm subject\ to} \\ & \sum_{i,j} d_{i,j} = \frac {|V|^2}{2|E|}  \\ & d_{i,k} \leq d_{i,j} + d_{j,k} & \forall i,j,k \in V\\ & d_{i,j} \geq 0 & \forall i \in V\end{array} \ \ \ \ \ (3)

that has a variable {d_{i,j}} for every unordered pair of distinct vertices {i,j}. Clearly, every solution to the linear program (3) is also a solution to the right-hand side of the definition (1) of the Leighton-Rao parameter, with the same cost. Also every semi-metric can be normalized so that {\sum_{i,j} d(i,j) = |V|^2/2|E|} by multiplying every distance by a fixed constant, and the normalization does not change the value of the right-hand side of (1); after the normalization, the semimetric is a feasible solution to the linear program (3), with the same cost.

In the rest of this lecture and the next, we will show how to round a solution to (3) into a cut, achieving the logarithmic approximation promised in (2).

2. An L1 Relaxation of Sparsest Cut

In the Leighton-Rao relaxation, we relax distance functions of the form {d(i,j) = |1_S(i) - 1_S(j)|} to completely arbitrary distance functions. Let us consider an intermediate relaxation, in which we allow distance functions that can be realized by an embedding of the vertices in an {\ell_1} space.

Recall that, for a vector {{\bf x} \in {\mathbb R}^n}, its {\ell_1} norm is defined as {|| {\bf x}||_1 := \sum_i |x_i|}, and that this norm makes {{\mathbb R}^n} into a metric space with the {\ell_1} distance function

\displaystyle  || {\bf x} - {\bf y}||_1 = \sum_i |x_i - y_i |

The distance function {d(i,j) = |1_S(i) - 1_S(j)|} is an example of a distance function that can be realized by mapping each vertex to a real vector, and then defining the distance between two vertices as the {\ell_1} norm of the respective vectors. Of course it is an extremely restrictive special case, in which the dimension of the vectors is one, and in which every vertex is actually mapping to either zero or one. Let us consider the relaxation of sparsest cut to arbitrary {\ell_1} mappings, and define

\displaystyle  \phi'(G) := \inf_{m, f: V\rightarrow {\mathbb R}^m} \frac{|V|^2}{2|E|} \cdot \frac {\sum_{i,j} A_{i,j} || f(i) - f(j) ||_1 } {\sum_{i,j} || f(i) - f(j) ||_1 }

This may seem like another very broad relaxation of sparsest cut, whose optimum might bear no correlation with the sparsest cut optimum. The following theorem shows that this is not the case.

Theorem 1 For every graph {G}, {\phi(G) = \phi'(G)}.

Furthermore, there is a polynomial time algorithm that, given a mapping {f: V\rightarrow {\mathbb R}^m}, finds a cut {S} such that

\displaystyle  \frac {\sum_{u,v} A_{u,v} |1_S(u) - 1_S(v) |} {\sum_{u,v} |1_S(u) - 1_S(v)| }\leq \frac {\sum_{u,v} A_{u,v} || f(u) - f(v) ||_1 } {\sum_{u,v} || f(u) - f(v) ||_1 } \ \ \ \ \ (4)

Proof: We use ideas that have already come up in the proof the difficult direction of Cheeger’s inequality. First, we note that for every nonnegative reals {a_1,\ldots,a_m} and positive reals {b_1,\ldots,b_m} we have

\displaystyle   \frac {a_1 + \cdots a_m}{b_1 + \cdots b_m} \geq \min_i \frac{a_i}{b_i} \ \ \ \ \ (5)

as can be seen by noting that

\displaystyle  \sum_j a_j = \sum_j b_j \cdot \frac{a_j}{b_j} \geq \left( \min_i \frac{a_i}{b_i} \right) \cdot \sum_j b_j

Let {f_i(v)} be the {i}-th coordinate of the vector {f(v)}, thus {f(v) = (f_1(v),\ldots,f_m(v))}. Then we can decompose the right-hand side of (4) by coordinates, and write

\displaystyle  \frac {\sum_{u,v} A_{u,v} || f(u) - f(v) ||_1 } {\sum_{u,v} || f(u) - f(v) ||_1 }

\displaystyle  = \frac {\sum_i \sum_{u,v} A_{u,v} | f_i(u) - f_i(v) | } {\sum_i \sum_{u,v} | f_i(u) - f_i(v) | }

\displaystyle  \geq \min_i \frac {\sum_{u,v} A_{u,v} | f_i(u) - f_i(v) | } { \sum_{u,v} | f_i(u) - f_i(v) | }

This already shows that, in the definition of {\phi'}, we can map, with no loss of generality, to 1-dimensional {\ell_1} spaces.

Let {i^*} be the coordinate that achieves the minimum above. Because the cost function is invariant under the shifts and scalings (that is, the cost of a function {x\rightarrow f(x)} is the same as the cost of {x\rightarrow af(x) + b} for every two constants {a\neq 0} and {b}) there is a function {g:V\rightarrow {\mathbb R}} such that {g} has the same cost function as {f_{i*}} and it has a unit-length range {\max_v g(v) -\min_v g(v) = 1}.

Let us now pick a threshold {t} uniformly at random from the interval {[\min_v g(v) , \max_v g(v)]}, and define the random variables

\displaystyle  S_t:= \{ v: g(v) \leq t

We observe that for every pairs of vertices {u,v} we have

\displaystyle  \mathop{\mathbb E} |1_{S_t(}u) - 1_{S_t}(v) | = |g(u) - g(v)|

and so we get

\displaystyle  \frac {\sum_{u,v} A_{u,v} || f(u) - f(v) ||_1 } {\sum_{u,v} || f(u) - f(v) ||_1 }

\displaystyle  \geq \frac {\sum_{u,v} A_{u,v} | g(u) - g(v) | } { \sum_{u,v} | g(u) - g(v) | }

\displaystyle  = \frac {\mathop{\mathbb E} \sum_{u,v} A_{u,v} |1_{S_t}(u) - 1_{S_t}(v) |} {\mathop{\mathbb E}\sum_{u,v} |1_{S_t} (u)- 1_{S_t}(v)| }

Finally, by an application of (5), we see that there must be a set {S} among the possible values of {S_t } such that (4) holds. Notice that the proof was completely constructive: we simply took the coordinate {f_{i^*}} of {f} with the lowest cost function, and then the “threshold cut” given by {f_{i^*}} with the smallest sparsity. \Box

3. A Theorem of Bourgain

We will derive our main result (2) from the L1 “rounding” process of the previous section, and from the following theorem of Bourgain (the efficiency considerations are due to Linial, London and Rabinovich).

Theorem 2 (Bourgain) Let {d: V\times V \rightarrow {\mathbb R}} be a semimetric defined over a finite set {V}. Then there exists a mapping {f: V \rightarrow {\mathbb R}^m} such that, for every two elements {u,v \in R},

\displaystyle  || f(u) - f(v)||_1 \leq d(u,v) \leq ||f(u)-f(v)||_1 \cdot c\cdot \log |V|

where {c} is an absolute constant. Given {d}, the mapping {f} can be found with high probability in randomized polynomial time in {|V|}.

To see that the above theorem of Bourgain implies (2), consider a graph {G}, and let {d} be the optimal solution of the Leighton-Rao relaxation of the sparsest cut problem on {G}, and let {f:V\rightarrow {\mathbb R}} be a mapping as in Bourgain’s theorem applied to {d}. Then

\displaystyle  LR(G) = \frac{|V|^2}{|E|} \cdot \frac{\sum_{u,v} A_{u,v} d(u,v) }{\sum_{u,v} d(u,v) }

\displaystyle  \geq \frac{|V|^2}{|E|} \cdot \frac{\sum_{u,v} A_{u,v} ||f(u)-f(v) ||_1 }{c\cdot \log |V| \cdot \sum_{u,v} ||f(u)-f(v)||_1 }

\displaystyle  \geq \frac 1 {c\cdot \log |U|} \cdot \phi(G)

CS359G Lecture 1: Overview

In which we describe what this course is about.

1. Overview

This class is about the following topics:

  1. Approximation algorithms for graph partitioning problems. We will study approximation algorithms for the sparsest cut problem, in which one wants to find a cut (a partition into two sets) of the vertex set of a given graph so that a minimal number of edges cross the cut compared to the number of pairs of vertices that are disconnected by the removal of such edges.

    This problem is related to estimating the edge expansion of a graph and to find balanced separators, that is, ways to disconnect a constant fraction of the pairs of vertices in a graph after removing a minimal number of edges.

    Finding balanced separators and sparse cuts arises in clustering problems, in which the presence of an edge denotes a relation of similarity, and one wants to partition vertices into few clusters so that, for the most part, vertices in the same cluster are similar and vertices in different clusters are not. For example, sparse cut approximation algorithms are used for image segmentation, by reducing the image segmentation problem to a graph clustering problem in which the vertices are the pixels of the image and the (weights of the) edges represent similarities between nearby pixels.

    Balanced separators are also useful in the design of divide-and-conquer algorithms for graph problems, in which one finds a small set of edges that disconnects the graph, recursively solves the problem on the connected components, and then patches the partial solutions and the edges of the cut, via either exact methods (usually dynamic programming) or approximate heuristic. The sparsity of the cut determines the running time of the exact algorithms and the quality of approximation of the heuristic ones.

    We will study three approximation algorithms:

    1. The Spectral Partitioning Algorithm, based on linear algebra;
    2. The Leighton-Rao Algorithm, based on a linear programming relaxation;
    3. The Arora-Rao-Vazirani Algorithm, based on a semidefinite programming relaxation.

    The three approaches are related, because the continuous optimization problem that underlies the Spectral Partitioning algorithm is a relaxation of the ARV semidefinite programming relaxation, and so is the Leighton-Rao relaxation. Rounding the Leighton-Rao and the Arora-Rao-Vazirani relaxations raise interesting problems in metric geometry, some of which are still open.

  2. Explicit Constructions of Bounded-Degree Expanders. Expander graphs are graphs with very strong connectivity and “pseudorandomness” properties. Constructions of constant-degree expanders are useful in a variety of applications, from the design of data structures, to the derandomization of algorithms, from efficient cryptographic constructions to being building blocks of more complex quasirandom objects.

    There are two families of approaches to the explicit (efficient) construction of bounded-degree expanders. One is via algebraic constructions, typically ones in which the expander is constructed as a Cayley graph of a finite group. Usually these constructions are easy to describe but rather difficult to analyze. The study of such expanders, and of the related group properties, has become a very active research program, involving mostly ergodic theorists and number theorists. There are also combinatorial constructions, which are somewhat more complicated to describe but considerably simpler to analyze.

  3. Bounding the Mixing Time of Random Walks and Approximate Counting and Sampling. If one takes a random walk in a regular graph that is connected and not bipartite, then, regardless of the starting vertex, the distribution of the {t}-th step of the walk is close to the uniform distribution over the vertices, provided that {t} is large enough. It is always sufficient for {t} to be quadratic in the number of vertices; in some graphs, however, the distribution is near-uniform even when {t} is just poly-logarithmic. Among other applications, the study of the “mixing time” (the time that it takes to reach the uniform distribution) of random walks has applications to analyzing the convergence time of certain randomized algorithms.

    The design of approximation algorithms for combinatorial counting problems, in which one wants to count the number of solutions to a given NP-type problem, can be reduced to the design of approximately uniform sampling in which one wants to approximately sample from the set of such solutions. For example, the task of approximately counting the number of perfect matchings can be reduced to the task of sampling almost uniformly from the set of perfect matchings of a given graph. One can design approximate sampling algorithms by starting from an arbitrary solution and then making a series of random local changes. The behavior of the algorithm then corresponds to performing a random walk in the graph that has a vertex for every possible solution and an edge for each local change that the algorithm can choose to make. Although the graph can have an exponential number of vertices in the size of the problem that we want to solve, it is possible for the approximate sampling algorithm to run in polynomial time, provided that a random walk in the graph converges to uniform in time poly-logarithmic in its size.

    The study of the mixing time of random walks in graphs is thus a main analysis tool to bound the running time of approximate sampling algorithms (and, via reductions, of approximate counting algorithms).

These three research programs are pursued by largely disjoint communities, but share the same mathematical core.

Continue reading