Notes on expanders, sparsest cut, and spectral graph theory

While preparing for the spectral graph theory boot camp, which starts this Tuesday at the Simons Institute, I collected the lecture notes of my class on graph partitioning and expanders in one file.

There are no references and, most likely, plenty of errors. If you use the notes and find mistakes, please let me know by either emailing luca at berkeley or leaving a comment here.


CS359G Lecture 16: Constructions of Expanders

In which we give an explicit construction of expander graphs of polylogarithmic degree, state the properties of the zig-zag product of graphs, and provide an explicit construction of a family of constant-degree expanders using the zig-zag product and the polylogarithmic-degree construction.

A family of expanders is a family of graphs {G_n = (V_n,E_n)}, {|V_n|=n}, such that each graph is {d_n}-regular, and the edge-expansion of each graph is at least {h}, for an absolute constant {h} independent of {n}. Ideally, we would like to have such a construction for each {n}, although it is usually enough for most applications that, for some constant {c} and every {k}, there is an {n} for which the construction applies in the interval {\{ k, k+1, \ldots, ck \}}, or even the interval {\{ k, \ldots, ck^c\}}. We would also like the degree {d_n} to be slowly growing in {n} and, ideally, to be bounded above by an explicit constant. Today we will see a simple construction in which {d_n = O(\log^2 n)} and a more complicated one in which {d_n = O(1)}.

An explicit construction of a family of expanders is a construction in which {G_n} is “efficiently computable” given {n}. The weakest sense in which a construction is said to be explicit is when, given {n}, the (adjacency matrix of the) graph {G_n} can be constructed in time polynomial in {n}. A stronger requirement, which is necessary for several applications, is that given {n} and {i\in \{ 1,\ldots,n\}}, the list of neighbors of the {i}-th vertex of {G_n} can be computed in time polynomial in {\log n}.

In many explicit constructions of constant-degree expanders, the construction is extremely simple, and besides satisfying the stricter definition of “explicit” above, it is also such that the adjacency list of a vertex is given by a “closed-form formula.” The analysis of such constructions, however, usually requires very sophisticated mathematical tools.

Example 1 Let {p} be a prime, and define the graph {G_p = (V_p,E_p)} in which {V_p = \{ 0,\ldots,p-1\}}, and, for {a\in V_p - \{ 0\}}, the vertex {a} is connected to {a+1 \bmod p}, to {a-1 \bmod p} and to its multiplicative inverse {a^{-1} \bmod p}. The vertex {0} is connected to {1}, to {p-1}, and has a self-loop. Counting self-loops, the graph is 3-regular: it is the union of a cycle over {V_p} and of a matching over the {p-3} vertices {V_p - \{ 0,1,p-1 \}}; the vertices {0}, {1}, {p-1} have a self-loop each. There is a constant {h>0} such that, for each {p}, the graph {G_p} has edge expansion at least {h}. Unfortunately, no elementary proof of this fact is known.
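
The neighbor lists of {G_p} are given by a closed-form formula, so the construction is explicit in the stronger sense defined above. A minimal Python sketch (the function name is ours, and the modular inverse via pow needs Python 3.8+):

```python
def neighbors(a, p):
    """Neighbor list of vertex a in the graph G_p of Example 1 (p prime).
    A self-loop is reported as the vertex itself. Runs in time polylog(p),
    so the construction is explicit in the stronger sense above."""
    if a == 0:
        return [1, p - 1, 0]                 # 0 is adjacent to 1, p-1, and itself
    # a+1, a-1, and the multiplicative inverse of a mod p (Python 3.8+)
    return [(a + 1) % p, (a - 1) % p, pow(a, -1, p)]
```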

Constructions based on the zig-zag graph product, which we shall see next, are more complicated to describe, but much simpler to analyze.

We begin by describing a building block in the construction, which is also an independently interesting construction: a family of expanders with polylogarithmic degree, which have both a very simple description and a very simple analysis.


CS359G Lecture 14: ARV Rounding

In which we begin to discuss the Arora-Rao-Vazirani rounding procedure.

Recall that, in a graph {G=(V,E)} with adjacency matrix {A}, the ARV relaxation of the sparsest cut problem is the following semidefinite program.

\displaystyle  \begin{array}{lll} {\rm minimize} & \frac 1 {2|E|} \sum_{u,v} A_{u,v} || {\bf x}_u - {\bf x}_v ||^2\\ {\rm subject\ to}\\ & \sum_{u,v} || {\bf x}_u - {\bf x}_v ||^2 = {|V|^2}\\ & || {\bf x}_u - {\bf x}_v||^2 \leq || {\bf x}_u - {\bf x}_w||^2 + || {\bf x}_w - {\bf x}_v||^2 & \forall u,v,w \in V\\ & \mbox{} {\bf x}_u \in {\mathbb R}^d & \forall u\in V \end{array}

If we denote by {ARV(G)} the optimum of the relaxation, then we claimed that

\displaystyle  ARV(G) \leq \phi(G) \leq O(\sqrt{\log |V|}) \cdot ARV(G)

where the first inequality follows from the fact that {ARV(G)} is a relaxation of {\phi(G)}, and the second inequality is the result whose proof we begin to discuss today.
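
For concreteness, here is a minimal sketch of how the relaxation could be fed to a solver, in the standard Gram-matrix form: the vectors {{\bf x}_u} are represented by the PSD matrix of inner products {X_{u,v} = \langle {\bf x}_u, {\bf x}_v\rangle}, so that {|| {\bf x}_u - {\bf x}_v ||^2 = X_{u,u} + X_{v,v} - 2X_{u,v}}. The choice of cvxpy is ours, and the roughly {n^3} triangle-inequality constraints make this practical only for small graphs.

```python
import itertools
import cvxpy as cp
import numpy as np

def arv_relaxation(A):
    """Minimal sketch of the ARV relaxation for a 0/1 adjacency matrix A
    (numpy, n x n). Sums are over ordered pairs, as in the program above.
    Only for small n: there are ~n^3 triangle-inequality constraints."""
    n = A.shape[0]
    num_edges = A.sum() / 2
    X = cp.Variable((n, n), PSD=True)      # Gram matrix of the vectors x_u

    def sq(u, v):                          # ||x_u - x_v||^2 in terms of X
        return X[u, u] + X[v, v] - 2 * X[u, v]

    constraints = [sum(sq(u, v) for u in range(n) for v in range(n)) == n * n]
    for u, v, w in itertools.permutations(range(n), 3):
        constraints.append(sq(u, v) <= sq(u, w) + sq(w, v))

    objective = sum(A[u, v] * sq(u, v)
                    for u in range(n) for v in range(n) if A[u, v]) / (2 * num_edges)
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return X.value          # any factorization of X recovers vectors x_u
```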


CS359G Lecture 9: Bourgain’s Embedding

In which we prove that every metric can be embedded into L1 with logarithmic distortion.

Today we prove the following theorem.

Theorem 1 (Bourgain) Let {d: V\times V \rightarrow {\mathbb R}} be a semimetric defined over a finite set {V}. Then there exists a mapping {F: V \rightarrow {\mathbb R}^m} such that, for every two elements {u,v \in V},

\displaystyle  || F(u) - F(v)||_1 \leq d(u,v) \leq ||F(u)-F(v)||_1 \cdot c\cdot \log |V|

where {c} is an absolute constant. Given {d}, the mapping {F} can be found with high probability in randomized polynomial time in {|V|}.
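
The construction behind the theorem, made algorithmic by Linial, London and Rabinovich, is a Fréchet-type embedding: each coordinate of {F} is the distance to a random subset of {V}, with subsets sampled at logarithmically many density scales. A minimal sketch, with illustrative constants (the ones needed in the actual proof differ):

```python
import math
import random

def bourgain_embedding(V, d, seed=0):
    """Frechet-style randomized embedding into l1: each coordinate of F(u)
    is the distance d(u, S) = min_{s in S} d(u, s) to a random set S,
    sampled at ~log n density scales with ~log n repetitions per scale.
    After scaling by 1/m, each coordinate is (1/m)-Lipschitz, so
    ||F(u) - F(v)||_1 <= d(u, v); the lower bound holds with high
    probability. Illustrative constants only."""
    rng = random.Random(seed)
    n = len(V)
    scales = max(1, math.ceil(math.log2(n)))
    reps = max(1, math.ceil(math.log2(n)))   # repetitions per scale
    sets = [[v for v in V if rng.random() < 2.0 ** (-i)]
            for i in range(1, scales + 1) for _ in range(reps)]
    m = len(sets)
    return {u: [min((d(u, s) for s in S), default=0.0) / m for S in sets]
            for u in V}
```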

Together with the results that we proved in the last lecture, this implies that an optimal solution to the Leighton-Rao relaxation can be rounded to an {O(\log n)}-approximate solution to the sparsest cut problem. This was the best known approximation algorithm for sparsest cut for 15 years, until the Arora-Rao-Vazirani algorithm, which will be our next topic.


CS359G Lecture 8: the Leighton-Rao Relaxation

In which we introduce the Leighton-Rao relaxation of sparsest cut.

Let {G=(V,E)} be an undirected graph. Unlike past lectures, we will not need to assume that {G} is regular. We are interested in finding a sparsest cut in {G}, where the sparsity of a non-trivial bipartition {(S,V-S)} of the vertices is

\displaystyle  \phi_G(S) := \frac{\frac{1}{|E|} \cdot Edges(S,V-S) } {\frac {2}{|V|^2} \cdot |S| \cdot |V-S| }

which is the ratio between the fraction of edges that are cut by {(S,V-S)} and the fraction of pairs of vertices that are disconnected by the removal of those edges.
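
In code, for a small graph given by its adjacency matrix, this quantity can be computed directly; a minimal sketch (function name ours):

```python
import numpy as np

def sparsity(A, S):
    """phi_G(S) for a cut (S, V - S): the fraction of edges cut, divided
    by the fraction of vertex pairs separated. A: 0/1 adjacency matrix
    (numpy, n x n); S: set of vertex indices. A sketch for small graphs."""
    n = A.shape[0]
    num_edges = A.sum() / 2
    T = [v for v in range(n) if v not in S]
    cut_edges = sum(A[u, v] for u in S for v in T)
    return (cut_edges / num_edges) / (2 * len(S) * len(T) / n ** 2)
```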

Another way to write the sparsity of a cut is as

\displaystyle  \phi_G(S) := \frac {|V|^2}{2|E|} \cdot \frac{\sum_{i,j} A_{i,j} |1_S(i) - 1_S(j)| } {\sum_{i,j} |1_S(i) - 1_S(j) | }

where {A} is the adjacency matrix of {G} and {1_S(\cdot)} is the indicator function of the set {S}.

The observation that led us to see {1-\lambda_2} as the optimum of a continuous relaxation of {\phi} was to observe that {|1_S(i)-1_S(j)| = |1_S(i) - 1_S(j)|^2}, and then relax the problem by allowing arbitrary functions {x: V \rightarrow {\mathbb R}} instead of indicator functions {1_S : V \rightarrow \{ 0,1 \}}.

The Leighton-Rao relaxation of sparsest cut is obtained using, instead, the following observation: if, for a set {S}, we define {d_S(i,j) := |1_S(i) - 1_S(j)|}, then {d_S(\cdot,\cdot)} defines a semi-metric over the set {V}, because {d_S} is symmetric, {d_S(i,i)=0}, and the triangle inequality holds. So we could think about allowing arbitrary semi-metrics in the expression for {\phi}, and define

\displaystyle  LR(G):= \min_{\begin{array}{l} d:V\times V \rightarrow {\mathbb R}\\ d \ \mbox{semi-metric} \end{array}} \frac {|V|^2}{2|E|} \cdot \frac{\sum_{i,j} A_{i,j} d(i,j) }{\sum_{i,j} d(i,j)} \ \ \ \ \ (1)

This might seem like such a broad relaxation that there could be graphs on which {LR(G)} bears no connection to {\phi(G)}. Instead, we will prove the fairly good estimate

\displaystyle   LR(G) \leq \phi(G) \leq O(\log |V|) \cdot LR(G) \ \ \ \ \ (2)

Furthermore, we will show that {LR(G)} and an optimal solution {d(\cdot,\cdot)} can be computed in polynomial time, and that the second inequality above has a constructive proof, from which we derive a polynomial-time {O(\log |V|)}-approximation algorithm for sparsest cut.

1. Formulating the Leighton-Rao Relaxation as a Linear Program

The value {LR(G)} and an optimal {d(\cdot,\cdot)} can be computed in polynomial time by solving the following linear program

\begin{array}{lll}{\rm minimize} &  \sum_{i,j} A_{i,j} d_{i,j}  \\ {\rm subject\ to} \\ & \sum_{i,j} d_{i,j} = \frac {|V|^2}{2|E|}  \\ & d_{i,k} \leq d_{i,j} + d_{j,k} & \forall i,j,k \in V\\ & d_{i,j} \geq 0 & \forall i,j \in V\end{array} \ \ \ \ \ (3)

that has a variable {d_{i,j}} for every unordered pair of distinct vertices {i,j}. Clearly, every solution to the linear program (3) is also a solution to the right-hand side of the definition (1) of the Leighton-Rao parameter, with the same cost. Also, every semi-metric can be normalized so that {\sum_{i,j} d(i,j) = |V|^2/2|E|} by multiplying every distance by a fixed constant, and the normalization does not change the value of the right-hand side of (1); after the normalization, the semimetric is a feasible solution to the linear program (3), with the same cost.
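
As a concreteness check, here is a minimal sketch of the linear program (3) in cvxpy (our choice of solver interface). A symmetric matrix variable, with one entry per ordered pair, gives the same optimum as the unordered-pair formulation, and the {n^3} triangle inequalities restrict this to small graphs.

```python
import itertools
import cvxpy as cp
import numpy as np

def leighton_rao(A):
    """Solve the LP (3) for a 0/1 adjacency matrix A (numpy, n x n).
    Uses a symmetric matrix variable (one entry per ordered pair), which
    yields the same optimum as the unordered-pair formulation.
    Returns (LR(G), optimal semimetric). A minimal sketch."""
    n = A.shape[0]
    num_edges = A.sum() / 2
    d = cp.Variable((n, n), symmetric=True)
    constraints = [d >= 0, cp.diag(d) == 0,
                   cp.sum(d) == n * n / (2 * num_edges)]   # normalization
    for i, j, k in itertools.permutations(range(n), 3):
        constraints.append(d[i, k] <= d[i, j] + d[j, k])   # triangle inequality
    problem = cp.Problem(cp.Minimize(cp.sum(cp.multiply(A, d))), constraints)
    problem.solve()
    return problem.value, d.value
```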

In the rest of this lecture and the next, we will show how to round a solution to (3) into a cut, achieving the logarithmic approximation promised in (2).

2. An L1 Relaxation of Sparsest Cut

In the Leighton-Rao relaxation, we relax distance functions of the form {d(i,j) = |1_S(i) - 1_S(j)|} to completely arbitrary distance functions. Let us consider an intermediate relaxation, in which we allow distance functions that can be realized by an embedding of the vertices in an {\ell_1} space.

Recall that, for a vector {{\bf x} \in {\mathbb R}^n}, its {\ell_1} norm is defined as {|| {\bf x}||_1 := \sum_i |x_i|}, and that this norm makes {{\mathbb R}^n} into a metric space with the {\ell_1} distance function

\displaystyle  || {\bf x} - {\bf y}||_1 = \sum_i |x_i - y_i |

The distance function {d(i,j) = |1_S(i) - 1_S(j)|} is an example of a distance function that can be realized by mapping each vertex to a real vector, and then defining the distance between two vertices as the {\ell_1} norm of the respective vectors. Of course it is an extremely restrictive special case, in which the dimension of the vectors is one, and in which every vertex is actually mapped to either zero or one. Let us consider the relaxation of sparsest cut to arbitrary {\ell_1} mappings, and define

\displaystyle  \phi'(G) := \inf_{m, f: V\rightarrow {\mathbb R}^m} \frac{|V|^2}{2|E|} \cdot \frac {\sum_{i,j} A_{i,j} || f(i) - f(j) ||_1 } {\sum_{i,j} || f(i) - f(j) ||_1 }

This may seem like another very broad relaxation of sparsest cut, whose optimum might bear no correlation with the sparsest cut optimum. The following theorem shows that this is not the case.

Theorem 1 For every graph {G}, {\phi(G) = \phi'(G)}.

Furthermore, there is a polynomial time algorithm that, given a mapping {f: V\rightarrow {\mathbb R}^m}, finds a cut {S} such that

\displaystyle  \frac {\sum_{u,v} A_{u,v} |1_S(u) - 1_S(v) |} {\sum_{u,v} |1_S(u) - 1_S(v)| }\leq \frac {\sum_{u,v} A_{u,v} || f(u) - f(v) ||_1 } {\sum_{u,v} || f(u) - f(v) ||_1 } \ \ \ \ \ (4)

Proof: We use ideas that have already come up in the proof of the difficult direction of Cheeger’s inequality. First, we note that for all nonnegative reals {a_1,\ldots,a_m} and positive reals {b_1,\ldots,b_m} we have

\displaystyle   \frac {a_1 + \cdots + a_m}{b_1 + \cdots + b_m} \geq \min_i \frac{a_i}{b_i} \ \ \ \ \ (5)

as can be seen by noting that

\displaystyle  \sum_j a_j = \sum_j b_j \cdot \frac{a_j}{b_j} \geq \left( \min_i \frac{a_i}{b_i} \right) \cdot \sum_j b_j

Let {f_i(v)} be the {i}-th coordinate of the vector {f(v)}, thus {f(v) = (f_1(v),\ldots,f_m(v))}. Then we can decompose the right-hand side of (4) by coordinates, and write

\displaystyle  \frac {\sum_{u,v} A_{u,v} || f(u) - f(v) ||_1 } {\sum_{u,v} || f(u) - f(v) ||_1 }

\displaystyle  = \frac {\sum_i \sum_{u,v} A_{u,v} | f_i(u) - f_i(v) | } {\sum_i \sum_{u,v} | f_i(u) - f_i(v) | }

\displaystyle  \geq \min_i \frac {\sum_{u,v} A_{u,v} | f_i(u) - f_i(v) | } { \sum_{u,v} | f_i(u) - f_i(v) | }

This already shows that, in the definition of {\phi'}, we can map, with no loss of generality, to 1-dimensional {\ell_1} spaces.

Let {i^*} be the coordinate that achieves the minimum above. Because the cost function is invariant under shifts and scalings (that is, the cost of a function {x\rightarrow f(x)} is the same as the cost of {x\rightarrow af(x) + b} for every two constants {a\neq 0} and {b}), there is a function {g:V\rightarrow {\mathbb R}} that has the same cost as {f_{i^*}} and has unit-length range {\max_v g(v) -\min_v g(v) = 1}.

Let us now pick a threshold {t} uniformly at random from the interval {[\min_v g(v) , \max_v g(v)]}, and define the random variables

\displaystyle  S_t:= \{ v: g(v) \leq t \}

We observe that for every pair of vertices {u,v} we have

\displaystyle  \mathop{\mathbb E} |1_{S_t}(u) - 1_{S_t}(v) | = |g(u) - g(v)|

and so we get

\displaystyle  \frac {\sum_{u,v} A_{u,v} || f(u) - f(v) ||_1 } {\sum_{u,v} || f(u) - f(v) ||_1 }

\displaystyle  \geq \frac {\sum_{u,v} A_{u,v} | g(u) - g(v) | } { \sum_{u,v} | g(u) - g(v) | }

\displaystyle  = \frac {\mathop{\mathbb E} \sum_{u,v} A_{u,v} |1_{S_t}(u) - 1_{S_t}(v) |} {\mathop{\mathbb E}\sum_{u,v} |1_{S_t} (u)- 1_{S_t}(v)| }

Finally, by an application of (5), we see that there must be a set {S} among the possible values of {S_t } such that (4) holds. Notice that the proof was completely constructive: we simply took the coordinate {f_{i^*}} of {f} with the lowest cost function, and then the “threshold cut” given by {f_{i^*}} with the smallest sparsity. \Box
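
The proof translates directly into a rounding algorithm: compute the cost of each coordinate of {f}, keep the best one, and try its {n-1} threshold cuts. A minimal sketch, with quadratic-time bookkeeping rather than an optimized sweep:

```python
import numpy as np

def l1_round(A, f):
    """Round an l1 embedding to a cut, as in the proof of Theorem 1.
    A: 0/1 adjacency matrix (numpy, n x n); f: n x m array whose row v
    is f(v). Returns the best threshold cut S and its ratio, which
    satisfies (4). A minimal sketch."""
    n, m = f.shape

    def cost(x):
        # ratio (sum_{u,v} A_{u,v} |x_u - x_v|) / (sum_{u,v} |x_u - x_v|)
        D = np.abs(x[:, None] - x[None, :])
        total = D.sum()
        return float("inf") if total == 0 else (A * D).sum() / total

    g = f[:, min(range(m), key=lambda i: cost(f[:, i]))]   # best coordinate
    order = np.argsort(g)
    best_S, best_ratio = None, float("inf")
    for k in range(1, n):          # the n-1 threshold cuts S_t along g
        indicator = np.zeros(n)
        indicator[order[:k]] = 1.0
        r = cost(indicator)
        if r < best_ratio:
            best_ratio, best_S = r, set(order[:k].tolist())
    return best_S, best_ratio
```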

3. A Theorem of Bourgain

We will derive our main result (2) from the L1 “rounding” process of the previous section, and from the following theorem of Bourgain (the efficiency considerations are due to Linial, London and Rabinovich).

Theorem 2 (Bourgain) Let {d: V\times V \rightarrow {\mathbb R}} be a semimetric defined over a finite set {V}. Then there exists a mapping {f: V \rightarrow {\mathbb R}^m} such that, for every two elements {u,v \in V},

\displaystyle  || f(u) - f(v)||_1 \leq d(u,v) \leq ||f(u)-f(v)||_1 \cdot c\cdot \log |V|

where {c} is an absolute constant. Given {d}, the mapping {f} can be found with high probability in randomized polynomial time in {|V|}.

To see that the above theorem of Bourgain implies (2), consider a graph {G}, let {d} be an optimal solution of the Leighton-Rao relaxation of the sparsest cut problem on {G}, and let {f:V\rightarrow {\mathbb R}^m} be a mapping as in Bourgain’s theorem applied to {d}. Then

\displaystyle  LR(G) = \frac{|V|^2}{2|E|} \cdot \frac{\sum_{u,v} A_{u,v} d(u,v) }{\sum_{u,v} d(u,v) }

\displaystyle  \geq \frac{|V|^2}{2|E|} \cdot \frac{\sum_{u,v} A_{u,v} ||f(u)-f(v) ||_1 }{c\cdot \log |V| \cdot \sum_{u,v} ||f(u)-f(v)||_1 }

\displaystyle  \geq \frac 1 {c\cdot \log |V|} \cdot \phi'(G) = \frac 1 {c\cdot \log |V|} \cdot \phi(G)

CS359G Lecture 7: Computing Eigenvectors

In which we analyze a nearly-linear time algorithm for finding an approximate eigenvector for the second eigenvalue of a graph adjacency matrix, to be used in the spectral partitioning algorithm.

In past lectures, we showed that, if {G=(V,E)} is a {d}-regular graph, and {M} is its normalized adjacency matrix with eigenvalues {1=\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n}, given an eigenvector of {\lambda_2}, the algorithm SpectralPartition finds, in nearly-linear time {O(|E| + |V|\log |V|)}, a cut {(S,V-S)} such that {h(S) \leq 2\sqrt{h(G)}}.

More generally, if, instead of being given an eigenvector {{\bf x}} such that {M{\bf x} = \lambda_2 {\bf x}}, we are given a vector {{\bf x} \perp {\bf 1}} such that {{\bf x}^T M {\bf x} \geq (\lambda_2 - \epsilon) {\bf x}^T{\bf x}}, then the algorithm finds a cut such that {h(S) \leq \sqrt{4h(G) + 2\epsilon}}. In this lecture we describe and analyze an algorithm that computes such a vector using {O((|V|+|E|)\cdot \frac 1\epsilon \cdot \log \frac {|V|}{\epsilon})} arithmetic operations.

A symmetric matrix is positive semi-definite (abbreviated PSD) if all its eigenvalues are nonnegative. We begin by describing an algorithm that approximates the largest eigenvalue of a given symmetric PSD matrix. This might not seem to help very much because the adjacency matrix of a graph is not PSD, and because we want to compute the second largest, not the largest, eigenvalue. We will see, however, that the algorithm is easily modified to approximate the second eigenvalue of a PSD matrix (if an eigenvector of the first eigenvalue is known), and that the adjacency matrix of a graph can easily be modified to be PSD.
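
To make the plan concrete, here is a minimal sketch of the resulting algorithm (illustrative constants, not the exact procedure analyzed in the lecture): shift {M} to the PSD matrix {\frac 12 (I+M)}, and run the power method from a random start kept orthogonal to the all-ones vector, which is the top eigenvector of a regular graph.

```python
import numpy as np

def approx_second_eigenvector(M, eps, seed=0):
    """Power-method sketch for the normalized adjacency matrix M of a
    d-regular graph (numpy, n x n): returns a unit vector x orthogonal
    to the all-ones vector with Rayleigh quotient close to lambda_2,
    using O((1/eps) log(n/eps)) matrix-vector products. Illustrative
    constants; not the lecture's exact analysis."""
    n = M.shape[0]
    ones = np.ones(n) / np.sqrt(n)         # top eigenvector (eigenvalue 1)
    N = 0.5 * (np.eye(n) + M)              # PSD shift: eigenvalues (1 + lambda)/2
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    for _ in range(int(np.ceil(np.log(n / eps) / eps))):
        x -= (x @ ones) * ones             # stay orthogonal to the all-ones vector
        x /= np.linalg.norm(x)
        x = N @ x
    x -= (x @ ones) * ones
    x /= np.linalg.norm(x)
    return x, 2 * (x @ (N @ x)) - 1        # undo the shift: lambda = 2*mu - 1
```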
