You are currently browsing the tag archive for the 'Expanders' tag.

[In which we prove the "difficult part" of Cheeger's inequality by analyzing a randomized rounding algorithm for a continuous relaxation of sparsest cut.]

We return to this month’s question: if G=(V,E) is a d-regular graph, how well can we approximate its edge expansion h(G) defined as

h(G) := \min_{S\subseteq V} \frac{edges(S,V-S)} {\min \{ |S|, \ |V-S| \} }

and its sparsest cut \phi(G) defined as

\phi(G) := \min_{S\subseteq V} \frac{edges(S,V-S)} { \frac 1n \cdot |S| \cdot |V-S|} =  \min_{x\in \{0,1\}^n } \frac{ \sum_{i,j} A(i,j) \cdot |x(i)-x(j)|}{\frac 1n  \sum_{i,j}  |x(i)-x(j)|},

where A is the adjacency matrix of G.

We have looked at three continuous relaxations of \phi(G), the spectral gap, the Leighton-Rao linear program, and the Arora-Rao-Vazirani semidefinite program.

As we saw, the spectral gap of G, defined as the difference between largest and second largest eigenvalue of A, can be seen as the solution to a continuous optimization problem:

d-\lambda_2 = \min_{x\in {\mathbb R}^n} \frac {\sum_{i,j} A(i,j) \cdot |x(i)-x(j)|^2}{\frac 1n  \sum_{i,j}  |x(i)-x(j)|^2}.

It follows from the definitions that

d-\lambda_2 \leq \phi \leq 2h

which is the “easy direction” of Cheeger’s inequality, and the interesting thing is that d-\lambda_2 is never much smaller, and it obeys

(1) \ \ \ d-\lambda_2 \geq \frac {h^2}{2d} \geq \frac {\phi^2}{8d},

which is the difficult part of Cheeger’s inequality. When we normalize all quantities by the degree, the inequality reads as

\frac 1 8 \cdot \left( \frac \phi d \right)^2 \leq \frac 1 2 \cdot \left( \frac h d \right)^2 \leq \frac{d-\lambda_2}{d} \leq \frac \phi d  \leq 2 \frac h d .

I have taught (1) in three courses and used it in two papers, but I had never really understood it, where I consider a mathematical proof to be understood if one can see it as a series of inevitable steps. Many steps in the proofs of (1) I had read, however, looked like magic tricks with no explanation. Finally, however, I have found a way to describe the proof that makes sense to me. (I note that everything I will say in this post will be completely obvious to the experts, but I hope some non-expert will read it and find it helpful.)

We prove (1), as usual, by showing that given any x\in {\mathbb R}^n such that

(2) \ \ \ \frac {\sum_{i,j} A(i,j) \cdot |x(i)-x(j)|^2}{\frac 1n  \sum_{i,j}  |x(i)-x(j)|^2} = \epsilon

we can find a threshold t such that the cut (S,V-S) defined by S:= \{ i: x_i \geq t \} satisfies

  (3)  \ \ \ \frac{edges(S,V-S)} { \min \{ |S|,\ |V-S| \} } \leq \sqrt{2 d \epsilon}

and

  \frac{edges(S,V-S)} { \frac 1n \cdot |S| \cdot |V-S| } \leq \sqrt{8 d \epsilon}.

This not only gives us a proof of (1), but also an algorithm for finding sparse cuts when they exist: take a vector x which is an eigenvector of the second eigenvalue (or simply a vector for which the Rayleigh quotient in (2) is small), sort the vertices i according to the value of x(i), and find the best cut among the “threshold cuts.” This is the “spectral partitioning” algorithm.

This means that proving (1) amounts to studying an algorithm that “rounds” the solution of a continuous relaxation to a combinatorial solution, and there is a standard pattern to such arguments in computer science: we describe a randomized rounding algorithm, study its average performance, and then argue that there is a fixed choice that is at least as good as an average one. Here, in particular, we would like to find a distribution T over threshold, such that if we define S:= \{ i: x(i) \geq T\} as a random variable in terms of T we have

  {\mathbb E} [ edges(S,V-S) ] \leq {\mathbb E}  [\min \{ |S|,\ |V-S| \} ] \cdot  \sqrt{2 d \epsilon}

and so, using linearity of expectation,

  {\mathbb E} [ edges(S,V-S)  - \min \{ |S|,\ |V-S| \}  \cdot  \sqrt{2 d \epsilon}] \leq 0

from which we see that there must be a threshold in our sample space such that (3) holds. I shall present a proof that explicitly follows this pattern. It would be nice if we could choose T uniformly at random in the interval \left[\min_i x(i), \max_i x(i)\right], but I don’t think it would work. (Any reader can see a counterexample? I couldn’t, but we’ll come back to it.) Instead, the following works: assuming with no loss of generality that the median of \{ x(1),\ldots,x(n) \} is zero, T can be chosen so that |T|\cdot T is distributed uniformly in the interval \left[\min_i x(i), \max_i x(i)\right]. (This means that thresholds near the median are less likely to be picked than thresholds far from the median.) This choice seems to be a magic trick in itself, voiding the point I made above, but I hope it will become natural as we unfold our argument. Read the rest of this entry »

[In which we finally get to Leighton-Rao and ARV.]

Continuing our edge expansion marathon, we are going to see more ways in which the sparsest cut problem can be relaxed to polynomial-time solvable continuous problems. As before, our goal is to understand how this implies certificates of expansion.

We are talking about the sparsest cut \phi(G) of a graph G(V,E), which is

\phi(G) := \min_{S\subseteq V} \frac{edges(S,V-S)}{\frac 1n |S|\cdot V-S|} = \min_{x\in \{0,1\}^n} \frac { \sum_{i,j} A(i,j) |x(i)-x(j)|} { \frac 1n \sum_{i,j} |x(i)-x(j)|}

We have looked for a while at the eigenvalues gap of G. If G is d-regular and the second largest eigenvalue of G is \lambda_2, then the difference between largest and second largest eigenvalue obeys

d-\lambda_2 = \min_{x\in {\mathbb R}^n} \frac { \sum_{i,j} A(i,j) |x(i)-x(j)|^2} {\frac 1n \sum_{i,j} |x(i)-x(j)|^2} = \min_{m,x_1,\ldots,x_n\in {\mathbb R}^m} \frac {  \sum_{i,j} A(i,j) ||x_-x_j||^2} {\frac 1n \sum_{i,j} ||x_i-x_j||^2}

and, noting that |x(i)-x(j)|^2 = |x(i)-x(j)| when x\in \{0,1\}^n, we saw that d-\lambda_2 is a relaxation of \phi, and \phi \geq d-\lambda_2.

Leighton and Rao derive a continuous relaxation of \phi in quite a different way. They note that when x\in \{ 0,1\}^n, then the “distance”

d(i,j) := |x(i)-x(j)|

induces a semi-metric on V, that is, d(i,i)=0, d(i,j) = d(j,i), and the triangle inequality holds

d(i,j) \leq d(i,k) + d(k,j) \ \ \ \forall i,j,k

One can then consider the relaxation

 (1) \ \ \ min_d \frac { \sum_{i,j} A(i,j) d(i,j)}  {\frac 1n \sum_{i,j} d(i,j)}

where the minimum is taken over all distance functions d: V\times V \rightarrow {\mathbb R} that define a semi-metric on V.

We can rewrite (1) as the linear program

(2) \ \ \ \begin{array}{l} \min \sum_{i,j} A(i,j) d(i,j)\\ subject\ to \\ \sum_{i,j} d(i,j)=n\\ d(i,j) \leq d(i,k) + d(k,j) \ \ \ \forall i,j,k\\ d(i,j) \geq 0 \ \ \ \forall i,j \end{array}

where we only write the triangle inequality constraints by having one variable d(i,j) for every unordered pair i,j.

The minimum of (2) gives a lower bound to the sparsest cut of G, and so every feasible solution for the dual of (2) gives a lower bound to \phi(G).

Before trying to write down the dual of (2), it is convenient to write (2) in a less efficient way. (This is akin to the fact that sometimes, when using induction, it is easier to prove a more general statement.) Instead of writing the triangle inequality for every three vertices, we write it for every arbitrary sequence of vertices.

(3) \ \ \ \begin{array}{l} \min \sum_{i,j} A(i,j) d(i,j)\\ subject\ to \\ \sum_{i,j} d(i,j)=n\\ d(i,j) \leq \sum_{(u,v) \in p} d(u,v) \ \ \ \forall i,j, \forall p \in P_{i,j} \\  d(i,j) \geq 0 \ \ \ \forall i,j\end{array}

where we denote by P_{i,j} the set of “paths” in the complete graph between i and j. (That is, P_{i,j} is the set of all sequences of vertices that start at i and end at j.)

Clearly (3) is the same as (2), in the sense that the extra inequalities present in (3) are implied by the triangle inequalities in (2). The dual of (3), however, is cleaner, and is given below

(4) \ \ \ \begin{array}{l} \max n\cdot y\\ \ y - \sum_{p\in P_{i,j}} y_p + \sum_{p: (i,j) \in p} y_p \leq A(i,j)\ \ \ \forall i,j\\ y,y_p \geq 0\end{array}

A feasible solution to (4) is thus a weighted set of paths and a parameter y. Before reasoning about the meaning of a solution, suppose that y,y_p satisfy the stronger constraints

(5) \ \ \ \begin{array}{l} \sum_{p\in P_{i,j}} y_p \geq y\\\sum_{p: (i,j) \in p} y_p \leq A(i,j) \end{array}

Then we can view the y_p : p \in P_{i,j} as defining a flow between i,j that carries at least y units of flow; furthermore, the union of all such flows uses at most a total capacity A(i,j) on each edge (i,j). We should think of it as an embedding of the complete graph scaled by y into G. I want to claim that this means that the sparsest cut of G is at least y times the sparsest cut of the complete graph, that is, at least y\cdot n.

To verify the claim, take any cut S,V-S. We know that such a cut separates |S| \cdot |V-S| pairs of vertices i,j, and that at least y units of flow are routed between each such pair, so that at least y\cdot |S| \cdot |V-S| units of flow cross the cut. Since every edge has capacity 1, we see that at least y\cdot  |S|\cdot  |V-S| edges of G must cross the cut.

It remains to bridge the difference between (4) or (5). One possibility is to show that a solution to (4) can be modified to a solution satisfying (5) with no loss in y. Another approach is to directly use (4) and a general weak duality argument to show directly that a solution satisfying (4) implies that G has at least the edge expansion of a complete graph with edge weights y.

There is, in fact, a broader principle at work, which we can verify by a similar argument: if \{ y_p\} is an assignment of weights to all possible paths (which we think of as defining a multicommodity flow), G is a graph with adjacency matrix A, H is a graph with adjacency matrix B, and we have

B(i,j) - \sum_{p\in P_{i,j}} y_p + \sum_{p: (i,j) \in p} y_p \leq A(i,j)\ \ \ \forall i,j

then we must have \phi(G) \geq \phi(H).

The fact that a feasible solution to (4) gives a lower bound to \phi(G) is a special case that applies to the case in which H is a scaled version of the complete graph.

This is a good point to pause, look back again at the bound coming from spectral considerations, and compare it with what we have here. Read the rest of this entry »

In the previous post, we saw that the sparsest cut of a graph G=(V,E) (which is within a factor of 2 of the edge expansion of the graph), is defined as

\phi := \min_{S\subseteq V} \frac{edges(S,V-S)}{\frac 1n |S|\cdot |V-S|}

and can also be written as

(1) \ \ \ \phi = \min_{x\in \{0,1\}^V} \frac{\sum_{i,j} A(i,j) | x(i)-x(j)|}{\frac 1n \sum_{i,j} |x(i)-x(j)|}

while the eigenvalue gap of G, can be written as

(2) \ \ \ d-\lambda_2 = \min_{x\in {\mathbb R}^V} \frac{\sum_{i,j} A(i,j) | x(i)-x(j)|^2}{\frac 1n \sum_{i,j} |x(i)-x(j)|^2}

and so it can be seen as a relaxation of \phi, considering that
|x(i)-x(j)|=|x(i)-x(j)|^2 when x(i),x(j) are boolean.

Let us now take a broader look at efficiently computable continuous relaxations of sparsest cut, and see that the eigenvalue gap is a semidefinite programming relaxation of sparsest cut, that by adding the triangle inequality to it we derive the Arora-Rao-Vazirani relaxation, and that by removing the semidefinite constraint from the ARV relaxation we get the Leighton-Rao relaxation. Read the rest of this entry »

In which we dwell at great length on the “easy” part of Cheeger’s inequality.

The edge expansion of a graph G=(V,E), defined as

h(G):= \min_{S: |S| \leq \frac {|V|}{2} } \frac{edges(S,V-S)}{|S|}

is a measure of how “well connected” is a graph. A graph has edge expansion at least h if and only if, in order to disconnect a set of k vertices from the rest of the graph, one must remove at least hk edges. In other words, the removal of t edges from a graph of edge expansion \geq h must leave a connected component of size \geq n - t/h, where n is the number of vertices.

As readers of in theory know, graphs of large edge expansion (expander graphs) have many applications, and in this post we are going to return to the question of how one certifies that a given graph has large edge expansion.

For simplicity, we shall only talk about regular graphs. In a previous post we mentioned that if A is the adjacency matrix of a graph, there are numbers \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n (the eigenvalues of A), not necessarily distinct, and an orthonormal basis x_1,\cdots,x_n of {\mathbf R}^n, such that for all  i

x_iA = \lambda_i x_i

And if the graph is d-regular, then \lambda_1=d,  x_1 = \frac {1}{\sqrt n} (1,\cdots,1), and for all i we have  |\lambda_i | \leq d. Finally, the numbers \lambda_i and the vectors x_i are computable in polynomial time. (This is all the linear algebra we will need.)

Given this setup, we have the lower bound

h(G) \geq \frac 12 \cdot (d-\lambda_2)  \ \ \   (1)

That is, the difference between largest and second largest eigenvalue gives a lower bound to the edge expansion of the graph.

Although (1) is very easy to prove, it is a remarkable result, in the sense that it gives a polynomial-time computable certificate of a “coNP” statement, namely, that for all cuts, at least a certain number of edges cross the cut. Since computing edge expansion is NP-complete, the inequality cannot always be tight. Eventually shall compare it with other methods to lower bound the edge expansion, and see that they are all special cases of the same general approach. To begin with, in this post we shall show that d-\lambda_2 is an efficiently computable relaxation of the edge expansion problem. (Or, rather, of the closely related sparsest cut problem.) Read the rest of this entry »

Cheegers inequality relates the expansion of a graph to its eigenvalue gap.

There are at least three contexts where this result arises and is important:
Read the rest of this entry »

In which we look at the representation theory of Abelian groups and at the discrete Fourier transform, we completely understand the eigenvalues and eigenvectors of the Cayley graph of an Abelian group, and we apply this powerful theory to prove that the cycle is a connected graph.

When we talked about group representations, we saw how to map each element of a group into a matrix in such a way that group addition becomes matrix multiplication.

In general, if $G$ is a finite group, then an n-dimensional representation of $G$ is a map

$\rho: G \rightarrow {\mathbb C}^{n\times n}$

such that $\rho$ is a group homomorphism between $G$ and the group of matrices $\rho(G)$. That is, we have that $\rho(g)$ is an invertible matrix for every $g$, $\rho(0)=I$, $\rho(a+b) = \rho(a) \times \rho(b)$ and $\rho(-g) = (\rho(g))^{-1}$.

(Scott will be delighted to know that the theory need not stop at finite matrices, and one may also look at representations mapping group elements into invertible linear transformations over an infinite-dimensional Hilbert space. To study finite groups, however, finite-dimensional representations are good enough.)

Note that $\rho$ need not be injective, and so, for every group, we can always define the 1-dimensional representation $\rho(g) = 1$ for every $g$. Perhaps unsurprisingly, this is usually called the trivial representation. The precise statement of a theorem of Frobenius that we mentioned earlier is that every non-trivial representation of $PSL(2,p)$ has dimension at least $(p-1)/2$.

In an Abelian group, however, there are plenty of 1-dimensional representations, and, in fact, all the information about the group is “encoded” in the set of its 1-dimensional representations. We will call a 1-dimensional representation a character of the group. (This equivalence holds only for finite groups, otherwise we need an extra condition.) Hence a character of a group $G$ is a mapping

$\chi: G \rightarrow {\mathbb C}$

such that $\chi(g) \neq 0$ for every $g$, $\chi(0)=1$, $\chi(a+b) = \chi(a)\cdot \chi(b)$ and $\chi(-g) = (\chi(g))^{-1}$. In a finite group $G$, we always have the property that if we add an element to itself $|G|$ times we get $0$; from this we can deduce that, if $\chi()$ is a character, then $\chi(g)$ must be a $|G|$-th root of 1, that is, it must be a number of the form $e^{2\pi x i/|G|}$ for some $x$. In particular, $|\chi(g)|=1$ for every $g$.

(For general groups, we say that $\chi$ is a character if it is a 1-dimensional representation and it has the property that $|\chi(g)|=1$ for every $g$. While the latter property is a consequence of the former for finite groups and, more generally, for torsion groups, this is not the case for other groups. For example, $\chi(x) := e^{x}$ is a 1-dimensional representation for $\mathbb Z$ but it is not a character.)

At first sight, it might look like a finite group might have an infinite number of characters (or at least that’s what I thought when I first saw the definition). There is however a simple argument that shows that a finite group $G$ cannot have more than $|G|$ characters.

Consider the set of all functions of the form $f: G \rightarrow {\mathbb C}$; we can think of it as a $|G|$-dimensional vector space over $\mathbb C$. We have that the characters of $G$ are linearly independent, and so there cannot be more than $|G|$ of them. In fact, even more is true: the characters of $G$ are orthogonal to each other. For two functions $f,h$, define their inner product as

$(f,g) := \frac1 {|G|} \sum_{a\in G} f(a) \overline{g(a)} $

Then, if $\chi$ and $\eta$ are two different characters, we have

$(\chi,\eta) = \frac1 {|G|} \sum_{a\in G} \chi(a) \overline{\eta(a)}=0 $

The normalization in the definition of inner product is convenient because then we also have $(\chi,\chi)=1$ if $\chi$ is a character. (We will not prove these claims, but the proofs are very simple.)

We will now show that the cyclic group ${\mathbb Z}_N$ has indeed $N$ distinct characters $\chi_0,\ldots,\chi_{N-1}$. This means that this set of characters forms an orthonormal basis for the set of functions $f: {\mathbb Z}_N \rightarrow {\mathbb C}$, and that each such function can be written as a linear combination

$f(x) = \sum_{c=0}^{N-1} \hat f(c) \chi_c (x)$ (*)
where the coefficients $\hat f(c)$ can be computed as the inner product
$\hat f(c) = (f,\chi_c)$

(The equation (*) is the Fourier expansion of $f$, and the function $\hat f()$ is called the Fourier transform of $f$.)

Here is how to define the characters $\chi_c$: for each $c$, define $\chi_c(x):= e^{2\pi c x /N}$. It is immediate to verify that each such function is a character, and that, for $c=0,\ldots,N-1$ we are, indeed, defining $N$ different functions. By the previously mentioned results, there is no other function.

Let’s see one more example: consider $({\mathbb Z}_2)^n$, that is, $\{0,1\}^n$ with the operation of bit-wise XOR. For each ${\mathbf a} = (a_1,\ldots,a_n)\in \{0,1\}^n$, define the function

$\chi_{\mathbf a} (x_1,\ldots,x_n) := (-1)^{\sum_i a_i x_i }$
We can again verify that all these $2^n$ functions are distinct, that each of them is a character, and so that they include all the characters of $({\mathbb Z}_2)^n$. The reader should be able to see the pattern and to reconstruct the $N_1\times \cdots N_k$ characters of the group
${\mathbb Z}_{N_1} \times \cdots \times {\mathbb Z}_{N_k}$
thus having a complete understanding of the set of characters of any given finite Abelian group.

At long last, we can get to the subject of expansion.

Let $A$ be a finite Abelian group, and $S$ be a set of generators such that if $a\in S$ then $-a\in S$. Definite the Cayley graph $G=(A,E)$ where every element of $A$ is a vertex and where $(a,b)$ is an edge if $a-b\in S$. This is cleary an $|S|$-regular undirected graph. Let $\chi_1,\ldots,\chi_{|A|}$ be the set of characters of $A$. Think of each character $\chi_i$ as an $|A|$-dimensional vector. (The vector that has the value $\chi_i(a)$ in the $a$-th coordinate.) Then these vectors are orthogonal to each other. We claim that they are also eigenvectors for the adjacency matrix of $G$.

Note a surprising fact here: the graph $G$ depends on the group $A$ and on the choice of generators $S$. The eigenvectors of the adjacency matrix, however, depend on $A$ alone. (The eigenvalues, of course, will depend on $S$.)

The proof of the claim is immediate. Let $M$ be the adjacency matrix of $G$, then the $b$-th entry of the vector $\chi_i \times M$ is

$\sum_{a\in A} \chi_i(a) M(a,b) = \sum_{s\in S} \chi_i(b+s) =$
$ \sum_{s\in S} \chi_i(b)\chi_i (s) = \chi_i (b) \sum_{s\in S} \chi_i(s)$
We are done! The vector $\chi_i$ is indeed an eigenvector, of eigenvalue $\sum_{s\in S} \chi_i(s)$. Since we have $|A|$ eigenvectors, and they are linearly independent, we have found all eigenvalues, with multiplicities.

Let’s look at the cycle with $N$ vertices. It is the Cayley graph of ${\mathbb Z}_N$ with generators $\{-1,1\}$. We have $N$ eigenvalues, one for each $c=0,\ldots,N$, and we have
$\lambda_c = \chi_c(-1) + \chi_c(1) = e^{-2\pi c i/N} + e^{-2\pi c i/N}$
this may seem to contradict our earlier claim that all eigenvalues were going to be real. But two lines of trigonometry show that, in fact,
$\lambda_c = 2 \cos (2\pi c/N)$
As expected in a 2-regular graph, the largest eigenvalue is $\lambda_0 = 2\cos(0)=2$. The second largest eigenvalue, however, is $2\cos(2\pi /N$, which is $2-\Theta(1/N^2)$. This means that the expansion of the cycle is at least $\Omega(1/N^2)$ and, in particular, the cycle is a connected graph!

(After we are done having a good laugh at the expense of algebraic graph theory, I should add that this calculation, plus some relatively easy facts, implies that a random walk on the cycle mixes in $O(N^2)$ steps. Not only this is the right bound, but it is also something that does not have a completely straightforward alternative proof.)

We do a little better with the hypercube. The hypercube is the Cayley graph of the group $({\mathbb Z}_2)^n$ with the generator set that contains all the vectors that have precisely one 1. Now we have one eigenvalue for each choice of $(a_1,\ldots,a_n)$, and the corresponding eigenvalue is
$\sum_{i=1}^n (-1)^{a_i}$.

The largest eigenvalue, when all the $a_i$ are zero, is $n$, as expected in an $n$-regular graph. The second largest eigenvalue, however, is at most $n-2$. This means that the expansion of the hypercube is at least 1. (As it happens, it is precisely 1, as shown by a dimension cut.)

Before trying to describe how one reasons about the eigenvalues of the adjacency matrix of a Cayley graph, I would like to describe the “easy direction” of the relationships between eigenvalues and expansion.

First, a one-minute summary of linear algebra. If A is an n\times n matrix, then a (column) non-zero vector x is an eigenvector of A provided that, for some scalar \lambda, we have Ax= \lambda x. If so, then \lambda is an eigenvalue of A. A couple of immediate observations:

  • The equation Ax=\lambda x can be rewritten as (A-\lambda I)x = 0, so if \lambda is an eigenvalue of A then the columns of the matrix A-\lambda I are not linearly independent, and so the determinant of A-\lambda I is zero. But we can write det(A-\lambda I) as a degree-n polynomial in the unknown \lambda, and a degree-n polynomial cannot have more than n roots.

    So, an n\times n matrix has at most n eigenvalues.

  • If \lambda is an eigenvalue of A, then the set of vectors x such that Ax=\lambda x is a linear space. If \lambda_1 and \lambda_2 are different eigenvalues, then their respective spaces of eigenvectors are independent (they only intersect at 0).

If A is a symmetric real-valued matrix, then a series of miracles (or obvious things, depending on whether you have studied this material or not) happen:

  • All the roots of the polynomial det(A-\lambda I) are real; so, counting multiplicities, A has n real eigenvalues.
  • If \lambda is an eigenvalue of A, and it is a root of multiplicity k of det(A-\lambda I), then the space of vectors x such that Ax=\lambda x has dimension k.
  • If \lambda_1 \neq \lambda_2 are two different eigenvalues, then their respective spaces of eigenvectors are orthogonal

From this fact, it is easy to conclude that starting from symmetric real matrix A we can find a sequence of n numbers

\lambda_1\geq \ldots \geq \lambda_n

and a sequence of n unit vectors v_1,\ldots,v_n such that the v_i are orthogonal to each other and such that A v_i = \lambda_i v_i.

Now the claim is that these n numbers and n vectors completely describe A. Indeed, suppose that we are given a vector x and that we want to compute Ax. First, we can write x as a linear combination of the v_i:

x= \alpha_1 v_1 + \cdots \alpha_n v_n

and since the v_i are an orthonormal basis, we know what the coefficients \alpha_i are like: each \alpha_i is just the inner product of x and v_i.

Now, by linearity, we have

Ax = \alpha_1 A v_1 + \cdots \alpha_n A v_n

and, since the v_i are eigenvectors,

= \alpha_1 \lambda_1 v_1 + \cdots + \alpha_n \lambda_n v_n

So, once we express vectors in the basis defined by the v_i, we see that multiplication by A is just the effect of multiplying coordinate i by the value \lambda_i.

But there is more. Recall that \alpha_i = v_i^T x and so we can also write

Ax = (\lambda_1 v_1 v_1^T) x + \cdots + (\lambda_n v_n v_n^T) x

and so, in particular, matrix A is just

A = \lambda_1 v_1 v_1^T + \cdots + \lambda_n v_n v_n^T

a sum of n rank-1 matrices, with the coefficients of the sum being the eigenvalues.

Let now A be the adjacency matrix of a d-regular undirected graph (so A is real-valued and symmetric), \lambda_1\geq \cdots \geq \lambda_n be its eigenvalues, and v_1,\ldots,v_n be a corresponding set of orthonormal eigenvectors.

It is not hard to see (no, really, it’s two lines, see the bottom of page 2 of these notes) that all eigenvalues are between d and -d. Also, d is always an eigenvalue, as shown by the eigenvector (1,\ldots,1). This means that \lambda_1 = d and we can take

v_1 = \left( \frac 1 {\sqrt n},\cdots, \frac 1 {\sqrt n} \right)

and we have

A = (d v_1 v_1^T) x + \cdots + (\lambda_n v_n v_n^T) x

Suppose now that all the other \lambda_i, i=2,\ldots,n are much smaller than d in absolute value. Then, intuitively, the sum is going to be dominated by the first term, and A behaves sort of like the matrix (d v_1 v_1^T), that is, the matrix that has the value \frac{d}{n} in each entry.

You may be skeptical that such an approximation makes sense, but see where this leads. If our graph is G=(V,E), and (S,V-S) is a cut of the set of vertices into two subsets, let 1_S be the n-dimensional vector that is 1 in coordinates corresponding to S and 0 in the other coordinates, and define 1_{V-S} similarly. Then the number of edges crossing the cut is precisely

1_S^T A 1_{V-S}

and if we replace A by the matrix that has \frac{d}{n} in each coordinate we get the approximation  |S|\cdot|V-S|\cdot \frac {d}{n}, which is the expected number of edges that would cross the cut if we had a random graph of degree d. This is pretty much the right result: if |\lambda_2|,\ldots, |\lambda_n| are all much smaller than d, and if |S| and |V-S| are at least a constant times n, then the number of edges crossing the cut is indeed very close to |S|\cdot |V-S| \cdot \frac {d}{n}. This result is known as the Expander Mixing Lemma, and the proof is just a matter of doing the “obvious” calculations. (Write 1_S and 1_{V-S} in the eigenvector basis, distribute the product, remember that the v_i are orthogonal, use Cauchy-Schwarz …)

It is perhaps less intuitive that, provided that all other eigenvalues are at least a little bit smaller than d, then a reasonably large number of edges crosses the cut. In particular, if |S| \leq n/2 and if \lambda_2 \leq d-\epsilon, then at least \epsilon  |S|/2 edges cross the cut. That is, if all eigenvalues except the largest one are at most d-\epsilon then the edge expansion of the graph is at least \epsilon/2. (It takes about five lines to prove this, see the beginning of the proof of Thm 1 in these notes.)

To summarize: the adjacency matrix of an undirected graph is symmetric, and so all eigenvalues are real. If the graph is d-regular, then d is the largest eigenvalue, and all others are between d and -d. If all the others are between d-\epsilon and -d, then the graph has edge-expansion \frac{\epsilon}{2} (or better).

(As will become clear soon, I am totally out of my depth: Readers who find glaring mistakes, and even small ones, should note them in the comments.)

In the previous post on expanders we left off with the observation that if we want to construct an expanding Cayley graph, then we need to start from a group where all $n$ elements of the group can be obtained by a sum of $O(log n)$ terms chosen, with repetitions, from a set of $O(1)$ generators; this means that a constant root of the $n^{O(1)}$ ways of rearranging such sums must lead to different results. The group, then, has to be non-commutative in a very strong sense.

I will describe a way in which a group used to construct expanders is “very non-commutative.” This will not be a sufficient property for the expansion, but it is closely related and it will come up in the paper of Gowers that we will talk about later.

With this objective in mind, let’s see the notion of group representation. Let’s start from the simplest group, the cyclic group of $N$ elements $Z_N$ with the operation of addition (or $Z/NZ$ as mathematicians write). One can visualize this group as $N$ points equally spaced along a circle in the plane, with element $a$ forming an angle of $2\pi a/ N$ with the $x$ axis. If we want to add an element that forms an angle $\alpha$ to an element that forms an angle $\beta$, that’s the same as putting the “two angles next to each other” and we get a point that forms an angle $\alpha+\beta$. More algebraically, we identify an element $a$ with the complex number $e^{2\pi a i/n}$ and we realize group addition as multiplication of complex numbers. There is, however, another way of visualizing the cyclic group on a cycle. We can think of group element $a$ as the operation that takes a point on the cycle and moves it by an angle of $2\pi a/N$. Addition of group elements now corresponds to “composition,” that is, corresponds to applying one rotation after the other.

In this view, element $a$ is now the function that maps complex number $x$ into complex number $x\cdot e^{2\pi a i/N}$.

Suppose now that our group is a product $Z_{N_1} \times \cdots \times Z_{N_k}$ of cyclic groups. This means that a group elements if a $k$-tuple of the form $(a_1,\ldots,a_d)$, and that

$(a_1,\ldots,a_d) + (b_1,\ldots,b_d) = (a_1+b_1 \bmod N_1,\ldots,a_k+b_k \bmod N_k)$

It now makes sense to view a group element $(a_1,\ldots,a_d)$ as a “high-dimensional rotation” operation that takes in input $k$ complex numbers $(x_1,\ldots,x_k)$ and outputs the $k$ complex numbers

$(x_1 \cdot e^{2\pi a_1 i /N_1},\ldots, x_k \cdot e^{2\pi a_k i /N_k})$

If we take this view, we have, again, the property that group addition becomes composition of functions. Note, also, that the function that we have associated to each group element is a very simple type of linear function: it is simply multiplication of $x$ times a diagonal matrix that has diagonal

$(e^{2\pi a_1 i /N_1},\ldots, e^{2\pi a_k i /N_k})$

Notice, also, that if $f$ is a function of the form $f(x) = A\cdot x$, where $A$ is a matrix, and $g$ is a function of the form $B\cdot x$ where $B$ is a matrix, then $f(g(x))= A\cdot B\cdot x$. That is, for linear functions, function composition is the same as matrix multiplication.

To summarize, we have started from the group $Z_{N_1} \times \cdots \times Z_{N_k}$, and we have found a way to associate a complex-valued diagonal matrix to each group element in such a way that group addition becomes matrix multiplication.

It is known that all finite Abelian groups can be written as $Z_{N_1} \times \cdots \times Z_{N_k}$, so this type of representation via diagonal matrices is possible for every finite Abelian group.

What about more general groups? If $G$ is an arbitrary finite group, it is possible to associate to each element a block-diagonal matrix in such a way that group addition becomes matrix multiplication.

(It is common to call a group operation “multiplication” when a group is not Abelian, but for consistency with the rest of the post I will keep calling it addition.)

(By the way, it is also possible to represent infinite groups by associating a linear operator to each group element, but we will only discuss finite groups here.)

If the group is Abelian, then the matrices are diagonal, that is, they are block-diagonal matrices with “block size one.” So one way of quantifying “how non-Abelian” is a group is to consider how small the blocks can be in such matrix representations. That’s the dimension of the representation.

Here is an example of a family of groups whose representations cannot be low-dimensional. Let $p$ be a prime (it could also be a prime power) and let us consider $2 \times 2$ matrices whose entries are integers $\bmod p$. Let us restrict ourselves to matrices whose determinant is 1 modulo $p$, and consider the operation of matrix multiplication (where, also, all operations are mod $p$). This set of matrices forms a group, because the matrices are invertible (the determinant is non-zero) and the set contains the identity matrix and is closed under product. This group is called $SL(2,p)$. The group $SL(2,p)$ contains the tiny subgroup of two elements $\{ I,-I\}$; in the representation of $SL(2,p)$ this shows up as a block of size 1. If we take the quotient of $SL(2,p)$ by $\{ I,-I \}$ then we get another group, which is called $PSL(2,p)$.

It is now a theorem of Frobenius that every representation of $PSL(2,p)$ has dimension at least $(p-1)/2$. This is really large compared to the size of $PSL(2,p)$: the group $SL(2,p)$ has $p(p-1)^2$ elements, and so $PSL(2,p)$ has $p(p-1)^2/2$ elements. The dimension of the representation is thus approximately the cube root of the number of elements of the group.

Going back to representations of Abelian groups, we see that, in that case, not only each block had size one, but also that the entire matrix had size at most logarithmic in the number of elements of the group. This shows that $SL(2,p)$ and $PSL(2,p)$ are, at least in this particular sense, “very non-Abelian,” and it is an encouraging sign that they may be useful in constructing expanders and other quasi-random objects.

On his way toward a question about subsets of groups not containing triples of the form $\{ a, b, a+b\}$, Gowers shows that every dense Cayley graph constructed on $PSL(2,p)$ has constant edge expansion. (By dense, we mean that the degree is $\Omega(n)$, where $n$ is the number of vertices.) This is a result that, assuming Frobenius theorem, has a simple proof, which I hope to understand and describe at a later point.

The celebrated Ramanujan graphs of Lubotzky, Phillips and Sarnak are constant-degree Cayley graphs constructed from $PGL(2,p)$, which is the same as $PSL(2,p)$ except that we put no restriction on the determinant. Their relationship between degree and eigenvalue gap is best possible. The analysis of Lubotzky et al. is, unfortunately, completely out of reach.

I have started to read a new paper by Gowers titled Quasi-random groups. “Quasi-random” is the more-or-less standard term used to denote a single object that has some of the typical properties of a random object; this is distinct from the term “pseudo-random” which is more often applied to a distribution that satisfies a certain set of properties with approximately the same probability as the uniform distribution.

(The distinction is sometimes artificial: for example the support of a pseudorandom distribution can also be thought of as a quasi-random set. To add confusion, in older literature on extractors certain high-entropy sources were called “quasi-random” sources.)

Although the paper does not directly talk about expander graphs, it addresses related questions, and it gives an opportunity for a post (or more than one) on expander graph constructions.

Informally, an expander graph is a graph that, while having small bounded degree, is very well-connected. For example, a \sqrt{n} \times \sqrt{n} grid can be disconnected into two equal-size connected components after removing just \sqrt{n} edges. This means that a \Omega(1) fraction of all pairs of vertices can be disconneted while only removing a o(1) fraction of all edges; for this reason, we do not think of a 2-dimensional grid as a good expander. In a graph of degree O(1) it is always possible to cut off k vertices from the rest of the graph by removing O(k) edges; a graph is an expander if this is best possible. That is, a graph is an expander if for every set of k vertices (where k is less than half of the total number of vertices) you need to remove at least \Omega(k) edges to disconnect those vertices from the rest of the graph.

More quantitatively, a d-regular graph has edge expansion at least \epsilon if for every set S of vertices there are at least \epsilon \cdot |S| edges going between vertices in S and vertices not in S. (Provided |S| is less than half the total number of vertices.)

We have a good construction of expanders if for fixed constants \epsilon and d we are able to construct \epsilon-expanders of degree d with any desired number of vertices.

The applications of expander graphs are vast, they are useful to reduce randomness in probabilistic algorithms, they come up in certain constructions of data structures, of randomness extractors, and of error-correcting codes, and they have many applications in complexity theory. Most recently, Dinur’s new proof of the PCP Theorem uses expanders. A great set of lecture notes on expanders is here, and not so great one is here (lectures 8-12 are on expanders).

Before the great zig-zag graph product revolution, almost all the good expander graph constructions were Cayley graphs of non-Abelian groups. Let’s see what this means and how it helps.

If G is a group and D\subset G is a subset, then G and D define a |D|-regular graph with |G| vertices as follows: every element of G is a vertex, and vertices u and v are connected if u-v \in D. If we assume that, for every a\in D we also have -a \in D, then we can think of the graph as being undirected.

A first trivial observation is that if D is a set of generators, that is, if we can realize every element of G as a sum of elements of D, then the graph is connected, otherwise it is disconnected.

Another point, which is true for all graphs, not just Cayley graphs, is that we can approximate the expansion of a graph by computing the eigenvalues of the adjacency matrix and looking at the difference between the largest and the second largest in absolute value. This was a point that I used to find mystifying, until I studied it and realized that it makes perfect sense. I may talk about it in another post, but I want to mention an amusing fact. The connection between eigenvalues and expansion has an easy direction (if there is a large eigenvalue gap then there is large expansion) and a difficult direction (if there large expansion then there must be large eigenvalue gap). The easy direction was “well known,” and the difficult direction was discovered around the same time in two distinct communities for opposite reasons.

Alon discovered it while working on constructions of expanders. It was known how to construct graphs with noticeable eigenvalue gap which would then (by the “simple direction”) be good expanders. This is because, for the algebraic constructions that we will talk about in a minute, it is easier to reason about the eigenvalues of the adjacency matrix than about the number of edges in a generic cut. Alon, by proving the hard direction, showed that the approach via eigenvalue can be taken “with no loss of generality,”

Others (Aldous and Diaconis? Jerrum and Sinclair? I never get this reference right) discovered the same result working on the problem of bounding the time that it takes for a random walk on certain graphs to converge to the stationary distribution. To answer such questions, one needs to know the eigenvalue gap of the graphs, a quantity that is difficult to estimate directly in important cases. For those graphs, however, it is possible to reason about the size of cuts, and hence, via the difficult connection, obtain results on the eigenvalue gap (and, thus, on the convergence time, or “mixing” time of a random walk).

This “difficult direction” result has also an important algorithmic application. The counterpositive statement is that if largest and second largest eigenvalue are close in value, then the graph is not expanding, so one can cut off a set of vertices by deleting a comparatively small number of vertices. The proof is algorithmic: given the eigenvector of the second largest eigenvalue, this small cut can be found efficiently. This is the principle at work in “spectral partitioning” algorithms, often used in practice (and, occasionally, in theory).

Well, that was a long digression, but it’s also an interesting story. Back to our constructions of expander graphs, we want to describe a bounded-degree graph explicitly and prove that there is a noticeable gap between largest and second largest eigenvalue. If the graph is a Cayley graph derived from a group G and set of generators D (and D is chosen properly), it turns out that the eigenvalues of the adjacency matrix of the graph can be explicitly computed by looking at the representations of the group G. If G is Abelian (if the group operation is commutative), then the representation theory of G is very simple and it is “just” the theory of Fourier transforms of functions f: G \rightarrow C (where C are the complex numbers), which is much easier than it sounds.

Unfortunately, a Cayley graph based on an Abelian group can never be a good expander. To see why, you’ll have to believe my claim that in a good expander with n vertices the diameter is at most O(\log n). Consider now an n-vertex Cayley graph of degree d. Every vertex x is reachable from vertex 0 via a path of length O(\log n). That is, x is the sum of O(\log n) elements of the set D. Because of commutativity, we can rearrange the sum so that it looks as

(a_1+ \ldots + a_1) + (a_2 + \ldots + a_2) +\ldots (a_d + \ldots + a_d)

where a_1,\ldots,a_d are the elements of D. We see that the sum is specified just by saying how many time each element of D must be added, and so x is specified by d numbers, each between 1 and O(\log n). This gives n \leq O((\log n)^d) which is a contradiction.

Indeed, we see that Cayley graphs based on Abelian groups fail very badly to have the required diameter condition: if the degree is d, there will be vertices at distance about n^{1/d} from each other, a bound that is met by a d-dimensional grid.

Intuitively, then, we can get an expanding bounded-degree Cayley graph only if we start from a “very non-Abelian” group. But how do we quantify the “Abelianity” of a group, and what kind of groups are “very non-Abelian”? This will be the subject of the next post.

A few years ago, Nati Linial and Avi Wigderson taught a course on expander graphs. The course lecture notes have been edited into an article that will appear in the Notices Bulletin of the AMS.

Categories