You are currently browsing the tag archive for the 'Expanders' tag.
[In which we prove the "difficult part" of Cheeger's inequality by analyzing a randomized rounding algorithm for a continuous relaxation of sparsest cut.]
We return to this month’s question: if is a
-regular graph, how well can we approximate its edge expansion
defined as
and its sparsest cut defined as
,
where is the adjacency matrix of
.
We have looked at three continuous relaxations of , the spectral gap, the Leighton-Rao linear program, and the Arora-Rao-Vazirani semidefinite program.
As we saw, the spectral gap of , defined as the difference between largest and second largest eigenvalue of
, can be seen as the solution to a continuous optimization problem:
.
It follows from the definitions that
which is the “easy direction” of Cheeger’s inequality, and the interesting thing is that is never much smaller, and it obeys
,
which is the difficult part of Cheeger’s inequality. When we normalize all quantities by the degree, the inequality reads as
.
I have taught (1) in three courses and used it in two papers, but I had never really understood it, where I consider a mathematical proof to be understood if one can see it as a series of inevitable steps. Many steps in the proofs of (1) I had read, however, looked like magic tricks with no explanation. Finally, however, I have found a way to describe the proof that makes sense to me. (I note that everything I will say in this post will be completely obvious to the experts, but I hope some non-expert will read it and find it helpful.)
We prove (1), as usual, by showing that given any such that
we can find a threshold such that the cut
defined by
satisfies
and
.
This not only gives us a proof of (1), but also an algorithm for finding sparse cuts when they exist: take a vector which is an eigenvector of the second eigenvalue (or simply a vector for which the Rayleigh quotient in (2) is small), sort the vertices
according to the value of
, and find the best cut among the “threshold cuts.” This is the “spectral partitioning” algorithm.
This means that proving (1) amounts to studying an algorithm that “rounds” the solution of a continuous relaxation to a combinatorial solution, and there is a standard pattern to such arguments in computer science: we describe a randomized rounding algorithm, study its average performance, and then argue that there is a fixed choice that is at least as good as an average one. Here, in particular, we would like to find a distribution over threshold, such that if we define
as a random variable in terms of
we have
and so, using linearity of expectation,
from which we see that there must be a threshold in our sample space such that (3) holds. I shall present a proof that explicitly follows this pattern. It would be nice if we could choose uniformly at random in the interval
, but I don’t think it would work. (Any reader can see a counterexample? I couldn’t, but we’ll come back to it.) Instead, the following works: assuming with no loss of generality that the median of
is zero,
can be chosen so that
is distributed uniformly in the interval
. (This means that thresholds near the median are less likely to be picked than thresholds far from the median.) This choice seems to be a magic trick in itself, voiding the point I made above, but I hope it will become natural as we unfold our argument. Read the rest of this entry »
[In which we finally get to Leighton-Rao and ARV.]
Continuing our edge expansion marathon, we are going to see more ways in which the sparsest cut problem can be relaxed to polynomial-time solvable continuous problems. As before, our goal is to understand how this implies certificates of expansion.
We are talking about the sparsest cut of a graph
, which is
We have looked for a while at the eigenvalues gap of . If
is
-regular and the second largest eigenvalue of
is
, then the difference between largest and second largest eigenvalue obeys
and, noting that when
, we saw that
is a relaxation of
, and
.
Leighton and Rao derive a continuous relaxation of in quite a different way. They note that when
, then the “distance”
induces a semi-metric on , that is,
,
, and the triangle inequality holds
One can then consider the relaxation
where the minimum is taken over all distance functions that define a semi-metric on
.
We can rewrite (1) as the linear program
where we only write the triangle inequality constraints by having one variable for every unordered pair
.
The minimum of (2) gives a lower bound to the sparsest cut of , and so every feasible solution for the dual of (2) gives a lower bound to
.
Before trying to write down the dual of (2), it is convenient to write (2) in a less efficient way. (This is akin to the fact that sometimes, when using induction, it is easier to prove a more general statement.) Instead of writing the triangle inequality for every three vertices, we write it for every arbitrary sequence of vertices.
where we denote by the set of “paths” in the complete graph between
and
. (That is,
is the set of all sequences of vertices that start at
and end at
.)
Clearly (3) is the same as (2), in the sense that the extra inequalities present in (3) are implied by the triangle inequalities in (2). The dual of (3), however, is cleaner, and is given below
A feasible solution to (4) is thus a weighted set of paths and a parameter . Before reasoning about the meaning of a solution, suppose that
satisfy the stronger constraints
Then we can view the as defining a flow between
that carries at least
units of flow; furthermore, the union of all such flows uses at most a total capacity
on each edge
. We should think of it as an embedding of the complete graph scaled by
into
. I want to claim that this means that the sparsest cut of
is at least
times the sparsest cut of the complete graph, that is, at least
.
To verify the claim, take any cut . We know that such a cut separates
pairs of vertices
, and that at least
units of flow are routed between each such pair, so that at least
units of flow cross the cut. Since every edge has capacity 1, we see that at least
edges of
must cross the cut.
It remains to bridge the difference between (4) or (5). One possibility is to show that a solution to (4) can be modified to a solution satisfying (5) with no loss in . Another approach is to directly use (4) and a general weak duality argument to show directly that a solution satisfying (4) implies that
has at least the edge expansion of a complete graph with edge weights
.
There is, in fact, a broader principle at work, which we can verify by a similar argument: if is an assignment of weights to all possible paths (which we think of as defining a multicommodity flow),
is a graph with adjacency matrix
,
is a graph with adjacency matrix
, and we have
then we must have .
The fact that a feasible solution to (4) gives a lower bound to is a special case that applies to the case in which
is a scaled version of the complete graph.
This is a good point to pause, look back again at the bound coming from spectral considerations, and compare it with what we have here. Read the rest of this entry »
In the previous post, we saw that the sparsest cut of a graph (which is within a factor of 2 of the edge expansion of the graph), is defined as
and can also be written as
while the eigenvalue gap of , can be written as
and so it can be seen as a relaxation of , considering that
when
are boolean.
Let us now take a broader look at efficiently computable continuous relaxations of sparsest cut, and see that the eigenvalue gap is a semidefinite programming relaxation of sparsest cut, that by adding the triangle inequality to it we derive the Arora-Rao-Vazirani relaxation, and that by removing the semidefinite constraint from the ARV relaxation we get the Leighton-Rao relaxation. Read the rest of this entry »
In which we dwell at great length on the “easy” part of Cheeger’s inequality.
The edge expansion of a graph , defined as
is a measure of how “well connected” is a graph. A graph has edge expansion at least if and only if, in order to disconnect a set of
vertices from the rest of the graph, one must remove at least
edges. In other words, the removal of
edges from a graph of edge expansion
must leave a connected component of size
, where
is the number of vertices.
As readers of in theory know, graphs of large edge expansion (expander graphs) have many applications, and in this post we are going to return to the question of how one certifies that a given graph has large edge expansion.
For simplicity, we shall only talk about regular graphs. In a previous post we mentioned that if is the adjacency matrix of a graph, there are numbers
(the eigenvalues of
), not necessarily distinct, and an orthonormal basis
of
, such that for all
And if the graph is -regular, then
,
, and for all
we have
. Finally, the numbers
and the vectors
are computable in polynomial time. (This is all the linear algebra we will need.)
Given this setup, we have the lower bound
That is, the difference between largest and second largest eigenvalue gives a lower bound to the edge expansion of the graph.
Although (1) is very easy to prove, it is a remarkable result, in the sense that it gives a polynomial-time computable certificate of a “coNP” statement, namely, that for all cuts, at least a certain number of edges cross the cut. Since computing edge expansion is NP-complete, the inequality cannot always be tight. Eventually shall compare it with other methods to lower bound the edge expansion, and see that they are all special cases of the same general approach. To begin with, in this post we shall show that is an efficiently computable relaxation of the edge expansion problem. (Or, rather, of the closely related sparsest cut problem.) Read the rest of this entry »
Cheegers inequality relates the expansion of a graph to its eigenvalue gap.
There are at least three contexts where this result arises and is important:
Read the rest of this entry »
In which we look at the representation theory of Abelian groups and at the discrete Fourier transform, we completely understand the eigenvalues and eigenvectors of the Cayley graph of an Abelian group, and we apply this powerful theory to prove that the cycle is a connected graph.
When we talked about group representations, we saw how to map each element of a group into a matrix in such a way that group addition becomes matrix multiplication.
In general, if $G$ is a finite group, then an n-dimensional representation of $G$ is a map
$\rho: G \rightarrow {\mathbb C}^{n\times n}$
such that $\rho$ is a group homomorphism between $G$ and the group of matrices $\rho(G)$. That is, we have that $\rho(g)$ is an invertible matrix for every $g$, $\rho(0)=I$, $\rho(a+b) = \rho(a) \times \rho(b)$ and $\rho(-g) = (\rho(g))^{-1}$.
(Scott will be delighted to know that the theory need not stop at finite matrices, and one may also look at representations mapping group elements into invertible linear transformations over an infinite-dimensional Hilbert space. To study finite groups, however, finite-dimensional representations are good enough.)
Note that $\rho$ need not be injective, and so, for every group, we can always define the 1-dimensional representation $\rho(g) = 1$ for every $g$. Perhaps unsurprisingly, this is usually called the trivial representation. The precise statement of a theorem of Frobenius that we mentioned earlier is that every non-trivial representation of $PSL(2,p)$ has dimension at least $(p-1)/2$.
In an Abelian group, however, there are plenty of 1-dimensional representations, and, in fact, all the information about the group is “encoded” in the set of its 1-dimensional representations. We will call a 1-dimensional representation a character of the group. (This equivalence holds only for finite groups, otherwise we need an extra condition.) Hence a character of a group $G$ is a mapping
$\chi: G \rightarrow {\mathbb C}$
such that $\chi(g) \neq 0$ for every $g$, $\chi(0)=1$, $\chi(a+b) = \chi(a)\cdot \chi(b)$ and $\chi(-g) = (\chi(g))^{-1}$. In a finite group $G$, we always have the property that if we add an element to itself $|G|$ times we get $0$; from this we can deduce that, if $\chi()$ is a character, then $\chi(g)$ must be a $|G|$-th root of 1, that is, it must be a number of the form $e^{2\pi x i/|G|}$ for some $x$. In particular, $|\chi(g)|=1$ for every $g$.
(For general groups, we say that $\chi$ is a character if it is a 1-dimensional representation and it has the property that $|\chi(g)|=1$ for every $g$. While the latter property is a consequence of the former for finite groups and, more generally, for torsion groups, this is not the case for other groups. For example, $\chi(x) := e^{x}$ is a 1-dimensional representation for $\mathbb Z$ but it is not a character.)
At first sight, it might look like a finite group might have an infinite number of characters (or at least that’s what I thought when I first saw the definition). There is however a simple argument that shows that a finite group $G$ cannot have more than $|G|$ characters.
Consider the set of all functions of the form $f: G \rightarrow {\mathbb C}$; we can think of it as a $|G|$-dimensional vector space over $\mathbb C$. We have that the characters of $G$ are linearly independent, and so there cannot be more than $|G|$ of them. In fact, even more is true: the characters of $G$ are orthogonal to each other. For two functions $f,h$, define their inner product as
$(f,g) := \frac1 {|G|} \sum_{a\in G} f(a) \overline{g(a)} $
Then, if $\chi$ and $\eta$ are two different characters, we have
$(\chi,\eta) = \frac1 {|G|} \sum_{a\in G} \chi(a) \overline{\eta(a)}=0 $
The normalization in the definition of inner product is convenient because then we also have $(\chi,\chi)=1$ if $\chi$ is a character. (We will not prove these claims, but the proofs are very simple.)
We will now show that the cyclic group ${\mathbb Z}_N$ has indeed $N$ distinct characters $\chi_0,\ldots,\chi_{N-1}$. This means that this set of characters forms an orthonormal basis for the set of functions $f: {\mathbb Z}_N \rightarrow {\mathbb C}$, and that each such function can be written as a linear combination
$f(x) = \sum_{c=0}^{N-1} \hat f(c) \chi_c (x)$ (*)
where the coefficients $\hat f(c)$ can be computed as the inner product
$\hat f(c) = (f,\chi_c)$
(The equation (*) is the Fourier expansion of $f$, and the function $\hat f()$ is called the Fourier transform of $f$.)
Here is how to define the characters $\chi_c$: for each $c$, define $\chi_c(x):= e^{2\pi c x /N}$. It is immediate to verify that each such function is a character, and that, for $c=0,\ldots,N-1$ we are, indeed, defining $N$ different functions. By the previously mentioned results, there is no other function.
Let’s see one more example: consider $({\mathbb Z}_2)^n$, that is, $\{0,1\}^n$ with the operation of bit-wise XOR. For each ${\mathbf a} = (a_1,\ldots,a_n)\in \{0,1\}^n$, define the function
$\chi_{\mathbf a} (x_1,\ldots,x_n) := (-1)^{\sum_i a_i x_i }$
We can again verify that all these $2^n$ functions are distinct, that each of them is a character, and so that they include all the characters of $({\mathbb Z}_2)^n$. The reader should be able to see the pattern and to reconstruct the $N_1\times \cdots N_k$ characters of the group
${\mathbb Z}_{N_1} \times \cdots \times {\mathbb Z}_{N_k}$
thus having a complete understanding of the set of characters of any given finite Abelian group.
At long last, we can get to the subject of expansion.
Let $A$ be a finite Abelian group, and $S$ be a set of generators such that if $a\in S$ then $-a\in S$. Definite the Cayley graph $G=(A,E)$ where every element of $A$ is a vertex and where $(a,b)$ is an edge if $a-b\in S$. This is cleary an $|S|$-regular undirected graph. Let $\chi_1,\ldots,\chi_{|A|}$ be the set of characters of $A$. Think of each character $\chi_i$ as an $|A|$-dimensional vector. (The vector that has the value $\chi_i(a)$ in the $a$-th coordinate.) Then these vectors are orthogonal to each other. We claim that they are also eigenvectors for the adjacency matrix of $G$.
Note a surprising fact here: the graph $G$ depends on the group $A$ and on the choice of generators $S$. The eigenvectors of the adjacency matrix, however, depend on $A$ alone. (The eigenvalues, of course, will depend on $S$.)
The proof of the claim is immediate. Let $M$ be the adjacency matrix of $G$, then the $b$-th entry of the vector $\chi_i \times M$ is
$\sum_{a\in A} \chi_i(a) M(a,b) = \sum_{s\in S} \chi_i(b+s) =$
$ \sum_{s\in S} \chi_i(b)\chi_i (s) = \chi_i (b) \sum_{s\in S} \chi_i(s)$
We are done! The vector $\chi_i$ is indeed an eigenvector, of eigenvalue $\sum_{s\in S} \chi_i(s)$. Since we have $|A|$ eigenvectors, and they are linearly independent, we have found all eigenvalues, with multiplicities.
Let’s look at the cycle with $N$ vertices. It is the Cayley graph of ${\mathbb Z}_N$ with generators $\{-1,1\}$. We have $N$ eigenvalues, one for each $c=0,\ldots,N$, and we have
$\lambda_c = \chi_c(-1) + \chi_c(1) = e^{-2\pi c i/N} + e^{-2\pi c i/N}$
this may seem to contradict our earlier claim that all eigenvalues were going to be real. But two lines of trigonometry show that, in fact,
$\lambda_c = 2 \cos (2\pi c/N)$
As expected in a 2-regular graph, the largest eigenvalue is $\lambda_0 = 2\cos(0)=2$. The second largest eigenvalue, however, is $2\cos(2\pi /N$, which is $2-\Theta(1/N^2)$. This means that the expansion of the cycle is at least $\Omega(1/N^2)$ and, in particular, the cycle is a connected graph!
(After we are done having a good laugh at the expense of algebraic graph theory, I should add that this calculation, plus some relatively easy facts, implies that a random walk on the cycle mixes in $O(N^2)$ steps. Not only this is the right bound, but it is also something that does not have a completely straightforward alternative proof.)
We do a little better with the hypercube. The hypercube is the Cayley graph of the group $({\mathbb Z}_2)^n$ with the generator set that contains all the vectors that have precisely one 1. Now we have one eigenvalue for each choice of $(a_1,\ldots,a_n)$, and the corresponding eigenvalue is
$\sum_{i=1}^n (-1)^{a_i}$.
The largest eigenvalue, when all the $a_i$ are zero, is $n$, as expected in an $n$-regular graph. The second largest eigenvalue, however, is at most $n-2$. This means that the expansion of the hypercube is at least 1. (As it happens, it is precisely 1, as shown by a dimension cut.)
Before trying to describe how one reasons about the eigenvalues of the adjacency matrix of a Cayley graph, I would like to describe the “easy direction” of the relationships between eigenvalues and expansion.
First, a one-minute summary of linear algebra. If is an
matrix, then a (column) non-zero vector
is an eigenvector of
provided that, for some scalar
, we have
. If so, then
is an eigenvalue of
. A couple of immediate observations:
- The equation
can be rewritten as
, so if
is an eigenvalue of
then the columns of the matrix
are not linearly independent, and so the determinant of
is zero. But we can write
as a degree-
polynomial in the unknown
, and a degree-
polynomial cannot have more than
roots.
So, an
matrix has at most
eigenvalues.
- If
is an eigenvalue of
, then the set of vectors
such that
is a linear space. If
and
are different eigenvalues, then their respective spaces of eigenvectors are independent (they only intersect at 0).
If is a symmetric real-valued matrix, then a series of miracles (or obvious things, depending on whether you have studied this material or not) happen:
- All the roots of the polynomial
are real; so, counting multiplicities,
has
real eigenvalues.
- If
is an eigenvalue of
, and it is a root of multiplicity
of
, then the space of vectors
such that
has dimension
.
- If
are two different eigenvalues, then their respective spaces of eigenvectors are orthogonal
From this fact, it is easy to conclude that starting from symmetric real matrix we can find a sequence of
numbers
and a sequence of unit vectors
such that the
are orthogonal to each other and such that
.
Now the claim is that these numbers and
vectors completely describe
. Indeed, suppose that we are given a vector
and that we want to compute
. First, we can write
as a linear combination of the
:
and since the are an orthonormal basis, we know what the coefficients
are like: each
is just the inner product of
and
.
Now, by linearity, we have
and, since the are eigenvectors,
So, once we express vectors in the basis defined by the , we see that multiplication by
is just the effect of multiplying coordinate
by the value
.
But there is more. Recall that and so we can also write
and so, in particular, matrix is just
a sum of rank-1 matrices, with the coefficients of the sum being the eigenvalues.
Let now be the adjacency matrix of a
-regular undirected graph (so
is real-valued and symmetric),
be its eigenvalues, and
be a corresponding set of orthonormal eigenvectors.
It is not hard to see (no, really, it’s two lines, see the bottom of page 2 of these notes) that all eigenvalues are between and
. Also,
is always an eigenvalue, as shown by the eigenvector
. This means that
and we can take
and we have
Suppose now that all the other ,
are much smaller than
in absolute value. Then, intuitively, the sum is going to be dominated by the first term, and
behaves sort of like the matrix
, that is, the matrix that has the value
in each entry.
You may be skeptical that such an approximation makes sense, but see where this leads. If our graph is , and
is a cut of the set of vertices into two subsets, let
be the
-dimensional vector that is 1 in coordinates corresponding to
and 0 in the other coordinates, and define
similarly. Then the number of edges crossing the cut is precisely
and if we replace by the matrix that has
in each coordinate we get the approximation
, which is the expected number of edges that would cross the cut if we had a random graph of degree
. This is pretty much the right result: if
are all much smaller than
, and if
and
are at least a constant times
, then the number of edges crossing the cut is indeed very close to
. This result is known as the Expander Mixing Lemma, and the proof is just a matter of doing the “obvious” calculations. (Write
and
in the eigenvector basis, distribute the product, remember that the
are orthogonal, use Cauchy-Schwarz …)
It is perhaps less intuitive that, provided that all other eigenvalues are at least a little bit smaller than , then a reasonably large number of edges crosses the cut. In particular, if
and if
, then at least
edges cross the cut. That is, if all eigenvalues except the largest one are at most
then the edge expansion of the graph is at least
. (It takes about five lines to prove this, see the beginning of the proof of Thm 1 in these notes.)
To summarize: the adjacency matrix of an undirected graph is symmetric, and so all eigenvalues are real. If the graph is -regular, then
is the largest eigenvalue, and all others are between
and
. If all the others are between
and
, then the graph has edge-expansion
(or better).
(As will become clear soon, I am totally out of my depth: Readers who find glaring mistakes, and even small ones, should note them in the comments.)
In the previous post on expanders we left off with the observation that if we want to construct an expanding Cayley graph, then we need to start from a group where all $n$ elements of the group can be obtained by a sum of $O(log n)$ terms chosen, with repetitions, from a set of $O(1)$ generators; this means that a constant root of the $n^{O(1)}$ ways of rearranging such sums must lead to different results. The group, then, has to be non-commutative in a very strong sense.
I will describe a way in which a group used to construct expanders is “very non-commutative.” This will not be a sufficient property for the expansion, but it is closely related and it will come up in the paper of Gowers that we will talk about later.
With this objective in mind, let’s see the notion of group representation. Let’s start from the simplest group, the cyclic group of $N$ elements $Z_N$ with the operation of addition (or $Z/NZ$ as mathematicians write). One can visualize this group as $N$ points equally spaced along a circle in the plane, with element $a$ forming an angle of $2\pi a/ N$ with the $x$ axis. If we want to add an element that forms an angle $\alpha$ to an element that forms an angle $\beta$, that’s the same as putting the “two angles next to each other” and we get a point that forms an angle $\alpha+\beta$. More algebraically, we identify an element $a$ with the complex number $e^{2\pi a i/n}$ and we realize group addition as multiplication of complex numbers. There is, however, another way of visualizing the cyclic group on a cycle. We can think of group element $a$ as the operation that takes a point on the cycle and moves it by an angle of $2\pi a/N$. Addition of group elements now corresponds to “composition,” that is, corresponds to applying one rotation after the other.
In this view, element $a$ is now the function that maps complex number $x$ into complex number $x\cdot e^{2\pi a i/N}$.
Suppose now that our group is a product $Z_{N_1} \times \cdots \times Z_{N_k}$ of cyclic groups. This means that a group elements if a $k$-tuple of the form $(a_1,\ldots,a_d)$, and that
$(a_1,\ldots,a_d) + (b_1,\ldots,b_d) = (a_1+b_1 \bmod N_1,\ldots,a_k+b_k \bmod N_k)$
It now makes sense to view a group element $(a_1,\ldots,a_d)$ as a “high-dimensional rotation” operation that takes in input $k$ complex numbers $(x_1,\ldots,x_k)$ and outputs the $k$ complex numbers
$(x_1 \cdot e^{2\pi a_1 i /N_1},\ldots, x_k \cdot e^{2\pi a_k i /N_k})$
If we take this view, we have, again, the property that group addition becomes composition of functions. Note, also, that the function that we have associated to each group element is a very simple type of linear function: it is simply multiplication of $x$ times a diagonal matrix that has diagonal
$(e^{2\pi a_1 i /N_1},\ldots, e^{2\pi a_k i /N_k})$
Notice, also, that if $f$ is a function of the form $f(x) = A\cdot x$, where $A$ is a matrix, and $g$ is a function of the form $B\cdot x$ where $B$ is a matrix, then $f(g(x))= A\cdot B\cdot x$. That is, for linear functions, function composition is the same as matrix multiplication.
To summarize, we have started from the group $Z_{N_1} \times \cdots \times Z_{N_k}$, and we have found a way to associate a complex-valued diagonal matrix to each group element in such a way that group addition becomes matrix multiplication.
It is known that all finite Abelian groups can be written as $Z_{N_1} \times \cdots \times Z_{N_k}$, so this type of representation via diagonal matrices is possible for every finite Abelian group.
What about more general groups? If $G$ is an arbitrary finite group, it is possible to associate to each element a block-diagonal matrix in such a way that group addition becomes matrix multiplication.
(It is common to call a group operation “multiplication” when a group is not Abelian, but for consistency with the rest of the post I will keep calling it addition.)
(By the way, it is also possible to represent infinite groups by associating a linear operator to each group element, but we will only discuss finite groups here.)
If the group is Abelian, then the matrices are diagonal, that is, they are block-diagonal matrices with “block size one.” So one way of quantifying “how non-Abelian” is a group is to consider how small the blocks can be in such matrix representations. That’s the dimension of the representation.
Here is an example of a family of groups whose representations cannot be low-dimensional. Let $p$ be a prime (it could also be a prime power) and let us consider $2 \times 2$ matrices whose entries are integers $\bmod p$. Let us restrict ourselves to matrices whose determinant is 1 modulo $p$, and consider the operation of matrix multiplication (where, also, all operations are mod $p$). This set of matrices forms a group, because the matrices are invertible (the determinant is non-zero) and the set contains the identity matrix and is closed under product. This group is called $SL(2,p)$. The group $SL(2,p)$ contains the tiny subgroup of two elements $\{ I,-I\}$; in the representation of $SL(2,p)$ this shows up as a block of size 1. If we take the quotient of $SL(2,p)$ by $\{ I,-I \}$ then we get another group, which is called $PSL(2,p)$.
It is now a theorem of Frobenius that every representation of $PSL(2,p)$ has dimension at least $(p-1)/2$. This is really large compared to the size of $PSL(2,p)$: the group $SL(2,p)$ has $p(p-1)^2$ elements, and so $PSL(2,p)$ has $p(p-1)^2/2$ elements. The dimension of the representation is thus approximately the cube root of the number of elements of the group.
Going back to representations of Abelian groups, we see that, in that case, not only each block had size one, but also that the entire matrix had size at most logarithmic in the number of elements of the group. This shows that $SL(2,p)$ and $PSL(2,p)$ are, at least in this particular sense, “very non-Abelian,” and it is an encouraging sign that they may be useful in constructing expanders and other quasi-random objects.
On his way toward a question about subsets of groups not containing triples of the form $\{ a, b, a+b\}$, Gowers shows that every dense Cayley graph constructed on $PSL(2,p)$ has constant edge expansion. (By dense, we mean that the degree is $\Omega(n)$, where $n$ is the number of vertices.) This is a result that, assuming Frobenius theorem, has a simple proof, which I hope to understand and describe at a later point.
The celebrated Ramanujan graphs of Lubotzky, Phillips and Sarnak are constant-degree Cayley graphs constructed from $PGL(2,p)$, which is the same as $PSL(2,p)$ except that we put no restriction on the determinant. Their relationship between degree and eigenvalue gap is best possible. The analysis of Lubotzky et al. is, unfortunately, completely out of reach.
I have started to read a new paper by Gowers titled Quasi-random groups. “Quasi-random” is the more-or-less standard term used to denote a single object that has some of the typical properties of a random object; this is distinct from the term “pseudo-random” which is more often applied to a distribution that satisfies a certain set of properties with approximately the same probability as the uniform distribution.
(The distinction is sometimes artificial: for example the support of a pseudorandom distribution can also be thought of as a quasi-random set. To add confusion, in older literature on extractors certain high-entropy sources were called “quasi-random” sources.)
Although the paper does not directly talk about expander graphs, it addresses related questions, and it gives an opportunity for a post (or more than one) on expander graph constructions.
Informally, an expander graph is a graph that, while having small bounded degree, is very well-connected. For example, a grid can be disconnected into two equal-size connected components after removing just
edges. This means that a
fraction of all pairs of vertices can be disconneted while only removing a
fraction of all edges; for this reason, we do not think of a 2-dimensional grid as a good expander. In a graph of degree
it is always possible to cut off
vertices from the rest of the graph by removing
edges; a graph is an expander if this is best possible. That is, a graph is an expander if for every set of
vertices (where
is less than half of the total number of vertices) you need to remove at least
edges to disconnect those vertices from the rest of the graph.
More quantitatively, a -regular graph has edge expansion at least
if for every set
of vertices there are at least
edges going between vertices in
and vertices not in
. (Provided
is less than half the total number of vertices.)
We have a good construction of expanders if for fixed constants and
we are able to construct
-expanders of degree
with any desired number of vertices.
The applications of expander graphs are vast, they are useful to reduce randomness in probabilistic algorithms, they come up in certain constructions of data structures, of randomness extractors, and of error-correcting codes, and they have many applications in complexity theory. Most recently, Dinur’s new proof of the PCP Theorem uses expanders. A great set of lecture notes on expanders is here, and not so great one is here (lectures 8-12 are on expanders).
Before the great zig-zag graph product revolution, almost all the good expander graph constructions were Cayley graphs of non-Abelian groups. Let’s see what this means and how it helps.
If is a group and
is a subset, then
and
define a
-regular graph with
vertices as follows: every element of
is a vertex, and vertices
and
are connected if
. If we assume that, for every
we also have
, then we can think of the graph as being undirected.
A first trivial observation is that if is a set of generators, that is, if we can realize every element of
as a sum of elements of
, then the graph is connected, otherwise it is disconnected.
Another point, which is true for all graphs, not just Cayley graphs, is that we can approximate the expansion of a graph by computing the eigenvalues of the adjacency matrix and looking at the difference between the largest and the second largest in absolute value. This was a point that I used to find mystifying, until I studied it and realized that it makes perfect sense. I may talk about it in another post, but I want to mention an amusing fact. The connection between eigenvalues and expansion has an easy direction (if there is a large eigenvalue gap then there is large expansion) and a difficult direction (if there large expansion then there must be large eigenvalue gap). The easy direction was “well known,” and the difficult direction was discovered around the same time in two distinct communities for opposite reasons.
Alon discovered it while working on constructions of expanders. It was known how to construct graphs with noticeable eigenvalue gap which would then (by the “simple direction”) be good expanders. This is because, for the algebraic constructions that we will talk about in a minute, it is easier to reason about the eigenvalues of the adjacency matrix than about the number of edges in a generic cut. Alon, by proving the hard direction, showed that the approach via eigenvalue can be taken “with no loss of generality,”
Others (Aldous and Diaconis? Jerrum and Sinclair? I never get this reference right) discovered the same result working on the problem of bounding the time that it takes for a random walk on certain graphs to converge to the stationary distribution. To answer such questions, one needs to know the eigenvalue gap of the graphs, a quantity that is difficult to estimate directly in important cases. For those graphs, however, it is possible to reason about the size of cuts, and hence, via the difficult connection, obtain results on the eigenvalue gap (and, thus, on the convergence time, or “mixing” time of a random walk).
This “difficult direction” result has also an important algorithmic application. The counterpositive statement is that if largest and second largest eigenvalue are close in value, then the graph is not expanding, so one can cut off a set of vertices by deleting a comparatively small number of vertices. The proof is algorithmic: given the eigenvector of the second largest eigenvalue, this small cut can be found efficiently. This is the principle at work in “spectral partitioning” algorithms, often used in practice (and, occasionally, in theory).
Well, that was a long digression, but it’s also an interesting story. Back to our constructions of expander graphs, we want to describe a bounded-degree graph explicitly and prove that there is a noticeable gap between largest and second largest eigenvalue. If the graph is a Cayley graph derived from a group and set of generators
(and
is chosen properly), it turns out that the eigenvalues of the adjacency matrix of the graph can be explicitly computed by looking at the representations of the group
. If
is Abelian (if the group operation is commutative), then the representation theory of
is very simple and it is “just” the theory of Fourier transforms of functions
(where C are the complex numbers), which is much easier than it sounds.
Unfortunately, a Cayley graph based on an Abelian group can never be a good expander. To see why, you’ll have to believe my claim that in a good expander with vertices the diameter is at most
. Consider now an
-vertex Cayley graph of degree
. Every vertex
is reachable from vertex 0 via a path of length
. That is,
is the sum of
elements of the set
. Because of commutativity, we can rearrange the sum so that it looks as
where are the elements of
. We see that the sum is specified just by saying how many time each element of
must be added, and so
is specified by
numbers, each between
and
. This gives
which is a contradiction.
Indeed, we see that Cayley graphs based on Abelian groups fail very badly to have the required diameter condition: if the degree is , there will be vertices at distance about
from each other, a bound that is met by a
-dimensional grid.
Intuitively, then, we can get an expanding bounded-degree Cayley graph only if we start from a “very non-Abelian” group. But how do we quantify the “Abelianity” of a group, and what kind of groups are “very non-Abelian”? This will be the subject of the next post.
A few years ago, Nati Linial and Avi Wigderson taught a course on expander graphs. The course lecture notes have been edited into an article that will appear in the Notices Bulletin of the AMS.

Recent Comments