The Alon-Boppana Theorem

Let {G=(V,E)} be a {d}-regular graph, and let

\displaystyle  d= \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n

be the eigenvalues of the adjacency matrix of {A} counted with multiplicities and sorted in descending order.

How good can the spectral expansion of {G} be?

1. Simple Bounds

The simplest bound comes from a trace method. We have

\displaystyle  trace(A^2) = \sum_i \lambda_i^2

by using one definition of the trace and

\displaystyle  trace(A^2) = \sum_{v,v} (A^2)_{v,v} \geq dn

using the other definition and observing that {(A^2)_{v,v}} counts the paths that go from {v} to {v} in two steps, of which there are at least {d}: follow an edge to a neighbor of {v}, then follow the same edge back. (There could be more if {G} has multiple edges or self-loops.)

So we have

\displaystyle  dn \leq d^2 + \sum_{i=2,\ldots,n} \lambda_i ^2

and so

\displaystyle  \max_{i=2,\ldots,n} |\lambda_i | \geq \sqrt d \cdot \sqrt{\frac {n-d}{n-1}}

The condition {d \leq n(1-\epsilon)} is necessary to get lower bounds of {\Omega(\sqrt d)}; in the clique, for example, we have {d=n-1} and {\lambda_2 = \lambda_n = -1}.

A trace argument does not give us a lower bound on {\lambda_2}, and in fact it is possible to have {\lambda_2=0} and {d= n/2}, for example in the bipartite complete graph.

If the diameter of {G} is at least 4, it is easy to see that {\lambda_2 \geq \sqrt d}. Let {a,b} be two vertices at distance 4. Define a vector {x} as follows: {x_a = 1}, {x_v = 1/\sqrt d} if {v} is a neighbor of {a}, {x_b=-1} and {x_v = - 1/\sqrt d} if {v} is a neighbor of {b}. Note that there cannot be any edge between a neighbor of {a} and a neighbor of {b}. Then we see that {||x||^2 = 4}, that {x^T A x \geq 4\sqrt d} (because there are {2d} edges, each counted twice, that give a contribution of {1/\sqrt d} to {\sum_{u,v} A_{uv} x_u x_v}) and that {x} is orthogonal to {(1,\ldots,1)}.

2. Nilli’s Proof of the Alon-Boppana Theorem

Nilli’s proof of the Alon-Boppana theorem gives

\displaystyle  \lambda_2 \geq 2 \sqrt{ d-1 } - O \left( \frac {\sqrt{d-1}}{diam(G)-4} \right)

where {diam(G) \geq \frac {\log |V|}{\log d-1}} is the diameter of {G}. This means that if one has a family of (constant) degree-{d} graphs, and every graph in the family satisfies {\lambda_2 \leq \lambda}, then one must have {\lambda \geq 2 \sqrt{d -1}}. This is why families of Ramanujan graphs, in which {\lambda_2 \leq 2 \sqrt{d-1}}, are special, and so hard to construct, or even to prove existence of.

Friedman proves a stronger bound, in which the error term goes down with the square of the diameter. Friedman’s proof is the one presented in the Hoory-Linial-Wigderson survey. I like Nilli’s proof, even if it is a bit messier than Friedman’s, because it starts off with something that clearly is going to work, but the first two or three ways you try to establish the bound don’t work (believe me, I tried, because I didn’t see why some steps in the proof had to be that way), but eventually you find the right way to break up the estimate and it works.

So here is Nilli’s proof.

We are going to use essentially the same vector that we used to analyze the spectrum of the infinite tree, although the analysis will be a bit different.

Let {a,b} be two vertices in {G} at distance {diam(G)}, and call {k = \lfloor (diam(G) -4)/2 \rfloor}. Let {a'} be a neighbor of {a}. We say that the distance of a vertex {v} from {\{ a,a'\}} is the smallest of the shortest path distance from {v} to {a} and from {v} to {a'}.

We construct a vector {x\in {\mathbb R}^V} as follows:

  • {x_v = \displaystyle \frac 1 { ( \sqrt{d-1})^{t }}} if {v \mbox{ is at distance } t \mbox{ from } \{ a,a'\}, \ t \leq k}
  • {x_v = 0} if {v} is at distance {> k} from {\{ a,a'\}}

Note that this is more or less the same vector we constructed in the case of the tree. The reason for talking about the distance from two vertices instead of one, is that we want to say that every vertex is adjacent to at most {d-1} vertices whose value of {x_v} is strictly smaller; in the case of a tree, the root is exceptional because it has {d} neighbors whose value of {x_v} is strictly smaller. There will be a step in the proof in which this choice really makes a difference.

We claim

\displaystyle   x^TA x \geq 2 \sqrt{d-1} \cdot ||x ||^2 \cdot \left( 1 - \frac 1 {k-1} \right) \ \ \ \ \ (1)

It turns out that, in order to prove (1) it is easier to reason about the Laplacian matrix than about the adjacency matrix. Define {L:= dI - A} to be the (non-normalized) Laplacian of {G}. We have the following nice expression for the quadratic form of {L}

\displaystyle  x^T L x = dx^Tx - x^TAx = \sum_{ \{ u,v \} \in E} (x_u - x_v)^2

For every vertex {u} let us call {S_u} (for smaller) the set of neighbors {v} of {u} such that {x_v < x_u}. We always have {|S_u| \leq d-1}. Let us call {L_t} the set of vertices at distance exactly {t} from {\{a,a'\}}.

Now we do the calculations:

\displaystyle  \sum_{ \{ u,v \} \in E} (x_u - x_v)^2 = \sum_{t\leq k} \sum_{u\in L_t} \sum _{ v\in S_u} (x_u - x_v)^2

\displaystyle  = \sum_{t\leq k-1} \sum_{u,\in L_t} \sum_{ v\in S_u} (x_u - x_v)^2 + \sum_{u\in L_k} \sum_{v\in S_u} x_u^2

\displaystyle  = \sum_{t\leq k-1} \sum_{u,\in L_t} |S_u| \cdot \left( x_u - \frac {x_u}{\sqrt {d-1}} \right)^2 + \sum_{u\in L_k} |S_u| x_u^2

\displaystyle  \leq \sum_{t\leq k-1} \sum_{u,\in L_t} (d-1) \cdot \left( x_u - \frac {x_u}{\sqrt {d-1}} \right)^2 + \sum_{u\in L_k} (d-1) x_u^2

\displaystyle  = \sum_{t\leq k-1} \sum_{u,\in L_t} x_u^2 \cdot \left( \sqrt{d-1} -1 \right)^2 + \sum_{u\in L_k} (d-1) x_u^2

\displaystyle  = \sum_{t\leq k-1} \sum_{u,\in L_t} x_u^2 \cdot (d - 2 \sqrt {d-1} ) + \sum_{u\in L_k} (d-1) x_u^2

\displaystyle  = \sum_u d x_u^2 - \sum_u 2\sqrt{d-1} x_u^2 + \sum_{u\in L_k} (2 \sqrt {d-1} - 1) x_u^2

Finally,

\displaystyle  \sum_{u\in L_k} x_u^2 = |L_k | \frac 1 {(d-1)^k} \leq \frac 1{k+1} \sum_{t=0}^k |L_t| \frac{1} {(d-1)^t} = \frac 1 {k+1} \sum_v x_v^2

because {|L_t| (d-1)^{-t}} decreases (or at least does not increase) for increasing {t}.

Putting everything together we have

\displaystyle  x^T L x \leq d ||x||^2 - 2 \sqrt{d-1} ||x||^2 + \frac {2 \sqrt{d-1} -1}{k+1}||x||^2

and so

\displaystyle  x^T Ax \geq \left( 2 \sqrt{d-1} - \frac {2 \sqrt{d-1} -1}{k+1} \right) ||x||^2

Now we are finally almost done: define a vector {y} with the same construction we did for {x}, but using the set {\{b,b'\}} as the reference for the distance, where {b'} is a neighbor of {b}. We then have

\displaystyle  y^T Ay \geq \left( 2 \sqrt{d-1} - \frac {2 \sqrt{d-1} -1}{k+1} \right) ||y||^2

It is clearly not possible for a vertex at distance {\leq k} from {\{a,a'\}} to also be at distance {\leq k} from {\{b,b'\}}, otherwise we would have a path of length {\leq 2k+2} from {a} to {b}, so the vectors {x} and {y} are non-zero on disjoint subsets of coordinate, and hence are orthogonal.

But we can say more: we also have {x^T A y = 0}, because there cannot be an edge {\{ u,v\}\in E} such that both {x_u >0} and {y_v >0}, because otherwise we would have a path of length {\leq 2k + 3} from {a} to {b}.

This means that if we take any linear combination {z = \alpha x + \beta y} of {x} and {y} we have

\displaystyle  z^T A z = \alpha^2 x^T Ax + \beta^2 y^TAy

\displaystyle  \geq \left( 2 \sqrt{d-1} - \frac {2 \sqrt{d-1} -1}{k+1} \right) \cdot ( \alpha^2 ||x||^2 + \beta^2 ||y||^2 )

\displaystyle  = \left( 2 \sqrt{d-1} - \frac {2 \sqrt{d-1} -1}{k+1} \right) ||z||^2

so we have found a two-dimensional set of vectors whose Rayleigh quotient is at most the above expression, and so

\displaystyle  \lambda_2 \geq 2 \sqrt{d-1} - \frac {2 \sqrt{d-1} -1}{k+1}

What did just happen? The basic intuition is that, as in the infinite tree, we set weights to go down by {1/\sqrt{d-1}} every time we get away from the “root,” and we would like to argue that, for every node {x_u}, we have

\displaystyle  \sum_{v} A_{u,v} x_v \geq 2\sqrt{d-1} x_u

by reasoning that one of the neighbors must be closer to the root, and hence have value {\sqrt{d-1} x_u}, while the other {d-1} neighbors are all at least {x_u/\sqrt{d-1}}. This bound fails at the “leaves” of the construction, which is fine because they account for a small portion of {||x||^2}, but it also fails at the root, which is not adjacent to any larger vertex. In the case of the infinite tree this is still ok, because the root also accounts for only a small portion of {||x||^2}; in general graphs, however, the “root” vertex might account for a very large fraction of {||x||^2}.

Indeed, the root contributes 1 to {||x||^2}, and each set {L_t} contributes {|L_t| \cdot (d-1)^t}. If the size of {L_t} grows much more slowly than {(d-1)^t}, then the contribution of the root to {||x||^2} is too large and we have a problem. In this case, however, for many levels {t}, there have to be many vertices in {L_t} that have fewer than {d-1} edges going forward to {L_{t+1}}, and in that case, for many vertices {u \in L_t} we will have that {\sum_v A_{u,v} x_v} will be much more than {2 \sqrt{d-1} x_u^2}.

Although it seems hopeless to balance this argument and charge the weight {x_u x_v} of the edge {(u,v)} in just the right way to {x_u^2} and to {x_v^2}, the calculation with the Laplacian manages to do that automatically.

3. A More Sophisticated “Trace” Method

The Hoory-Linial-Wigderson survey also gives a very clean and conceptual proof that

\displaystyle  \sigma_2 = \max \{ |\lambda_2| , | \lambda_n| \} \geq 2 \sqrt {d-1} \cdot \left( 1 - O \left( \frac {\log diam(G)}{diam(G)} \right) \right)

via a trace argument. I am not sure who was the first to come up with this idea.

Let us pick two vertices {a,b} at distance {diam(G)}, and call {x := {\bf 1}_{\{ a\}}}, {y:= {\bf 1 }_{\{ b\}}} and {z:= x-y}. Fix {k = (diam(G)-2)/2}. Then {A^k x} and {A^ky} are orthogonal vectors, because the coordinates on which {A^kx} is nonzero correspond to vertices at distance {\leq k} from {a} and the coordinates on which {A^ky} is nonzero correspond to vertices at distance {\leq k} from {b}, and these conditions cannot be simultaneously satisfied. This means that {x^T A^{2k} y = 0}.

Since {z} is orthogonal to {(1,\ldots,1)}, we have

\displaystyle  z^T A^{2k} z \leq \sigma_2^{2k} ||z||^2 = 2 \sigma_2^{2k}

but we also have

\displaystyle  z^T A^{2k} z = x^T A^{2k} x + y^T A^{2k} y - 2 x^T A^{2k} y = x^T A^{2k} x + y^T A^{2k} y

Now, {x^T A^{2k} x} is the number of walks of length {2k} in {G} that start at {a} and get back to {a}. In every {d}-regular graph, and for every start vertex, the number of such walks is at least the corresponding number in the infinite {d}-ary tree. This is clear if {G} is a Cayley graph; it takes a few minutes (at least I had to think about it for a bit) to see that it holds in every graph.

Ok, then, what is the number of closed walks of length {k} in the infinite {d}-ary tree? This is a long story, but it is a very well-studied question and there is a bound (not the tightest know, but sufficient for our purposes) giving

\displaystyle \sigma_2^k \geq \min \{ x^T A^{2k} x , y^T A^{2k} y \}

\displaystyle  \geq t_{2k} \geq {2k \choose k} \frac 1{k+1} (d-1)^k

\displaystyle  = \Omega( 2^{2k} \cdot k^{-1.5} \cdot (d-1)^k )

so,

\displaystyle  \sigma_2 \geq 2 \sqrt{d-1} \cdot (\Omega(k^{-1.5}))^{1/2k} \geq 2 \sqrt{d-1} \cdot \left( 1 - \frac { O(\log k) }{k} \right)

I put “trace” in quote in the title of this section, but one can turn the argument into an honest-to-God application of the trace method, although there is no gain in doing so. Pick an even number {k}. We can say that

\displaystyle  trace(A^k) = \sum_i \lambda_i^k \leq d^k + (n-1) \sigma_2^k

and

\displaystyle  trace(A^k) \geq n t_k \geq \Omega( 2^k \cdot k^{-1.5} \cdot \sqrt{d-1} )

which gives

\displaystyle  \sigma_2 \leq 2 \sqrt{d-1} \cdot \left( 1 - \frac { (O\log k) }{k} \right)

for {k << \frac{ \log n}{\log d}}, which is implied by the bound based on diameter that we proved above.

4 thoughts on “The Alon-Boppana Theorem

  1. Nice writeup. By the way, A. Nilli is a pseudonym for Noga Alon. (One of Noga’s daughters is named Nilli.)
    — Ravi Boppana

  2. In the simple bound section in last para, it seems that x^T A x can be >= 4 (instead of =4) since there is also contribution from edges between two neighbors or a, or betweem two neighbors of b?

  3. Yes, it’s at least 4 \sqrt d, I corrected it.

    Ravi, I went on for 1385 words without revealing Noga’s secret identity and you give it away like this!

  4. Pingback: The expansion of the Payley graph | in theory

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s