Let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ be the eigenvalues of the adjacency matrix of $G$, counted with multiplicities and sorted in descending order.

How good can the spectral expansion of $G$ be?

**1. Simple Bounds**

The simplest bound comes from a trace method. We have

$$\mathrm{tr}(A^2) = \sum_i \lambda_i^2$$

by using one definition of the trace and

$$\mathrm{tr}(A^2) = \sum_v (A^2)_{v,v} \geq nd$$

using the other definition and observing that $(A^2)_{v,v}$ counts the paths that go from $v$ to $v$ in two steps, of which there are at least $d$: follow an edge to a neighbor of $v$, then follow the same edge back. (There could be more if $G$ has multiple edges or self-loops.)

So we have

$$d^2 + \sum_{i \geq 2} \lambda_i^2 \geq nd$$

and so

$$\max_{i \geq 2} |\lambda_i| \geq \sqrt{ \frac{nd - d^2}{n-1} } = \sqrt d \cdot \sqrt{ \frac{n-d}{n-1} }$$

The condition $n \gg d$ is necessary to get lower bounds of order $\sqrt d$; in the clique, for example, we have $d = n-1$ and $\lambda_2 = \cdots = \lambda_n = -1$.

A trace argument does not give us a lower bound on $\lambda_2$, and in fact it is possible to have $\lambda_2 = 0$ and $\lambda_n = -d$, for example in the complete bipartite graph $K_{d,d}$.

If the diameter of $G$ is at least 4, it is easy to see that $\lambda_2 \geq \sqrt d$. Let $a, b$ be two vertices at distance 4. Define a vector $x$ as follows: $x_a = 1$, $x_v = \frac 1{\sqrt d}$ if $v$ is a neighbor of $a$, $x_b = -1$, and $x_v = - \frac 1{\sqrt d}$ if $v$ is a neighbor of $b$; all other coordinates are zero. Note that there cannot be any edge between a neighbor of $a$ and a neighbor of $b$. Then we see that $\|x\|^2 = 4$, that $x^T A x \geq 4 \sqrt d$ (because there are $2d$ edges, each counted twice, that give a contribution of $\frac 1 {\sqrt d}$ each to $x^T A x$) and that $x$ is orthogonal to $\mathbf{1}$, so the Rayleigh quotient of $x$ is at least $\sqrt d$ and $\lambda_2 \geq \sqrt d$.
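The construction above is completely explicit, so it can be checked numerically. Here is a small sketch on the cycle $C_{12}$ (the choice of graph and of the two far-apart vertices is mine, just for illustration):

```python
import numpy as np

# A sketch of the diameter-4 argument on the cycle C_12 (2-regular, diameter 6).
n, d = 12, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

a, b = 0, 6                      # two vertices at distance >= 4
x = np.zeros(n)
x[a], x[b] = 1.0, -1.0
for v in range(n):
    if A[a, v]:
        x[v] = 1 / np.sqrt(d)    # neighbors of a
    if A[b, v]:
        x[v] = -1 / np.sqrt(d)   # neighbors of b

assert abs(x.sum()) < 1e-9       # x is orthogonal to the all-ones vector
rayleigh = (x @ A @ x) / (x @ x)
lam2 = np.sort(np.linalg.eigvalsh(A))[-2]
assert rayleigh >= np.sqrt(d) - 1e-9   # Rayleigh quotient at least sqrt(d)
assert lam2 >= rayleigh - 1e-9         # hence lambda_2 >= sqrt(d)
```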

**2. Nilli’s Proof of the Alon-Boppana Theorem**

Nilli’s proof of the Alon-Boppana theorem gives

$$\lambda_2 \geq 2 \sqrt{d-1} - O\left( \frac dD \right)$$

where $D$ is the diameter of $G$. This means that if one has a family of (constant) degree-$d$ graphs of increasing size, and every graph in the family satisfies $\lambda_2 \leq \lambda$, then one must have $\lambda \geq 2\sqrt{d-1}$. This is why families of *Ramanujan* graphs, in which $\lambda_2 \leq 2\sqrt{d-1}$, are special, and so hard to construct, or even to prove the existence of.

Friedman proves a stronger bound, in which the error term goes down with the square of the diameter. Friedman’s proof is the one presented in the Hoory-Linial-Wigderson survey. I like Nilli’s proof, even if it is a bit messier than Friedman’s, because it starts off with something that *clearly* is going to work, but the first two or three ways you try to establish the bound don’t work (believe me, I tried, because I didn’t see why some steps in the proof had to be that way), but eventually you find the right way to break up the estimate and it works.

So here is Nilli’s proof.

We are going to use essentially the same vector that we used to analyze the spectrum of the infinite tree, although the analysis will be a bit different.

Let $a, b$ be two vertices in $G$ at distance $D$, and call $k := \lfloor D/2 \rfloor - 2$, so that $a$ and $b$ are at distance at least $2k+4$. Let $a'$ be a neighbor of $a$. We say that the distance of a vertex $v$ from the pair $\{a, a'\}$ is the smaller of the shortest path distance from $v$ to $a$ and the shortest path distance from $v$ to $a'$.

We construct a vector $x$ as follows:

- $x_v = (d-1)^{-i/2}$ if $v$ is at distance $i \leq k$ from $\{a, a'\}$
- $x_v = 0$ if $v$ is at distance more than $k$ from $\{a, a'\}$
Note that this is more or less the same vector we constructed in the case of the tree. The reason for talking about the distance from two vertices instead of one is that we want to say that every vertex is adjacent to at most $d-1$ vertices whose value of $x_v$ is strictly smaller; in the case of a tree, the root is exceptional, because it has $d$ neighbors whose value of $x_v$ is strictly smaller. There will be a step in the proof in which this choice really makes a difference.

We claim

$$\frac{x^T A x}{x^T x} \geq 2\sqrt{d-1} - O\left( \frac dk \right) \ \ \ \ (1)$$

It turns out that, in order to prove (1), it is easier to reason about the Laplacian matrix than about the adjacency matrix. Define $L := dI - A$ to be the (non-normalized) Laplacian of $G$. We have the following nice expression for the quadratic form of $L$:

$$x^T L x = \sum_{(u,v) \in E} (x_u - x_v)^2$$

For every vertex $v$, let us call $S_v$ (for **s**maller) the set of neighbors $u$ of $v$ such that $x_u < x_v$. We always have $|S_v| \leq d-1$. Let us call $V_i$ the set of vertices at distance exactly $i$ from $\{a, a'\}$.

Now we do the calculations:

$$x^T L x = \sum_v \sum_{u \in S_v} (x_v - x_u)^2 \leq \sum_{i=0}^{k-1} |V_i| \cdot (d-1)^{-i} \cdot (d-1) \cdot \left( 1 - \frac 1{\sqrt{d-1}} \right)^2 + |V_k| \cdot (d-1)^{-k} \cdot (d-1)$$

$$= ( d - 2\sqrt{d-1} ) \cdot \sum_{i=0}^{k-1} |V_i| \cdot (d-1)^{-i} + (d-1) \cdot |V_k| \cdot (d-1)^{-k}$$

Finally,

$$|V_k| \cdot (d-1)^{-k} \leq \frac 1{k+1} \cdot x^T x$$

because $|V_i| \cdot (d-1)^{-i}$ decreases (or at least does not increase) for increasing $i$, and $x^T x = \sum_{i=0}^k |V_i| \cdot (d-1)^{-i}$.

Putting everything together we have

$$x^T L x \leq (d - 2\sqrt{d-1}) \cdot x^T x + \frac{d-1}{k+1} \cdot x^T x$$

and so

$$\frac{x^T L x}{x^T x} \leq d - 2\sqrt{d-1} + O\left( \frac dk \right)$$

Now we are finally almost done: define a vector $y$ with the same construction we used for $x$, but using the pair $\{b, b'\}$ as the reference for the distance, where $b'$ is a neighbor of $b$. We then have

$$\frac{y^T L y}{y^T y} \leq d - 2\sqrt{d-1} + O\left( \frac dk \right)$$

It is clearly not possible for a vertex to be at distance at most $k$ from $\{a,a'\}$ and also at distance at most $k$ from $\{b,b'\}$, otherwise we would have a path of length at most $2k+2$ from $a$ to $b$, so the vectors $x$ and $y$ are non-zero on disjoint subsets of coordinates, and hence are orthogonal.

But we can say more: we also have $x^T A y = 0$, because there cannot be an edge $(u,v)$ such that both $x_u \neq 0$ and $y_v \neq 0$, because otherwise we would have a path of length at most $2k+3$ from $a$ to $b$.

This means that if we take any linear combination $z = \alpha x + \beta y$ of $x$ and $y$ we have

$$\frac{z^T L z}{z^T z} = \frac{\alpha^2 \cdot x^T L x + \beta^2 \cdot y^T L y}{\alpha^2 \cdot x^T x + \beta^2 \cdot y^T y} \leq d - 2\sqrt{d-1} + O\left( \frac dk \right)$$

so we have found a two-dimensional space of vectors whose Rayleigh quotient with respect to $L$ is at most the above expression, and so

$$\lambda_2 \geq 2\sqrt{d-1} - O\left( \frac dk \right)$$

What just happened? The basic intuition is that, as in the infinite tree, we set weights to go down by a factor of $\frac 1{\sqrt{d-1}}$ every time we get away from the “root,” and we would like to argue that, for every node $v$, we have

$$\sum_{u : (u,v) \in E} x_u \geq (2\sqrt{d-1} - o(1)) \cdot x_v$$

by reasoning that one of the neighbors must be closer to the root, and hence have value $\sqrt{d-1} \cdot x_v$, while the other $d-1$ neighbors are all at least $\frac{x_v}{\sqrt{d-1}}$. This bound fails at the “leaves” of the construction, which is fine because they account for a small portion of $\|x\|^2$, but it also fails at the root, which is not adjacent to any vertex of larger value. In the case of the infinite tree this is still ok, because the root also accounts for only a small portion of $\|x\|^2$; in general graphs, however, the “root” vertex might account for a very large fraction of $\|x\|^2$.

Indeed, the root contributes $1$ to $\|x\|^2$, and each level $V_i$ contributes $|V_i| \cdot (d-1)^{-i}$. If the size of $V_i$ grows much more slowly than $(d-1)^i$, then the contribution of the root to $\|x\|^2$ is too large and we have a problem. In this case, however, for many levels $i$, there have to be many vertices in $V_i$ that have fewer than $d-1$ edges going forward to $V_{i+1}$, and in that case, for many vertices $v$, the sum $\sum_{u: (u,v) \in E} x_u$ will be much more than $2\sqrt{d-1} \cdot x_v$.

Although it seems hopeless to balance this argument and charge the weight of each edge $(u,v)$ in just the right way to $x_u$ and to $x_v$, the calculation with the Laplacian manages to do that automatically.

**3. A More Sophisticated “Trace” Method**

The Hoory-Linial-Wigderson survey also gives a very clean and conceptual proof that

$$\max\{ |\lambda_2|, |\lambda_n| \} \geq 2\sqrt{d-1} \cdot \left( 1 - O\left( \frac{\log D}{D} \right) \right)$$

via a trace argument, where $D$ is the diameter. I am not sure who was the first to come up with this idea.

Let us pick two vertices $a, b$ at distance greater than $2k$, call $e_a$ and $e_b$ the corresponding standard basis vectors, and define $\lambda := \max\{ |\lambda_2|, |\lambda_n| \}$. Fix $k$. Then $A^k e_a$ and $A^k e_b$ are orthogonal vectors, because the coordinates on which $A^k e_a$ is nonzero correspond to vertices at distance at most $k$ from $a$ and the coordinates on which $A^k e_b$ is nonzero correspond to vertices at distance at most $k$ from $b$, and these conditions cannot be simultaneously satisfied. This means that $e_a^T A^{2k} e_b = 0$.

Since $e_a - e_b$ is orthogonal to $\mathbf{1}$, we have

$$\| A^k (e_a - e_b) \|^2 \leq \lambda^{2k} \cdot \| e_a - e_b \|^2 = 2 \lambda^{2k}$$

but we also have

$$\| A^k (e_a - e_b) \|^2 = (A^{2k})_{a,a} + (A^{2k})_{b,b} - 2 (A^{2k})_{a,b} = (A^{2k})_{a,a} + (A^{2k})_{b,b}$$

Now, $(A^{2k})_{a,a}$ is the number of walks of length $2k$ in $G$ that start at $a$ and get back to $a$. In every $d$-regular graph, and for every start vertex, the number of such walks is at least the corresponding number in the infinite $d$-regular tree. This is clear if $G$ is a Cayley graph; it takes a few minutes (at least I had to think about it for a bit) to see that it holds in every graph.

Ok, then, what is the number of closed walks of length $2k$ in the infinite $d$-regular tree? This is a long story, but it is a very well-studied question and there is a bound (not the tightest known, but sufficient for our purposes) giving

$$(\mathrm{number\ of\ closed\ walks\ of\ length\ } 2k) \geq \binom{2k}{k} \cdot \frac 1{k+1} \cdot (d-1)^k$$

so,

$$\lambda^{2k} \geq \binom{2k}{k} \cdot \frac{(d-1)^k}{k+1} \geq \frac{ \left( 2\sqrt{d-1} \right)^{2k} }{(k+1)(2k+1)}$$

and taking $2k$-th roots, with $k \approx D/2$, gives the bound stated at the beginning of the section.
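The walk-counting inequality can be checked numerically. The sketch below compares, on the 3-dimensional hypercube (an arbitrary choice of 3-regular graph), the number of closed walks from a vertex with the Catalan-type count $\binom{2k}{k} \frac{1}{k+1} (d-1)^k$ for the tree:

```python
import numpy as np
from math import comb

# A sketch: in a d-regular graph, closed walks of length 2k from any vertex
# are at least as many as in the infinite d-regular tree, where the count
# is at least Catalan(k) * (d-1)^k.  Checked on the 3-cube (3-regular).
d, n = 3, 8
A = np.zeros((n, n), dtype=np.int64)
for u in range(n):
    for bit in range(3):
        A[u, u ^ (1 << bit)] = 1     # hypercube edges: flip one bit

for k in range(1, 7):
    closed_walks = np.linalg.matrix_power(A, 2 * k)[0, 0]
    tree_walks = comb(2 * k, k) // (k + 1) * (d - 1) ** k   # Catalan(k)*(d-1)^k
    assert closed_walks >= tree_walks
```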

I put “trace” in quotes in the title of this section, but one can turn the argument into an honest-to-God application of the trace method, although there is no gain in doing so. Pick an even number $2k$. We can say that

$$\mathrm{tr}(A^{2k}) = \sum_i \lambda_i^{2k} \leq d^{2k} + n \cdot \lambda^{2k}$$

and

$$\mathrm{tr}(A^{2k}) \geq n \cdot \binom{2k}{k} \cdot \frac{(d-1)^k}{k+1}$$

which gives

$$\lambda \geq 2\sqrt{d-1} \cdot \left( 1 - O\left( \frac{\log \log n}{\log n} \right) \right)$$

for an appropriate choice $k = \Theta(\log n)$, a bound which is implied by the bound based on diameter that we proved above, because the diameter of a $d$-regular graph is at least $\log_{d-1} n - O(1)$.

A reader asked what the *expander mixing lemma* is like in graphs that are not regular.

I don’t know if I will have time to return to this tomorrow, so here is a quick answer.

First, for context, the expander mixing lemma in regular graphs. Say that $G$ is a $d$-regular undirected graph, and $A$ is its adjacency matrix. Then let the eigenvalues of the normalized matrix $\frac 1d A$ be

$$1 = \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq -1$$

We are interested in graphs for which all the eigenvalues are small in absolute value, except $\lambda_1$, that is, if we define

$$\lambda := \max_{i = 2, \ldots, n} |\lambda_i|$$

we are interested in graphs for which $\lambda$ is small. The expander mixing lemma is the fact that for every two disjoint sets $S$ and $T$ of vertices we have

$$\left| E(S,T) - \frac dn \cdot |S| \cdot |T| \right| \leq \lambda d \sqrt{|S| \cdot |T|} \ \ \ \ (1)$$

The inequality (1) says that, if $\lambda$ is small, then the number of edges between every two large sets of vertices is almost determined just by the size of the sets, and it is equal to the expected number of edges between the two sets in a random $d$-regular graph, up to an error term that depends on $\lambda$.

For the proof, we observe that, if we call $J$ the matrix that has ones everywhere, then

$$\left\| \frac 1d A - \frac 1n J \right\| = \lambda$$

and then we substitute $x := \mathbf{1}_S$ and $y := \mathbf{1}_T$ in the inequality $\left| x^T \left( \frac 1d A - \frac 1n J \right) y \right| \leq \lambda \cdot \|x\| \cdot \|y\|$ and do the calculations.
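As a sanity check, here is a numerical sketch of both the matrix identity and the mixing inequality, on the Petersen graph (the graph and the random sets are arbitrary choices for the demo):

```python
import numpy as np

# A sketch of the regular-case mixing lemma on the Petersen graph (3-regular).
rng = np.random.default_rng(1)
n, d = 10, 3
edges = [(i, (i + 1) % 5) for i in range(5)] \
      + [(5 + i, 5 + (i + 2) % 5) for i in range(5)] \
      + [(i, i + 5) for i in range(5)]
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1

norm_eigs = np.linalg.eigvalsh(A / d)              # ascending order
lam = max(abs(norm_eigs[0]), abs(norm_eigs[-2]))   # max |lambda_i|, i >= 2

# The key step of the proof: || (1/d) A - (1/n) J || = lambda.
J = np.ones((n, n))
assert abs(np.linalg.norm(A / d - J / n, 2) - lam) < 1e-9

# The mixing lemma itself, for random disjoint sets S and T.
for _ in range(200):
    mask = rng.integers(0, 3, size=n)
    S, T = mask == 0, mask == 1
    e_st = A[np.ix_(S, T)].sum()
    assert abs(e_st - d * S.sum() * T.sum() / n) \
        <= lam * d * np.sqrt(S.sum() * T.sum()) + 1e-9
```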

In the case of an irregular undirected graph $G$, we are going to consider the normalized adjacency matrix $M := D^{-1/2} A D^{-1/2}$, where $A$ is the adjacency matrix of $G$ and $D$ is the diagonal matrix such that $D_{v,v} = d_v$, where $d_v$ is the degree of $v$. As in the regular case, the eigenvalues of the normalized adjacency matrix satisfy

$$1 = \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq -1$$

Let us define

$$\lambda := \max_{i = 2, \ldots, n} |\lambda_i|$$

the second largest eigenvalue in absolute value of $M$.

We will need two more definitions: for a set of vertices $S$, its volume is defined as

$$\mathrm{vol}(S) := \sum_{v \in S} d_v$$

the sum of the degrees, and we write $\bar d := \frac{\mathrm{vol}(V)}{n}$ for the average degree, so that $\mathrm{vol}(V) = \bar d \cdot n$. Now we have

**Lemma 1 (Expander Mixing Lemma)** For every two disjoint sets of vertices $S$, $T$, we have

$$\left| E(S,T) - \frac{\mathrm{vol}(S) \cdot \mathrm{vol}(T)}{\mathrm{vol}(V)} \right| \leq \lambda \cdot \sqrt{\mathrm{vol}(S) \cdot \mathrm{vol}(T)} \ \ \ \ (2)$$

So, once again, we have that the number of edges between $S$ and $T$ is what one would expect in a random graph in which the edge $(u,v)$ exists with probability $\frac{d_u d_v}{\mathrm{vol}(V)}$, up to an error that depends on $\lambda$.

To prove the lemma, we prove the following claim:

$$\left\| M - \frac 1{\mathrm{vol}(V)} \cdot D^{1/2} J D^{1/2} \right\| = \lambda \ \ \ \ (3)$$

where $J$ is the matrix such that $J_{u,v} = 1$ for every $u,v$, and then the lemma will follow by just substituting $x := D^{1/2} \mathbf{1}_S$ and $y := D^{1/2} \mathbf{1}_T$ in the above expression.

The proof would be a bit cleaner if we had defined the normalized adjacency matrix to be an operator over a vector space with a different type of inner product from the standard Euclidean one (so that $\sqrt{\mathrm{vol}(S)}$ would become the norm of $\mathbf{1}_S$), but this requires some set-up and explanations, so we will just carry on with the above definitions.

Write the eigenvalue decomposition of the normalized adjacency matrix $M$: it is going to be

$$M = \sum_i \lambda_i v_i v_i^T$$

where the $v_i$ are an orthonormal basis of eigenvectors.

To calculate the eigenvector of $\lambda_1$ we see that

$$\max_x \frac{x^T M x}{x^T x} = \max_z \frac{z^T A z}{z^T D z} = \max_z \frac{2 \sum_{(u,v) \in E} z_u z_v}{\sum_v d_v z_v^2}$$

where we use the change of variable $x = D^{1/2} z$, and the maximum is attained for $z = \mathbf{1}$, so that $v_1$ is the unit vector parallel to $D^{1/2} \mathbf{1}$, that is, $v_1 = \frac 1{\sqrt{\mathrm{vol}(V)}} \cdot D^{1/2} \mathbf{1}$ and $\lambda_1 = 1$.

Now, to compute $\lambda$ we just have to compute the spectral norm of $M$ restricted to the space orthogonal to $v_1$, so

$$\lambda = \left\| M - v_1 v_1^T \right\|$$

(recall that $\lambda_1 = 1$). It remains to observe that

$$v_1 v_1^T = \frac 1{\mathrm{vol}(V)} \cdot D^{1/2} \mathbf{1} \mathbf{1}^T D^{1/2} = \frac 1{\mathrm{vol}(V)} \cdot D^{1/2} J D^{1/2}$$

which proves Equation (3) and thus the lemma.
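Both the claim and the lemma are easy to check numerically on a small irregular graph; here is a sketch (the graph, a path with a few chords, is an arbitrary choice):

```python
import numpy as np

# A sketch verifying the irregular mixing lemma on a small arbitrary graph.
rng = np.random.default_rng(0)
n = 12
A = np.zeros((n, n))
for i in range(n - 1):                              # a path...
    A[i, i + 1] = A[i + 1, i] = 1
for (u, v) in [(0, 5), (2, 9), (4, 11), (1, 7)]:    # ...plus chords
    A[u, v] = A[v, u] = 1

deg = A.sum(axis=1)
D_half = np.diag(deg ** 0.5)
M = np.diag(deg ** -0.5) @ A @ np.diag(deg ** -0.5)
vol = deg.sum()
lam = np.sort(np.abs(np.linalg.eigvalsh(M)))[-2]    # second largest |eigenvalue|

# Claim (3): || M - (1/vol) D^{1/2} J D^{1/2} || = lambda.
J = np.ones((n, n))
assert abs(np.linalg.norm(M - D_half @ J @ D_half / vol, 2) - lam) < 1e-9

# Lemma: |E(S,T) - vol(S)vol(T)/vol(V)| <= lambda * sqrt(vol(S)vol(T)).
for _ in range(100):
    mask = rng.integers(0, 3, size=n)
    S, T = mask == 0, mask == 1                     # random disjoint sets
    e_st = A[np.ix_(S, T)].sum()
    volS, volT = deg[S].sum(), deg[T].sum()
    assert abs(e_st - volS * volT / vol) <= lam * np.sqrt(volS * volT) + 1e-9
```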


There are no references and, most likely, plenty of errors. If you use the notes and find mistakes, please let me know by either emailing * luca at berkeley* or leaving a comment here.


When talking about the expansion of random graphs, about the construction of Ramanujan expanders, as well as about sparsifiers, community detection, and several other problems, the number $2\sqrt{d-1}$ comes up often, where $d$ is the degree of the graph, for reasons that tend to be related to properties of the infinite $d$-regular tree.

If $G$ is a $d$-regular graph, $A$ is its adjacency matrix and

$$d = \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$$

are the eigenvalues of $A$ in non-increasing order, then a measure of the expansion of $G$ is the parameter

$$\sigma_2(G) := \max \{ |\lambda_2|, |\lambda_n| \} = \left\| A - \frac dn J \right\|$$

which is the second largest singular value of $A$. One way to think about the above parameter is that the “best possible $d$-regular expander,” if we allow weights, is the graph whose adjacency matrix is $\frac dn J$, where $J$ is the matrix with ones in every entry. The parameter $\sigma_2(G)$ measures the distance between $A$ and $\frac dn J$ according to the spectral norm. (The spectral norm is a good one to consider when talking about graphs, because it bounds the cut norm, it is efficiently computable, and so on.)

If $G$ is $d$-regular and bipartite, then $\lambda_n = -d$, and an appropriate measure of expansion is $\max \{ |\lambda_2|, |\lambda_{n-1}| \}$, which is just $\lambda_2$.

Nilli proved that, in a $d$-regular graph, $\lambda_2 \geq 2\sqrt{d-1} - O\left( \frac dD \right)$, where $D$ is the diameter of $G$. Nilli’s construction is a variant of the way you prove that the spectral norm of the infinite tree is at least $2\sqrt{d-1}$. Lubotzky, Phillips and Sarnak call a $d$-regular graph *Ramanujan* if $\sigma_2(G) \leq 2\sqrt{d-1}$ (or if $\lambda_2 \leq 2\sqrt{d-1}$ in the case of bipartite graphs). So Ramanujan graphs are the best possible expanders from the point of view of the spectral definition.

Lubotzky, Phillips and Sarnak give an efficient construction of an infinite family of $(p+1)$-regular Ramanujan graphs when $p$ is a prime and $p \equiv 1 \pmod 4$, and this has been generalized by Morgenstern to all degrees $d$ such that $d-1$ is a prime power.

Marcus, Spielman and Srivastava show the existence of infinitely many Ramanujan *bipartite* expanders for every degree. (For degree $d$, their construction gives graphs whose number of nodes is of the form $2d \cdot 2^k$.) Their proof uses the fact that the spectral norm of the infinite $d$-regular tree is at most $2\sqrt{d-1}$.

Friedman, in an outstanding tour de force, has given a 128-page proof of a conjecture of Alon, that, for every $\epsilon > 0$, a random $d$-regular graph will satisfy, with high probability, $\sigma_2(G) \leq 2\sqrt{d-1} + \epsilon$. His paper is long, in part, because everything is explained very well. I am taking the analysis of the spectral norm of the infinite tree presented in this post from his paper.

(Notice that it is still an open question to construct, or even to prove the existence of, an infinite family of non-bipartite Ramanujan graphs of degree, say, $7$, or to show, for any degree at all (except $2$!), that Ramanujan graphs of that degree exist for all numbers of vertices in a dense set of integers.)

Now let’s talk about the spectrum of the infinite tree. First of all, although finite trees are terrible expanders, it is intuitive that an infinite tree is an *excellent* expander. For an infinite graph with a countable number of vertices and finite degree we can define the (non-normalized) expansion as

$$h(G) := \inf_{S \subseteq V,\ 0 < |S| < \infty} \frac{E(S, V - S)}{|S|}$$

In a $d$-regular infinite tree, the expansion is $d-2$, because, if we take any finite set $S$ of $n$ vertices, there are at most $n-1$ edges in the subgraph induced by $S$ (because it’s a forest with $n$ vertices) and so there are at least $dn - 2(n-1)$ edges leaving $S$. It is easy to see that every other infinite $d$-regular graph has expansion at most $d-2$ because, for every $n$, we can run a DFS for $n$ steps starting from an arbitrary vertex $v$ until we either discover $n$ vertices reachable from $v$ (including $v$), or we find a connected component of size less than $n$. In the former case, let $S$ be the $n$ vertices found by the DFS: the set $S$ induces a connected subgraph, so there are at least $n-1$ edges inside $S$ and at most $dn - 2(n-1)$ edges leaving $S$, giving a ratio of at most $d - 2 + \frac 2n$. In the latter case, the expansion is zero.

What about the *spectral* expansion of the infinite tree? If $G$ is a finite $d$-regular graph, then the largest eigenvalue of its adjacency matrix $A$ is $d$, and the corresponding eigenvector is the all-one vector $\mathbf{1}$. By the spectral theorem we have

$$d = \max_{x \neq 0} \frac{x^T A x}{x^T x}$$

When we have an infinite $d$-regular graph, the all-1 vector is not an eigenvector any more (because it has infinite norm), and the relevant quantity becomes the spectral norm

$$\| A \| := \sup_{x :\ 0 < \|x\| < \infty} \frac{|x^T A x|}{x^T x}$$

We will not need it, but I should remark that there is a Cheeger inequality for infinite graphs, which is actually slightly easier to prove than for finite graphs.

Since we want to prove that, for the infinite $d$-regular tree, we have $\|A\| \leq 2\sqrt{d-1}$, we need to argue that for every vector $x$ such that $0 < \|x\| < \infty$ we have

$$|x^T A x| \leq 2\sqrt{d-1} \cdot x^T x$$

If we fix a root $r$, so that we can talk about the parent and the children of each node, and if we call $C_v$ the set of children of $v$, then we want to show

$$2 \cdot \left| \sum_v \sum_{u \in C_v} x_v x_u \right| \leq 2\sqrt{d-1} \cdot \sum_v x_v^2$$

Since we have an inequality with a summation on the left-hand side and a square root on the right-hand side, this looks like a job for Cauchy-Schwarz! The trick to get a one-line proof is to use the right way to break things up. One thing that comes to mind is to use $|x_v x_u| \leq \frac 12 x_v^2 + \frac 12 x_u^2$, but this does not go anywhere, and it would give an upper bound of $d \cdot \sum_v x_v^2$. This means that the bound is often loose, which must happen because $|x_v|$ and $|x_u|$ are often different in magnitude. To see why this should be the case, note that, considering that there are $d(d-1)^{\ell - 1}$ vertices at distance $\ell$ from the root, the typical vertex $v$ at distance $\ell$ from the root satisfies $x_v^2 \approx (d-1)^{-\ell}$, and we may think that if $u$ is a child of $v$ it should often be the case that $x_u^2$ is about a factor of $d-1$ smaller than $x_v^2$. If that were the case, then a tighter form of Cauchy-Schwarz would be $|x_v x_u| \leq \frac 12 \left( \frac{x_v^2}{\sqrt{d-1}} + \sqrt{d-1} \cdot x_u^2 \right)$. Let us try that bound:

$$2 \cdot \left| \sum_v \sum_{u \in C_v} x_v x_u \right| \leq \sum_v \sum_{u \in C_v} \left( \frac{x_v^2}{\sqrt{d-1}} + \sqrt{d-1} \cdot x_u^2 \right)$$

$$= \sum_{v \neq r} \left( \frac{(d-1) \cdot x_v^2}{\sqrt{d-1}} + \sqrt{d-1} \cdot x_v^2 \right) + \frac{d \cdot x_r^2}{\sqrt{d-1}}$$

$$\leq 2\sqrt{d-1} \cdot \sum_v x_v^2$$

which works! To justify the identity in the middle line, note that, in the sum, the root $r$ appears $d$ times as a parent and never as a child, and every other vertex appears once as a child and $d-1$ times as a parent.

This proves that the spectral norm of the infinite $d$-regular tree is at most $2\sqrt{d-1}$. To prove that it is at least this much, for every $\epsilon > 0$ we must find a vector $x$ such that

$$x^T A x \geq (2\sqrt{d-1} - \epsilon) \cdot x^T x$$

For such vectors, the Cauchy-Schwarz argument above must be nearly tight, so it means that in such vectors, if $v$ is the parent of $u$, then $x_v$ needs to be about $\sqrt{d-1}$ times larger than $x_u$. So let us start from this condition: we pick a vertex $r$ to be the root and we set $x_r = 1$, if $v$ is a child of $r$ we set $x_v = \frac 1{\sqrt{d-1}}$, and in general if $v$ is at distance $i$ from the root we set $x_v = (d-1)^{-i/2}$. This means that if we sum $x_v^2$ over all the vertices at distance $i$ from the root we get $\frac d{d-1}$, for all $i \geq 1$, which means that $\|x\|^2 = \infty$. So let us cut off the construction at some distance $k$, so that $x$ is defined as follows:

- $x_v = (d-1)^{-i/2}$ if $v$ is at distance $i \leq k$ from $r$, and
- $x_v = 0$ if $v$ is at distance more than $k$ from $r$.

We immediately see

$$x^T x = 1 + k \cdot \frac d{d-1}$$

Then we do the calculations and we see that

$$x^T A x = \frac{2dk}{\sqrt{d-1}} = 2\sqrt{d-1} \cdot k \cdot \frac d{d-1}$$

so, for every $k$, we can construct a vector $x$ such that

$$\frac{x^T A x}{x^T x} \geq 2\sqrt{d-1} \cdot \left( 1 - \frac 1k \right)$$

and we are done!
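The whole construction fits in a few lines of code. The sketch below builds the truncated vector on the tree, for an arbitrary choice of $d$ and $k$, and checks that its Rayleigh quotient sits between $2\sqrt{d-1} \cdot (1 - 1/k)$ and $2\sqrt{d-1}$:

```python
import numpy as np

# A sketch: the truncated test vector on the d-regular tree, levels 0..k.
d, k = 3, 12

# Build the tree down to depth k: the root has d children, others d-1.
parent = [-1]          # parent[i] = index of the parent of vertex i
level = [0]
frontier = [0]
for depth in range(k):
    nxt = []
    for v in frontier:
        for _ in range(d if depth == 0 else d - 1):
            parent.append(v)
            level.append(depth + 1)
            nxt.append(len(parent) - 1)
    frontier = nxt

level = np.array(level)
x = (d - 1.0) ** (-level / 2)          # x_v = (d-1)^{-i/2} at depth i

num = 2 * sum(x[v] * x[parent[v]] for v in range(1, len(parent)))  # x^T A x
den = x @ x                                                        # x^T x
rq = num / den
assert 2 * np.sqrt(d - 1) * (1 - 1 / k) <= rq <= 2 * np.sqrt(d - 1)
```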


The purpose of this post is to explain all the words in the previous sentence, and to show the proof, except for the major step of proving a certain identity.

There are at least a couple of reasons why more computer scientists should know about this result. One is that it is nice to see a connection, even if just at a syntactic level, between analytic facts that imply that the primes are pseudorandom and analytic facts that imply that good expanders are pseudorandom (the connection is deeper in the case of the Ramanujan Cayley graphs constructed by Lubotzky, Phillips and Sarnak). The other is that the argument looks at eigenvalues of the adjacency matrix of a graph as roots of a characteristic polynomial, a view that is usually not very helpful in achieving quantitative result, with the important exception of the work of Marcus, Spielman and Srivastava on interlacing polynomials.

Let us start with the Riemann zeta function. I apologize in advance to anybody who is familiar with the subject because I will probably say the analytic number theory equivalent of “by definition, a problem is NP-complete if it requires exponential time to be solved,” or worse.

One can define, for every real number $s > 1$,

$$\zeta(s) := \sum_{n=1}^\infty \frac 1{n^s} \ \ \ \ (1)$$

And in fact the right-hand side above is well defined also if $s$ is complex, provided $\Re(s) > 1$.

It turns out that, just as you can specify a polynomial at a few points and have it uniquely defined at all points, it is possible to specify a complex function in a small range and, if one insists that the function be “nice,” this will fix it everywhere. A complex function is *holomorphic* if it is differentiable in a neighborhood of every point, which implies that it is infinitely differentiable and always equal to its Taylor series. It is *meromorphic* if this happens except at a set of isolated points. As it happens, there is only one meromorphic function that agrees with (1) on all complex numbers $s$ with $\Re(s) > 1$, and that is defined on the whole complex plane, and we will now use $\zeta(\cdot)$ to denote this extended function.

There are several interesting facts about $\zeta$, after this extension, some going back to Euler (who worked them out before there were satisfactory definitions for what he was doing), such as the fact that $\zeta(-1) = -\frac 1{12}$, which one could state dramatically as

$$1 + 2 + 3 + 4 + \cdots = - \frac 1{12}$$

as in this much watched video.

One connection between the $\zeta$ function and primes is the Euler formula

$$\zeta(s) = \prod_{p\ \mathrm{prime}} \frac 1{1 - p^{-s}} \ \ \ \ (2)$$

The reason the $\zeta$ function comes up in number theory is that when one tries to bound how many primes there are between $1$ and $N$, after writing an expression for it, and taking a Dirichlet transform, one ends up with terms that blow up if $\zeta(s)$ is zero for certain values of $s$.

I will give some intuition for this connection, so that we can see that this is somewhat different, even syntactically, from the result for graphs that is the focus of this post.

Consider the von Mangoldt function $\Lambda(n)$, defined so that $\Lambda(n) = \ln p$ if $n$ is a power of the prime $p$, and $\Lambda(n) = 0$ otherwise. (For example, $\Lambda(9) = \ln 3$ and $\Lambda(10) = 0$.) Then it is not difficult to prove that

$$\sum_{n=1}^N \Lambda(n) = (1 + o(1)) \cdot \pi(N) \cdot \ln N$$

where $\pi(N)$ is the number of primes up to $N$, so proving that the number of primes up to $N$ is $(1+o(1)) \cdot \frac N{\ln N}$ (the *prime number theorem*) is equivalent to proving

$$\sum_{n=1}^N \Lambda(n) = (1 + o(1)) \cdot N$$

And proving that the number of primes up to $N$ is $\int_2^N \frac{dt}{\ln t} + O(\sqrt N \log N)$ is equivalent to proving

$$\sum_{n=1}^N \Lambda(n) = N + O(\sqrt N \log^2 N)$$

After some work one gets, for any $c > 1$,

$$\sum_{n=1}^N \Lambda(n) = \frac 1{2 \pi i} \int_{c - i\infty}^{c + i \infty} \left( - \frac{\zeta'(s)}{\zeta(s)} \right) \cdot \frac{N^s}{s} \, ds$$

which already suggests that the values for which $\zeta(s)$ is zero are a problem, and the integral evaluates (ignoring some bounded error terms) to

$$\sum_{n=1}^N \Lambda(n) = N - \sum_{\rho:\ \zeta(\rho) = 0} \frac{N^\rho}{\rho}$$

where the sum is over the zeroes $\rho$ of $\zeta$ with $0 \leq \Re(\rho) \leq 1$. To get bounds on the sum above, one needs to understand how big the real parts of the zeroes of $\zeta$ having real part between 0 and 1 can be.

If $\zeta(s) \neq 0$ whenever $\Re(s) = 1$, this information is already enough to prove that the sum above is $o(N)$, and this is how one proves the prime number theorem. (In fact, the two statements are equivalent.)

The Riemann hypothesis is that if $\zeta(s) = 0$ and $0 \leq \Re(s) \leq 1$, then $\Re(s) = \frac 12$, and this implies that the sum is $O(\sqrt N \log^2 N)$.

Several generalizations of the zeta function have been defined over other mathematical objects generalizing the integers and the primes, and a particularly successful direction was the setting of curves over finite fields, where Weil proved the analog of the Riemann hypothesis. In 1949 Weil formulated the *Weil conjectures*, which apply to zeta functions defined over varieties in finite fields. The conjectures were that such zeta functions are rational functions (analogous to the fact that in the complex case the zeta function is a ratio of holomorphic functions, and a holomorphic function can be seen as an infinitary version of a polynomial; the Euler product formula can also be seen as an infinitary version of a rational function), that they satisfy certain symmetries (the *functional equation*), and that they satisfy a, properly defined, Riemann hypothesis.

On the one hand, the Weil conjectures were related to very concrete questions, such as counting solutions to polynomial equations in finite fields, and, via the fact that their resolution was used to prove the Ramanujan conjecture, they are part of the lineage that led to expander constructions. Their eventual proof, however, had a great impact in the development of some of the most abstract parts of contemporary mathematics.

Bernard Dwork (yes, Cynthia Dwork’s father) made the first progress, by proving the conjecture that Weil’s zeta functions are rational functions. Grothendieck took a long view and worked on developing a cohomology theory that would yield the result. (A cohomology is something that can be used to transfer certain results from one mathematical framework to another, especially results involving formulas that count things; here the goal was to develop a cohomology that would get the Weil conjectures for varieties over finite fields by transferring the result from other settings where proofs were already known.) Grothendieck developed étale cohomology (which the autocorrect of my text editor tried to change to “stale cohomology”) toward this goal (now it is one of the foundations of modern algebraic geometry) and proved the rest of the conjectures except the Riemann hypothesis.

Deligne, who was a student of Grothendieck’s, found a way around Grothendieck’s full program, which remains incomplete, and proved the Riemann hypothesis for Weil’s zeta functions.

I refer the interested reader to this awesome post by Terry Tao, which gives a whirlwind tour of zeta functions and Riemann hypotheses all over mathematics, and we now come to expander graphs.

We will call a cycle of length $\ell$ a *prime* cycle if it does not backtrack and it is not derived by going $j$ times around a cycle of length $\ell / j$. (A prime cycle can use the same edge multiple times, just not twice in a row.)

For a $d$-regular graph $G$, here is its Ihara zeta function:

$$\zeta_G(u) := \prod_{C\ \mathrm{prime\ cycle}} \frac 1{1 - u^{\mathrm{length}(C)}}$$

Note the similarity with the Euler product (2).

Here is a bit of magic: Ihara proved the identity

$$\frac 1{\zeta_G(u)} = (1 - u^2)^{|E| - |V|} \cdot \det \left( I - uA + (d-1) u^2 I \right) \ \ \ \ (6)$$

where $A$ is the adjacency matrix of $G$. Don’t panic! Things are getting really simple from this point on.

We know that $\lambda$ is an eigenvalue of $A$ if and only if

$$\det( \lambda I - A ) = 0$$

and we see that $u \neq 0$ is a pole of $\zeta_G(\cdot)$ (a point in which the formula for $\zeta_G(\cdot)$ involves a division by zero) when

$$\det( I - uA + (d-1) u^2 I ) = 0$$

which is equivalent to

$$\det \left( \frac{1 + (d-1) u^2}{u} \cdot I - A \right) = 0$$

so we have

**Fact 1** A complex number $u \neq 0$ is a pole for $\zeta_G(\cdot)$ if and only if $\frac{1 + (d-1)u^2}{u}$ is an eigenvalue of $A$.
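Fact 1 is easy to check numerically; here is a sketch on the complete graph $K_4$ (an arbitrary choice of 3-regular example):

```python
import numpy as np

# A sketch of Fact 1 on K_4 (3-regular): for each eigenvalue lambda of A,
# the two roots u of (d-1) u^2 - lambda u + 1 = 0 satisfy
# det(I - uA + (d-1) u^2 I) = 0, i.e. they are poles of the Ihara zeta function.
d, n = 3, 4
A = np.ones((n, n)) - np.eye(n)     # adjacency matrix of K_4
I = np.eye(n)

for lam in np.linalg.eigvalsh(A):
    for u in np.roots([d - 1, -lam, 1]):     # solve (d-1)u^2 - lam*u + 1 = 0
        val = np.linalg.det(I - u * A + (d - 1) * u ** 2 * I)
        assert abs(val) < 1e-8               # u is a pole
```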

In particular, we see that for every pole $u$ of $\zeta_G(\cdot)$ it must be the case that $\frac{1 + (d-1)u^2}{u}$ is real, and it is at most $d$ in absolute value. Now we can introduce the Riemann hypothesis for the Ihara zeta function.

**Definition 2** The Ihara zeta function of a graph satisfies the Riemann hypothesis if for every pole $u$ such that $\frac 1{d-1} < |u| < 1$ we have $|u| = \frac 1{\sqrt{d-1}}$.

Recall that a connected $d$-regular graph is Ramanujan if and only if for every eigenvalue $\lambda$ of $A$ we have that either $|\lambda| = d$ or $|\lambda| \leq 2\sqrt{d-1}$.

Now we can prove Ihara’s result.

**Theorem 3 (Finally!)** A regular connected graph is Ramanujan if and only if its Ihara zeta function satisfies the Riemann hypothesis.

*Proof:* We first do a couple of preliminary calculations. Say that $u$ is a pole of $\zeta_G(\cdot)$, so that $\lambda := \frac{1+(d-1)u^2}{u}$ is a real eigenvalue of $A$. If $|u| = \frac 1{\sqrt{d-1}}$, this means that $|\lambda| \leq 2\sqrt{d-1}$. If $u$ is not real, a short calculation shows that $|u| = \frac 1{\sqrt{d-1}}$.

Now we have:

*If the Ihara zeta function does not satisfy the Riemann hypothesis, $G$ is not Ramanujan.* Let $u$ be a pole such that $\frac 1{d-1} < |u| < 1$ and $|u| \neq \frac 1{\sqrt{d-1}}$. By the preliminary calculations, $u$ must be real, so we have an eigenvalue $\lambda = \frac{1+(d-1)u^2}{u}$ with $u$ real. The function $u \mapsto \frac{1+(d-1)u^2}{u}$, in the interval $\frac 1{d-1} \leq u \leq 1$, has a unique minimum at $u = \frac 1{\sqrt{d-1}}$, where it takes the value $2\sqrt{d-1}$, and it attains its maxima at the endpoints $\frac 1{d-1}$ and $1$, where it takes the value $d$ (the case of negative $u$ is symmetric, with the signs reversed). Now it follows from our constraints on $u$ that it must be $2\sqrt{d-1} < |\lambda| < d$, and so $G$ is not Ramanujan.

*If the Ihara zeta function satisfies the Riemann hypothesis, $G$ is Ramanujan.* Let $\lambda$ be an eigenvalue of $A$; then there must be a pole $u$ such that $\lambda = \frac{1+(d-1)u^2}{u}$. We consider two cases. If $u$ is real, then, by the Riemann hypothesis, we must have either $|u| \leq \frac 1{d-1}$, or $|u| \geq 1$, or $|u| = \frac 1{\sqrt{d-1}}$, and we must also have $|\lambda| \leq d$; by the previous analysis of the function $\frac{1+(d-1)u^2}{u}$ it follows that either $|\lambda| = d$ or $|\lambda| \leq 2\sqrt{d-1}$. If $u$ is not real, then $|u| = \frac 1{\sqrt{d-1}}$, so clearly $|\lambda| \leq 2\sqrt{d-1}$.

This survey paper has a lot more information. I took the exposition of Ihara’s proof above from this high school (!!) project. (Note that, in the latter link, the definition of Ramanujan graph is wrong, but the proof is correct.)

I should also note that Murty’s survey attributes the above theorem to Ihara, but this wikipedia page (I know, I know . . . ) attributes only the formula (6) to Ihara, and attributes the above theorem to Sunada. I have not had a chance to look at the primary sources yet.


I have been writing some notes for myself, and here is something that bothers me: what do you call the second largest, in absolute value, eigenvalue of the adjacency matrix of a graph, without resorting to the sentence I just wrote? And how do you denote it?

I have noticed that the typical answer to the first question is “second eigenvalue,” but this is a problem when it creates confusion with the *actual* second largest eigenvalue of the adjacency matrix, which could be a very different quantity. The answer to the second question seems to be either a noncommittal “$\lambda$” or a rather problematic “$\lambda_2$.”

For my own use, I have started to use the notation $\sigma_2(G)$, which can certainly use some improvement, but I am still at a loss concerning terminology.

Perhaps one should start from where this number is coming from, and it seems that its important property is that, if the graph is regular of degree $d$, has $n$ vertices, and has adjacency matrix $A$, this number is the spectral norm of $A - \frac dn J$ (where $J$ is the matrix with ones everywhere), so that it measures the distance of $G$ from the “perfect $d$-regular expander” in a norm that is useful to reason about cuts and also tractable to compute.
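For instance (a sketch, with $K_{3,3}$ as an arbitrary example), the quantity can be computed directly as a spectral norm:

```python
import numpy as np

# A sketch: the quantity under discussion is || A - (d/n) J ||, which for a
# d-regular graph equals max(|lambda_2|, |lambda_n|).  Checked on K_{3,3}.
n, d = 6, 3
A = np.zeros((n, n))
for u in range(3):
    for v in range(3, 6):
        A[u, v] = A[v, u] = 1          # complete bipartite K_{3,3}

eigs = np.sort(np.linalg.eigvalsh(A))  # [-3, 0, 0, 0, 0, 3]
sigma2 = np.linalg.norm(A - (d / n) * np.ones((n, n)), 2)
assert abs(sigma2 - max(abs(eigs[0]), abs(eigs[-2]))) < 1e-9
# Here sigma2 equals 3: K_{3,3} is bipartite, so lambda_n = -d, and the
# shifted spectral norm does not see that lambda_2 = 0.
```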

So, since it is the spectral norm of a modification of the adjacency matrix, how about calling it the *[adjective] spectral norm*? I would vote for *shifted spectral norm*, because I would think of subtracting $\frac dn J$ as a sort of shift.

Please, do better in the comments!


Congratulations to my former colleague Maryam Mirzakhani for being the first Fields Medal winner from Iran, a nation that can certainly use some good news, and a nation that has always done well in identifying and nurturing talent in mathematics and related fields. She is also the first woman to receive this award in its 78-year history.

And congratulations to Subhash Khot for a very well deserved Nevanlinna prize; one can read about his work in his own words, in my words, and about the latest impact of his work in the words of Barak and Steurer.

The Simons Foundation has excellent articles up about their work and the work of Artur Avila, Manjul Bhargava, and Martin Hairer, the other Fields Medal recipients. An unusual thing about Manjul Bhargava’s work is that one can actually understand the *statements* of some of his results.

The New York Times has a fascinating article according to which the Fields Medal got its current status because of Steve Smale and cold war paranoia. I don’t know if they are overstating their case, but it is a great story.



After the week, the users in the “negative” group posted fewer, and more negative, posts, and those in the “positive” group posted more, and more positive, posts.

Posts were classified according to an algorithm called LIWC2007.

The study runs contrary to the conventional wisdom that people find it depressing to see good things happening to their friends on Facebook.

The paper has caused considerable controversy for being a study with human subjects conducted without explicit consent. Every university, including of course Cornell, requires experiments involving people to be approved by a special committee, and participants must sign informed consent forms. Facebook maintains that the study is consistent with its terms of service. The highly respected privacy organization EPIC has filed a complaint with the FTC. (And they have been concerned with Facebook’s terms of service for a long time.)

Here I would like to explore a different angle: almost everybody thinks that *observational* studies about human behavior can be done without informed consent. This means that if the Cornell scientists had run an analysis on old Facebook data, with no manipulation of the feed generation algorithm, there would not have been such a concern.

At the same time, the number of posts that are fit for the feed of a typical user vastly exceeds what can fit on one screen, and so there are algorithms that pick a rather small subset of posts that are evaluated to be of higher relevance, according to some scoring function. Now suppose that, if N posts fit on the screen, the algorithm picks the 2N highest scoring posts, and then randomly picks half of them. This seems rather reasonable because the scoring function is going to be an approximation of relevance anyway.

The United States has roughly 130 million Facebook subscribers. Suppose that the typical user looks, in a week, at 200 posts, which seems reasonable (in our case, those would be a random subset of roughly 400 posts). According to the PNAS study, roughly 50% of the posts are positive and 25% are negative, so of the initial 400, roughly 200 are positive and 100 are negative. Let’s look at the 100,000 users for which the random sampling picked the fewest positive posts: we would be expecting roughly 3 standard deviations below the mean, so about 80 positive posts instead of the expected 100; the 100,000 users with the fewest negative posts would get about 35 instead of the expected 50.
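Here is a quick script (a sketch, using the modeling assumptions above: 200 posts sampled uniformly out of 400, so the counts are hypergeometric) that redoes this back-of-the-envelope calculation:

```python
from math import sqrt

# A sketch of the calculation: each user sees a uniform random 200 of 400
# candidate posts (200 positive, 100 negative), and we look roughly 3
# standard deviations below the mean, corresponding to the ~100,000 most
# extreme users out of 130 million.
candidates, shown = 400, 200

def three_sigma_low(special):
    # Hypergeometric: 'shown' draws out of 'candidates', of which 'special'
    # are the posts of interest (positive or negative).
    p = special / candidates
    mean = shown * p
    var = shown * p * (1 - p) * (candidates - shown) / (candidates - 1)
    return mean - 3 * sqrt(var)

print(round(three_sigma_low(200)))   # positive posts seen by unlucky users
print(round(three_sigma_low(100)))   # negative posts seen by unlucky users
```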

This is much less variance than in the PNAS study, where they would have got, respectively, only 10 positive and only 5 negative, but it may have been enough to pick up a signal.

Apart from the calculations, which I probably got wrong anyway, what we have is that in the PNAS study they picked a subset of people and then they varied the distribution of posts, while in the second case you pick random posts for everybody and then you select the users with the most variance.

If you could arrange distributions so that *the distributions of posts seen by each users are the same*, would it really be correct to view one study as experimental and one as observational? If the PNAS study had filtered 20% instead of 90% of the positive/negative posts, would it have been ethical? Does it matter what is the *intention* when designing the randomized algorithm that selects posts? If Facebook were to introduce randomness in the scoring algorithm with the goal of later running observational studies would it be ethical? Would they need to let people opt out? I genuinely don’t know the answer to these questions, but I haven’t seen them discussed elsewhere.


As anybody who has spent time there can confirm, the administrative staff of the Simons Institute is exceptionally good and proactive. Not only do they take care of the things you ask them to, but they also take care of the things that you did not know you should have asked them about. In fact, at Berkeley, the quality of the administration tracks pretty well the level at which it is taking place. At the level of departments and of smaller units, everything usually works pretty well, and then things get worse as you go up.

Which brings me to the office of the Chancellor, which runs U.C. Berkeley, and from which I received my official job offer. As you can see, that office cannot even get right, on its own letterhead, the *name of the university that it runs*:

Also, my address was spelled wrong, and the letter offered me the wrong position. I can’t believe they managed to put on the correct postage stamp. I was then instructed by the EECS department chair to respond by saying “I accept your offer of [correct terms],” which sounded passive-aggressive, but that’s what I did.
