
[Update 10/24/14: there was a bug in the code I wrote yesterday night, apologies to the colleagues at Rutgers!]

[Update 10/24/14: a reaction to the authoritative study of MIT and the University of Maryland. Also, coincidentally, today Scott Adams comes down against reputation-based rankings]

Saeed Seddighin and MohammadTaghi Hajiaghayi have proposed a ranking methodology for theory groups based on the following desiderata: (1) the ranking should be objective, and based only on quantitative information and (2) the ranking should be transparent, and the methodology openly revealed.

Inspired by their work, I propose an alternative methodology that meets both criteria, but has some additional advantages, including having an easier implementation. Based on the same Brown University dataset, I count, for each theory group, the total number of letters in the name of each faculty member.
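For concreteness, here is a minimal sketch of the proposed methodology. The dataset shape (a mapping from institution to lists of faculty names) and the example entries are hypothetical placeholders, not the actual Brown data:

```python
def letter_count(name):
    """Count alphabetic characters only, ignoring spaces and punctuation."""
    return sum(1 for ch in name if ch.isalpha())

def rank_groups(groups):
    """groups: dict mapping institution -> list of faculty names.
    Returns a list of (total_letters, institution), best first."""
    totals = [(sum(letter_count(n) for n in fac), inst)
              for inst, fac in groups.items()]
    return sorted(totals, reverse=True)

# Hypothetical example data, not the actual Brown dataset.
example = {
    "University A": ["Ada Lovelace", "Alan Turing"],
    "University B": ["Kurt Goedel"],
}
print(rank_groups(example))
```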

Here are the results (apologies for the poor formatting):

1 ( 201 ) Massachusetts Institute of Technology
2 ( 179 ) Georgia Institute of Technology
3 ( 146 ) Rutgers – State University of New Jersey – New Brunswick
4 ( 142 ) University of Illinois at Urbana-Champaign
5 ( 141 ) Princeton University
6 ( 139 ) Duke University
7 ( 128 ) Carnegie Mellon University
8 ( 126 ) University of Texas – Austin
9 ( 115 ) University of Maryland – College Park
10 ( 114 ) Texas A&M University
11 ( 111 ) Northwestern University
12 ( 110 ) Stanford University
13 ( 108 ) Columbia University
14 ( 106 ) University of Wisconsin – Madison
15 ( 105 ) University of Massachusetts – Amherst
16 ( 105 ) University of California – San Diego
17 ( 98 ) University of California – Irvine
18 ( 94 ) New York University
19 ( 94 ) State University of New York – Stony Brook
20 ( 93 ) University of Chicago
21 ( 91 ) Harvard University
22 ( 91 ) Cornell University
23 ( 87 ) University of Southern California
24 ( 87 ) University of Michigan
25 ( 85 ) University of Pennsylvania
26 ( 84 ) University of California – Los Angeles
27 ( 81 ) University of California – Berkeley
28 ( 78 ) Dartmouth College
29 ( 76 ) Purdue University
30 ( 71 ) California Institute of Technology
31 ( 67 ) Ohio State University
32 ( 63 ) Brown University
33 ( 61 ) Yale University
34 ( 54 ) University of Rochester
35 ( 53 ) University of California – Santa Barbara
36 ( 53 ) Johns Hopkins University
37 ( 52 ) University of Minnesota – Twin Cities
38 ( 49 ) Virginia Polytechnic Institute and State University
39 ( 48 ) North Carolina State University
40 ( 47 ) University of Florida
41 ( 45 ) Rensselaer Polytechnic Institute
42 ( 44 ) University of Washington
43 ( 44 ) University of California – Davis
44 ( 44 ) Pennsylvania State University
45 ( 40 ) University of Colorado Boulder
46 ( 38 ) University of Utah
47 ( 36 ) University of North Carolina – Chapel Hill
48 ( 33 ) Boston University
49 ( 31 ) University of Arizona
50 ( 30 ) Rice University
51 ( 14 ) University of Virginia
52 ( 12 ) Arizona State University
53 ( 12 ) University of Pittsburgh

I should acknowledge a couple of limitations of this methodology: (1) the Brown dataset is not current, but I believe that the results would not be substantially different even with current data, (2) it might be reasonable to only count the letters in the last name, or to weigh the letters in the last name by 1 and the letters in the first name by 1/2. If there is sufficient interest, I will post rankings according to these other methodologies.

STOC 2015 will be in Portland (yay!), and part of FCRC (boo!). The call for papers is out.

A conjectural approach to the Riemann hypothesis is to find a correspondence between the zeroes of the zeta function and the eigenvalues of a linear operator. This was first conceived by Hilbert and by Pólya in the 1910s. (See this correspondence with Odlyzko.) In the subsequent century, there have been a number of rigorous results relating the zeroes of zeta functions in other settings to eigenvalues of linear operators. There is also a connection between the distribution of known zeroes of the Riemann zeta function, as derived from explicit calculations, and the distribution of eigenvalues of the Gaussian unitary ensemble.

In a previous post we talked about the definition of the Ihara zeta function of a graph, and Ihara’s explicit formula for it, in terms of a determinant that is closely related to the characteristic polynomial of the adjacency matrix of the graph, so that the zeroes of the zeta function determine the eigenvalues of the graph. And if the zeta function of the graph satisfies a “Riemann hypothesis,” meaning that, after a change of variable, all zeroes and poles of the zeta function are ${1/\sqrt {d-1}}$ in absolute value, then the graph is Ramanujan. Conversely, if the graph is Ramanujan then its Ihara zeta function must satisfy the Riemann hypothesis.

As I learned from notes by Pete Clark, the zeta functions of curves and varieties in finite fields also involve a determinant, and the “Riemann hypothesis” for such curves is a theorem (proved by Weil for curves and by Deligne for varieties), which says that (after an analogous change of variable) all zeroes and poles of the zeta function must be ${\sqrt q}$ in absolute value, where ${q}$ is the size of the finite field. This means that one way to prove that a family of ${(q+1)}$-regular graphs is Ramanujan is to construct for each graph ${G_n}$ in the family a variety ${V_n}$ over ${{\mathbb F}_q}$ such that the determinant that comes up in the zeta function of ${G_n}$ is the “same” (up to terms that don’t affect the roots, and to the fact that if ${x}$ is a root for one characteristic polynomial, then ${1/x}$ has to be a root for the other) as the determinant that comes up in the zeta function of the variety ${V_n}$. Clark shows that one can construct such families of varieties (in fact, curves are enough) for all the known algebraic constructions of Ramanujan graphs, and one can use families of varieties for which the corresponding determinant is well understood to give new constructions. For example, for every ${d}$, if ${k}$ is the number of distinct prime divisors of ${d-1}$ (which would be one if ${d-1}$ is a prime power) Clark gets a family of graphs with second eigenvalue ${2^k \sqrt{d-1}}$. (The notes call ${k}$ the number of distinct prime divisors of ${d}$, but it must be a typo.)

So spectral techniques underlie fundamental results in number theory and algebraic geometry, which then give us expander graphs. Sounds like something that we (theoretical computer scientists) should know more about.

The purpose of these notes is to explore a bit more the statements of these results, although we will not even begin to discuss their proofs.

One more reason why I find this material very interesting is that all the several applications of polynomials in computer science (to constructing error-correcting codes, secret sharing schemes, ${k}$-wise independent generators, expanders of polylogarithmic degree, giving the codes used in PCP constructions, self-reducibility of the permanent, proving combinatorial bounds via the polynomial method, and so on), always eventually rely on three things:

• A multivariate polynomial restricted to a line can be seen as a univariate polynomial (and restricting to a ${k}$-dimensional subspace gives a ${k}$-variate polynomial); this means that results about multivariate polynomials can often be proved via a “randomized reduction” to the univariate case;
• A univariate polynomial of degree ${d}$ has at most ${d}$ roots, which follows from unique factorization;

• Given ${d}$ desired roots ${a_1,\ldots,a_d}$ we can always find an univariate polynomial of degree ${d}$ which has those roots, by defining it as ${\prod_i (x-a_i)}$.
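These three facts can be illustrated with a few lines of Python over a small prime field; the prime, the polynomial representation, and the example roots below are arbitrary choices made just for illustration:

```python
# The three facts, sketched over the prime field GF(p).
p = 13

def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients listed from lowest degree) mod p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def poly_from_roots(roots):
    """Fact 3: the polynomial prod_i (x - a_i) with the prescribed roots."""
    coeffs = [1]
    for a in roots:
        new = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i + 1] = (new[i + 1] + c) % p   # multiply by x
            new[i] = (new[i] - c * a) % p       # multiply by -a
        coeffs = new
    return coeffs

f = poly_from_roots([2, 5, 7])
# Fact 2: a degree-3 polynomial has at most 3 roots -- here exactly the chosen ones.
roots = [x for x in range(p) if poly_eval(f, x) == 0]
print(roots)

# Fact 1: restricting the bivariate P(x,y) = xy + y^2 + 1 to the line
# (t, 2t + 3) yields a univariate polynomial in t, here tabulated by evaluation.
P = lambda x, y: (x * y + y * y + 1) % p
restricted = [P(t, (2 * t + 3) % p) for t in range(p)]
```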

This is enough to explain the remarkable pseudo-randomness of constructions that we get out of polynomials, and it is the set of principles that underlie much of what is below, except that we are going to restrict polynomials to the set of zeroes of another polynomial, instead of a line, and this is where things get much more complicated, and interesting.

Before getting started, however, I would like to work out a toy example (which is closely related to what goes on with the Ihara zeta function) showing how an expression that looks like the one defining zeta functions can be expressed as a characteristic polynomial of a linear operator, how its roots (which are then the eigenvalues of the operator) help bound a counting problem, and how one can get such bounds directly from the trace of the operator.

If we have quantities ${C_n}$ that we want to bound, then the “zeta function” of the sequence is

$\displaystyle z(T) := \exp\left( \sum_n \frac {C_n}{n} T^n \right)$

For example, and this will be discussed in more detail below, if ${P(\cdot,\cdot)}$ is a bivariate polynomial over ${{\mathbb F}_q}$, we may be interested in the number ${C_1}$ of solutions of the equation ${P(x,y) = 0}$ over ${{\mathbb F}_q}$, as well as the number of solutions ${C_n}$ over the extension ${{\mathbb F}_{q^n}}$. In this case, the zeta function of the curve defined by ${P}$ is precisely

$\displaystyle z(T) = \exp\left( \sum_n \frac {C_n}{n} T^n \right) \ \ \ \ \ (1)$

and Weil proved that ${z(\cdot)}$ is a rational function, and that if ${\alpha_1,\ldots,\alpha_{2g}}$ are the zeroes and poles of ${z()}$, that is, the roots of the numerator and the denominator, then they are all at most ${\sqrt q}$ in absolute value, and one has

$\displaystyle | C_n - q^n | \leq \sum_i |\alpha_i|^n \leq 2g q^{n/2} \ \ \ \ \ (2)$

How does one get an approximate formula for ${C_n}$ as in (2) from the zeroes of a function like (1), and what do determinants and traces have to do with it?

Here is our toy example: suppose that we have a graph ${G=(V,E)}$ and that ${c_\ell}$ is the number of cycles of the graph of length ${\ell}$. Here by “cycle of length ${\ell}$” we mean a sequence ${v_0,\ldots,v_\ell}$ of vertices such that ${v_0=v_\ell}$ and, for ${i=0,\ldots,\ell-1}$, ${\{ v_i,v_{i+1} \} \in E}$. So, for example, in a graph in which the vertices ${a,b,c}$ form a triangle, ${a,b,c,a}$ and ${a,c,b,a}$ are two distinct cycles of length 3, and ${a,b,a}$ is a cycle of length 2.
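These counts can be checked numerically using the fact (which we will use again shortly) that ${c_\ell}$ is the trace of the ${\ell}$-th power of the adjacency matrix. A quick check on the triangle example:

```python
import numpy as np

# The triangle graph on vertices a, b, c from the example above.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])

def count_cycles(A, ell):
    """Number of cycles of length ell in the sense above: c_ell = Tr(A^ell)."""
    return int(np.trace(np.linalg.matrix_power(A, ell)))

# From each of the 3 vertices there are 2 cycles of length 3 (clockwise and
# counterclockwise), so c_3 = 6; likewise each vertex starts 2 cycles of
# length 2 (one per neighbor), so c_2 = 6.
print(count_cycles(A, 2), count_cycles(A, 3))  # 6 6
```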

We could define the function

$\displaystyle z(T) := \exp\left( \sum_\ell \frac {c_\ell}{\ell} T^\ell \right)$

and hope that its zeroes have to do with the computation of ${c_\ell}$, and that ${z}$ can be defined in terms of the characteristic polynomial.

Indeed, we have (remember, ${T}$ is a complex number, not a matrix):

$\displaystyle z(T) = \frac 1 {\det (I - TA) }$

and, if ${\alpha_1,\ldots,\alpha_n}$ are the zeroes and poles of ${z(T)}$, then

$\displaystyle c_\ell \leq \sum_i |\alpha_i|^{-\ell}$

How did we get these bounds? Well, we cheated because we already knew that ${c_\ell = {\rm Tr}(A^\ell) = \sum_i \lambda_i^\ell}$, where ${\lambda_i}$ are the eigenvalues of ${A}$. This means that

$\displaystyle z(T) = \exp \left( \sum_\ell \sum_i \frac {\lambda_i^\ell}{\ell} T^\ell \right)$

$\displaystyle = \prod_i \exp \left( \sum_\ell \frac {\lambda_i^\ell}{\ell} T^\ell \right)$

$\displaystyle = \prod_i \frac 1 {1- T\lambda_i }$

$\displaystyle = \frac 1 { \det(I - TA) }$

Where we used ${\frac 1 {1-x} = \exp\left( \sum_n \frac { x^n}{n} \right)}$. Now we see that the poles of ${z}$ are precisely the inverses of the eigenvalues of ${A}$.
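The identity just derived is easy to sanity-check numerically: compare a truncated version of the series defining ${z(T)}$ with the closed form ${1/\det(I-TA)}$ on the triangle graph. The choice of graph, truncation point, and evaluation point are arbitrary, subject to ${|T|}$ being smaller than the inverse of the largest eigenvalue (here ${1/2}$) so that the series converges:

```python
import numpy as np

# Sanity check of z(T) = 1/det(I - TA) on the triangle graph.
A = np.array([[0., 1, 1], [1, 0, 1], [1, 1, 0]])
T = 0.1   # must satisfy |T| < 1/lambda_max = 1/2 for convergence
N = 60    # truncation point (arbitrary; the tail is negligible at T = 0.1)

c = [np.trace(np.linalg.matrix_power(A, ell)) for ell in range(1, N)]
series = np.exp(sum(c[ell - 1] * T ** ell / ell for ell in range(1, N)))
closed_form = 1 / np.linalg.det(np.eye(3) - T * A)
print(series, closed_form)
```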

Now we are ready to look more closely at various zeta functions.

I would like to thank Ken Ribet and Jared Weinstein for the time they spent answering questions. Part of this post will follow, almost word for word, this post of Terry Tao. All the errors and misconceptions below, however, are my own original creation.

After my lectures in the “boot camp” of the spectral graph theory program at the Simons Institute, I promised I would post some references, because I stated all results without attribution.

Here is a first draft.

If you notice that work that you know of (for example, your work) is misrepresented or absent, please let me know and I will edit the document. (If possible, when you do so, do not compare me to Stalin and cc your message to half a dozen prominent people — true story.)

I am still in shock at the news that Microsoft decided to shut down MSR SVC and fire, literally from one day to the next, almost all the scientific staff.

It is shocking that the company decided that it was a good idea to destroy an investment they had made over several years; I am only familiar with the theory side of MSR SVC, which had great success in nurturing young talent and developing breakthrough ideas and results. The work on privacy led by Cynthia Dwork, for example, informed the thinking on privacy of higher management, and on how one could find new ways to balance European privacy law with the data collection necessary for advertising. This is one of the returns of having academic research in a company: not so that they can implement their algorithms in C++, but so that they can generate and communicate new ways of thinking about problems. (Cynthia is one of the few people retained by Microsoft, but she has lost all her collaborators.) Microsoft’s loss will be other places’ win as the MSR SVC diaspora settles down elsewhere.

It is also shocking that, instead of planning an orderly shutdown, they simply threw people out overnight, which shows a fundamental indifference to the way the academic job market works (it can take a full year for an academic job offer to materialize).

I am not shocked by the class demonstrated by Omer Reingold; the moral stature of people is best seen in difficult moments. Omer has written a beautiful post about the lab, whose comment section has become a memorial to the lab, with people posting their personal remembrances.

Here at Berkeley and Stanford we will do our best to help, and we will make sure that everybody has some space to work. There will also be some kind of community-wide response, but it will take some time to figure out what we can do. Meanwhile, I urge everybody to reach out to their friends formerly at MSR SVC, make them feel the love of their community, and connect them to opportunities for both short term and long term positions, as they weigh their options.

Suppose that we want to construct a very good family of ${d}$-regular expander graphs. The Alon-Boppana theorem says that the best we can hope for, from the point of view of spectral expansion, is to have ${\lambda_2 \leq 2 \sqrt{d-1}}$, and we would certainly be very happy with a family of graphs in which ${\lambda_2 \leq O(\sqrt d)}$.

Known constructions of expanders produce Cayley graphs (or sometimes Schreier graphs, which is a related notion), because it is easier to analyze the spectra of such graphs. If ${\Gamma}$ is a group with operation ${\cdot}$ and ${a^{-1}}$ is the inverse of element ${a}$, and ${S}$ is a symmetric set of generators, then the Cayley graph ${Cay(\Gamma, S)}$ is the graph whose vertices are the elements of ${\Gamma}$ and whose edges are the pairs ${(a,b)}$ such that ${a\cdot b^{-1} \in S}$.

When the group is Abelian, there is good news and bad news. The good news is that the eigenvectors of the graphs are completely characterized (they are the characters of ${\Gamma}$) and the eigenvalues are given by a nice formula. (See here and here.) The bad news is that constant-degree Cayley graphs of Abelian groups cannot be expanders.

That’s very bad news, but it is still possible to get highly expanding graphs of polylogarithmic degree as Cayley graphs of Abelian groups.

Here we will look at the extreme case of a family of graphs of degree ${d_n = n/2}$, where ${n}$ is the number of vertices. Even with such high degree, the weak version of the Alon-Boppana theorem still implies that we must have ${\sigma_2 \geq \Omega(\sqrt d_n)}$, and so we will be happy if we get a graph in which ${\sigma_2 \leq O(\sqrt d) = O(\sqrt n)}$. Highly expanding graphs of degree ${n/2}$ are interesting because they have some of the properties of random graphs from the ${G_{n,\frac 12}}$ distribution. In turn, graphs from ${G_{n,\frac 12}}$ have all kinds of interesting properties with high probability, including being essentially the best known Ramsey graphs and having the kind of discrepancy property that gives seedless extractors for two independent sources. Unfortunately, these properties cannot be certified by spectral methods. The graph that we will study today is believed to have such stronger properties, but there is no known promising approach to prove such conjectures, so we will content ourselves with proving strong spectral expansion.

The graph is the Paley graph. If ${p}$ is a prime with ${p \equiv 1 \pmod 4}$ (so that ${-1}$ is a square modulo ${p}$ and the graph is undirected), ${{\mathbb Z} / p {\mathbb Z}}$ is the group of addition modulo ${p}$, and ${Q}$ is the set of nonzero elements of ${{\mathbb Z}/ p{\mathbb Z}}$ of the form ${r^2 \bmod p}$, then the graph is just ${Cay ( {\mathbb Z}/p{\mathbb Z}, Q)}$. That is, the graph has a vertex ${v}$ for each ${v\in \{ 0,\ldots,p-1\}}$, and two vertices ${a,b}$ are adjacent if and only if there is an ${r\in \{ 1,\ldots,p-1\}}$ such that ${a-b \equiv r^2 \pmod p}$.
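As a numerical sanity check of the kind of expansion we are aiming for, one can build a small Paley graph and inspect its spectrum; it is a known fact that the nontrivial eigenvalues of a Paley graph are ${(-1 \pm \sqrt p)/2}$, so ${\sigma_2 = (1+\sqrt p)/2 = O(\sqrt d)}$. A sketch for ${p = 13}$, using the nonzero quadratic residues as the connection set:

```python
import numpy as np

# Paley graph on p = 13 vertices (note p = 1 mod 4).
p = 13
Q = {(r * r) % p for r in range(1, p)}  # nonzero quadratic residues mod p

A = np.zeros((p, p))
for a in range(p):
    for b in range(p):
        if (a - b) % p in Q:
            A[a, b] = 1

d = (p - 1) // 2                 # the graph is d-regular
eigs = np.linalg.eigvalsh(A)     # ascending order; eigs[-1] is d
sigma2 = max(abs(eigs[0]), abs(eigs[-2]))
print(d, sigma2, (1 + p ** 0.5) / 2)
```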

Today, after a lecture in the spectral graph theory boot camp at the Simons Institute, I was asked what the expander mixing lemma is like in graphs that are not regular.

I don’t know if I will have time to return to this tomorrow, so here is a quick answer.

First, for context, the expander mixing lemma in regular graphs. Say that ${G=(V,E)}$ is a ${d}$-regular undirected graph, and ${A}$ is its adjacency matrix. Then let the eigenvalues of the normalized matrix ${\frac 1d A}$ be

$\displaystyle 1 = \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$

We are interested in graphs for which all the eigenvalues are small in absolute value, except ${\lambda_1}$, that is, if we define

$\displaystyle \sigma_2 := \max\{ |\lambda_2|, |\lambda_3 | , \ldots , | \lambda_n | \}$

we are interested in graphs for which ${\sigma_2}$ is small. The expander mixing lemma is the fact that for every two disjoint sets ${S}$ and ${T}$ of vertices we have

$\displaystyle \left | E(S,T) - \frac d n \cdot |S| \cdot |T| \right | \leq \sigma_2 d \sqrt{ |S| \cdot |T|} \ \ \ \ \ (1)$

The inequality (1) says that, if ${\sigma_2}$ is small, then the number of edges between every two large sets of vertices is almost determined just by the size of the sets, and it is equal to the expected number of edges between the two sets in a random ${d}$-regular graph, up to an error term that depends on ${\sigma_2}$.

For the proof, we observe that, if we call ${J}$ the matrix that has ones everywhere, then

$\displaystyle \sigma_2 = \max_{x \perp (1,\ldots , 1) } \frac { |x^T A x|}{d\cdot x^Tx} = \max_{x} \frac { \left|x^T \left( A - \frac dn J \right) x \right|}{d\cdot x^T x} = \max_{x,y} \frac { \left|x^T \left( A - \frac dn J \right) y \right|}{d\cdot ||x|| \cdot ||y||}$

and then we substitute ${x := {\mathbf 1}_S}$ and ${y:= {\mathbf 1}_T}$ in the above expression and do the calculations.
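For intuition, the lemma can also be checked numerically on a small regular graph, say a circulant (a Cayley graph of ${{\mathbb Z}/12{\mathbb Z}}$ with connection set ${\{\pm 1, \pm 2\}}$, chosen arbitrarily):

```python
import numpy as np
rng = np.random.default_rng(0)

# A 4-regular circulant graph on Z_12 with generators {+-1, +-2}.
n, gens, d = 12, [1, 2, 10, 11], 4
A = np.zeros((n, n))
for v in range(n):
    for g in gens:
        A[v][(v + g) % n] = 1

eigs = np.linalg.eigvalsh(A / d)   # ascending; eigs[-1] = 1
sigma2 = max(abs(eigs[0]), abs(eigs[-2]))

# Check inequality (1) on random disjoint pairs of vertex sets.
for _ in range(100):
    perm = rng.permutation(n)
    S, T = perm[:3], perm[3:7]
    edges_ST = sum(A[u][v] for u in S for v in T)
    lhs = abs(edges_ST - d / n * len(S) * len(T))
    rhs = sigma2 * d * np.sqrt(len(S) * len(T))
    assert lhs <= rhs + 1e-9
print("mixing lemma holds on all sampled pairs; sigma2 =", sigma2)
```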

In the case of an irregular undirected graph ${G=(V,E)}$, we are going to consider the normalized adjacency matrix ${M:= D^{-\frac 12} A D^{-\frac 12 }}$, where ${A}$ is the adjacency matrix of ${G}$ and ${D}$ is the diagonal matrix such that ${D_{v,v} = d_v}$, where ${d_v}$ is the degree of ${v}$. As in the regular case, the eigenvalues of the normalized adjacency matrix satisfy

$\displaystyle 1 = \lambda_1 \geq \lambda_2 \geq \cdots \lambda_n \geq -1$

Let us define

$\displaystyle \sigma_2:= \max \{ |\lambda_2| , |\lambda_3| , \ldots, |\lambda_n| \}$

the second largest eigenvalue in absolute value of ${M}$.

We will need two more definitions: for a set of vertices ${S}$, its volume is defined as

$\displaystyle vol(S):= \sum_{v\in S} d_v$

the sum of the degrees of the vertices in ${S}$. We also write ${\bar d = \frac 1n \sum_v d_v}$ for the average degree, so that ${vol(V) = \bar d n }$. Now we have

Lemma 1 (Expander Mixing Lemma) For every two disjoint sets of vertices ${S}$, ${T}$, we have

$\displaystyle \left | E(S,T) - \frac {vol(S) \cdot vol(T) }{vol(V)} \right | \leq \sigma_2 \sqrt{vol(S) \cdot vol(T) } \ \ \ \ \ (2)$

So, once again, we have that the number of edges between ${S}$ and ${T}$ is what one would expect in a random graph in which the edge ${(u,v)}$ exists with probability ${d_ud_v/vol(V)}$, up to an error that depends on ${\sigma_2}$.
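A quick numerical check of (2) on a small irregular graph (a star with one extra edge, chosen arbitrarily):

```python
import numpy as np

# Irregular example: center 0 joined to 1, 2, 3, plus the edge {1, 2}.
n = 4
A = np.zeros((n, n))
for (u, v) in [(0, 1), (0, 2), (0, 3), (1, 2)]:
    A[u][v] = A[v][u] = 1

deg = A.sum(axis=1)
Dinv = np.diag(1 / np.sqrt(deg))
M = Dinv @ A @ Dinv                 # normalized adjacency matrix

eigs = np.linalg.eigvalsh(M)        # ascending; eigs[-1] = 1
sigma2 = max(abs(eigs[0]), abs(eigs[-2]))

def vol(S):
    return sum(deg[v] for v in S)

S, T = [0], [1, 2]                  # two disjoint vertex sets
edges_ST = sum(A[u][v] for u in S for v in T)
lhs = abs(edges_ST - vol(S) * vol(T) / vol(range(n)))
rhs = sigma2 * np.sqrt(vol(S) * vol(T))
print(lhs, rhs)
```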

The spectral norm of the infinite ${d}$-regular tree is ${2 \sqrt {d-1}}$. We will see what this means and how to prove it.

When talking about the expansion of random graphs, about the construction of Ramanujan expanders, as well as about sparsifiers, community detection, and several other problems, the number ${2 \sqrt{d-1}}$ comes up often, where ${d}$ is the degree of the graph, for reasons that tend to be related to properties of the infinite ${d}$-regular tree.

In preparation for the special program on spectral graph theory at the Simons Institute, which starts in a week, I have been reading on the topics of the theory that I don’t know much about: the spectrum of random graphs, properties of highly expanding graphs, spectral sparsification, and so on.

I have been writing some notes for myself, and here is something that bothers me: How do you call the second largest, in absolute value, eigenvalue of the adjacency matrix of a graph, without resorting to the sentence I just wrote? And how do you denote it?

I have noticed that the typical answer to the first question is “second eigenvalue,” but this is a problem when it creates confusion with the actual second largest eigenvalue of the adjacency matrix, which could be a very different quantity. The answer to the second question seems to be either a noncommittal “${\lambda}$” or a rather problematic “${\lambda_2}$.”

For my own use, I have started to use the notation ${\lambda_{2abs}}$, which can certainly use some improvement, but I am still at a loss concerning terminology.

Perhaps one should start from where this number comes from. Its important property is that, if the graph is ${d}$-regular, has ${n}$ vertices, and has adjacency matrix ${A}$, this number is the spectral norm of ${A - \frac dn J}$ (where ${J}$ is the matrix with ones everywhere), so that it measures the distance of ${A}$ from the “perfect ${d}$-regular expander” in a norm that is useful to reason about cuts and also tractable to compute.
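This characterization is easy to verify numerically; a sketch on the 5-cycle (an arbitrary small regular example):

```python
import numpy as np

# For a d-regular graph, the second largest eigenvalue of A in absolute
# value equals the spectral norm of A - (d/n) J. Checked on the 5-cycle.
n, d = 5, 2
A = np.zeros((n, n))
for v in range(n):
    A[v][(v + 1) % n] = A[v][(v - 1) % n] = 1

eigs = sorted(np.linalg.eigvalsh(A), key=abs)  # sorted by absolute value
lambda_2abs = abs(eigs[-2])                    # eigs[-1] is d itself

J = np.ones((n, n))
shifted_norm = np.linalg.norm(A - (d / n) * J, 2)  # spectral norm
print(lambda_2abs, shifted_norm)
```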

So, since it is the spectral norm of a modification of the adjacency matrix, how about calling it ${<}$adjective${>}$ spectral norm? I would vote for shifted spectral norm because I would think of subtracting ${\frac dn J}$ as a sort of shift.