Terry Tao at the Colbert Report

# Tag Archives: Terence Tao

# Holiday Readings

Now that the Winter break is coming, what to read in between decorating, cooking, eating, drinking and being merry?

The most exciting theoretical computer science development of the year was the improved efficiency of matrix multiplication by Stother and Vassilevska-Williams. Virginia’s 72-page write-up will certainly keep many people occupied.

Terence Tao is teaching a class on expanders, and posting the lecture notes, of exceptional high quality. It is hard to imagine something that would a more awesome combination, to me, than Terry Tao writing about expanders. Maybe a biopic on Turing’s life, starring Joseph Gordon-Levitt, written by Dustin Lance Black and directed by Clint Eastwood.

Meanwhile, Ryan O’Donnell has been writing a book on analysis of boolean function, by “serializing” it on a blog platform.

Now that I have done my part, you do yours. If I wanted to read a couple of books (no math, no theory) during the winter, what would you recommend. Don’t recommend what you think I would like, recommend what you like best, and why.

# The Inverse Conjecture for the Gowers Norms

This semester, the MSRI is having a special semester devoted to additive combinatorics and ergodic theory.

Additive combinatorics is the branch of extremal combinatorics where the objects of interest are sets of integers and, more generally, subsets of abelian groups (rather than graphs or set systems) and where the properties of interest are formulated in terms of linear equations (rather than in terms of cuts, partitions, intersections, and so on). Lately, it has been quite fascinating for a convergence of methods from “classical” combinatorics, from analysis and from ergodic theory. I have often written about it here because the combinatorial and analytical techniques of additive combinatorics have been useful in a number of different computer science applications (related to probabilistically checkable proof constructions, property testing, and pseudorandomness), and computer scientists are becoming more and more interested in the subject, contributing their own perspective to it.

In all this, the exact role of ergodic theory (and the possible applications of ergodic-theoretic techniques in complexity theory) has been a bit of mystery to me, and perhaps to other computer scientists too. Very nicely, the MSRI special program started this week with a series of tutorials to introduce the connections between ergodic theory and additive combinatorics.

All talks are (or will soon be) online, and the talks by Terry Tao are especially recommended, because he explains, using very simple examples, how one goes about converting concrete, finitary and quantitative statements into equivalent abstract infinitary statements, which are in turn amenable to ergodic-theoretic techniques.

Today, Tamar Ziegler discussed the very recent proof, by Vitaly Bergelson, Terry Tao, and herself, of the inverse conjecture for Gowers norms in finite fields, a proof that uses ergodic-theoretic techniques.

But, you may object, didn’t we know that the inverse conjecture for Gowers norms is false? Well, the counterexample of Lovett, Meshulam, and Samorodnitsky (independently discovered by Green and Tao) refutes the following conjecture: “Suppose is a bounded function such that ; then there is a independent of and an -variate degree-(k-1) polynomial over such that and have correlation at least .”

This is refuted in the case by taking , where is the symmetric polynomial of degree 4. While has (indeed, exponentially small) correlation with all functions of the form , where is a degree-3 polynomial, the norm is a constant.

The statement proved by Bergelson, Tao, and Ziegler is “Suppose is a bounded function such that ; then there is a independent of and a bounded -variate degree-(k-1) ‘polynomial function’ such that and have correlation at least .”

What is, then, a ‘polynomial function’ of degree ? It is a function bounded by 1 in absolute value, and such that for every directions , if one takes Gowers derivatives in such directions one always gets the constant-1 function. In other words, is a ‘polynomial function’ of degree if for every , and one has . Interestingly, these functions are a proper superclass of the functions of the form with being a polynomial over .

In the concrete case, one may construct such a function by letting be a polynomial in the ring, say, , and then having , where is a primitive eight-root of unity. Indeed, this is the type of degree-3 polynomial function that is correlated with .

[Apologies for not defining all the technical terms and the context; the reader can find some background in this post and following the links there.]

What is, then, ergodic theory, and what does it have to do with finitary combinatorial problems? I am certainly the wrong person to ask, but I shall try to explain the little that I have understood in the next post(s).

# Dense Subsets of Pseudorandom Sets: The Paper(s)

Green, Tao and Ziegler, in their works on patterns in the primes, prove a general result of the following form: if is a set, is a, possibly very sparse, “pseudorandom” susbset of , and is a dense subset of , then may be “modeled” by a large set which has the same density in as the density of in .

They use this result with being the integers in a large interval , being the “almost-primes” in (integers with no small factor), and being the primes in . Since the almost-primes can be proved to be “pseudorandom” in a fairly strong sense, and since the density of the primes in the almost-primes is at least an absolute constant, it follows that the primes are “indistinguishable” from a large set containing a constant fraction of all integers. Since such large sets are known to contain arbitrarily long arithmetic progressions, as proved by Szemeredi, Green and Tao are able to prove that the primes too must contain arbitrarily long arithmetic progressions. Such large sets are also known to contain arbitrarily long “polynomial progressions,” as proved by Bergelson and Leibman, and this allows Tao and Ziegler to argue that the primes too much contain arbitrarily long polynomial progressions.

(The above account is not completely accurate, but it is not lying too much.)

As announced last October here, and here, Omer Reingold, Madhur Tulsiani, Salil Vadhan and I found a new proof of this “dense model” theorem, which uses the min-max theorem of game theory (or, depending on the language that you prefer to use, the duality of linear programming or the Hahn Banach theorem) and was inspired by Nisan’s proof of the Impagliazzo hard-core set theorem. In complexity-theoretic applications of the theorem, our reduction has polynomial complexity, while the previous work incurred an exponential loss.

As discussed here and here, we also show how to use the Green-Tao-Ziegler techniques of “iterative partitioning” to give a different proof of Impagliazzo’s theorem.

After long procrastination, we recently wrote up a paper about these results.

In the Fall, we received some feedback from additive combinatorialists that while our proof of the Green-Tao-Ziegler result was technically simpler than the original one, the language we used was hard to follow. (That’s easy to believe, because it took us a while to understand the language in which the original proof was written.) We then wrote an expository note of the proof in the analyst’s language. When we were about to release the paper and the note, we were contacted by Tim Gowers, who, last Summer, had independently discovered a proof of the Green-Tao-Ziegler results via the Hahn-Banach theorem, essentially with the same argument. (He also found other applications of the technique in additive combinatorics. The issue of polynomial complexity, which does not arise in his applications, is not considered.)

Gowers announced his results in April at a talk at the Fields institute in Toronto. (Audio and slides are available online.)

Gowers’ paper already contains the proof presented in the “analyst language,” making our expository note not so useful any more; we have still posted it anyways because, by explaining how one translates from one notation to the other, it can be a short starting point for the computer scientist who is interested in trying to read Gowers’ paper, or for the combinatorialist who is interested in trying to read our paper.

# The "Complexity Theory" Proof of a Theorem of Green-Tao-Ziegler

We want to prove that a dense subset of a pseudorandom set is indistinguishable from a truly dense set.

Here is an example of what this implies: take a pseudorandom generator of output length , choose in an arbitrary way a 1% fraction of the possible seeds of the generator, and run the generator on a random seed from this restricted set; then the output of the generator is indistinguishable from being a random element of a set of size .

(Technically, the theorem states the existence of a distribution of min-entropy , but one can also get the above statement by standard “rounding” techniques.)

As a slightly more general example, if you have a generator mapping a length- seed into an output of length , and is a distribution of seeds of min-entropy at least , then is indistinguishable from a distribution of min-entropy . (This, however, works only if .)

It’s time to give a formal statement. Recall that we say that a distribution is -dense in a distribution if

(Of course I should say “random variable” instead of “distribution,” or write things differently, but we are between friends here.)

We want to say that if is a class of tests, is pseudorandom according to a moderately larger class , and is -dense in , then there is a distribution that is indistinguishable from according to and that is -dense in the uniform distribution.

The Green-Tao-Ziegler proof of this result becomes slightly easier in our setting of interest (where contains boolean functions) and gives the following statement:

Theorem (Green-Tao-Ziegler, Boolean Case)

Let be a finite set, be a class of functions , be a distribution over , be a -dense distribution in , be given.Suppose that for every that is -dense in there is an such that

Then there is a function of the form where and such that

Readers should take a moment to convince themselves that the above statement is indeed saying that if is pseudorandom then has a model , by equivalently saying that if no model exists then is not pseudorandom.

The problem with the above statement is that can be arbitrary and, in particular, it can have circuit complexity exponential in , and hence in .

In our proof, instead, is a linear threshold function, realizable by a size circuit. Another improvement is that .

Here is the proof by Omer Reingold, Madhur Tulsiani, Salil Vadhan, and me. Assume is closed under complement (otherwise work with the closure of ), then the assumption of the theorem can be restated without absolute values

for every that is -dense in there is an such that

We begin by finding a “universal distinguisher.”

Claim

There is a function which is a convex combination of functions from and such that that for every that is -dense in ,

This can be proved via the min-max theorem for two-players games, or, equivalently, via linearity of linear programming, or, like an analyst would say, via the Hahn-Banach theorem.

Let now be the set of elements of where is largest. We must have

(1)

which implies that there must be a threshold such that

(2)

So we have found a boolean distinguisher between and . Next,

we claim that the same distinguisher works between and .

By the density assumption, we have

and since contains exactly a fraction of , and since the condition always fails outside of (why?), we then have

and so

(3)

Now, it’s not clear what the complexity of is: it could be a convex combination involving *all* the functions in . However, by Chernoff bounds, there must be functions with such that is well approximated by for all $x$ but for an exceptional set having density less that, say, , according to both and .

Now and are distinguished by the predicate , which is just a linear threshold function applied to a small set of functions from , as promised.

Actually I have skipped an important step: outside of the exceptional set, is going to be *close* to but not identical, and this could lead to problems. For example, in (3) might typically be larger than only by a tiny amount, and might consistently underestimate in . If so, could be a completely different quantity from .

To remedy this problem, we note that, from (1), we can also derive the more “robust” distinguishing statement

(2′)

from which we get

(3′)

And now we can be confident that even replacing with an approximation we still get a distinguisher.

The statement needed in number-theoretic applications is stronger in a couple of ways. One is that we would like to contain bounded functions rather than boolean-valued functions. Looking back at our proof, this makes no difference. The other is that we would like to be a function of the form rather than a general composition of functions . This we can achieve by approximating a threshold function by a polynomial of degree using the Weierstrass theorem, and then choose the most distinguishing monomial. This gives a proof of the following statement, which is equivalent to Theorem 7.1 in the Tao-Ziegler paper.

Theorem (Green-Tao-Ziegler, General Case)

Let be a finite set, be a class of functions , be a distribution over , be a -dense distribution in , be given.Suppose that for every that is -dense in there is an such that

Then there is a function of the form where and such that

In this case, we too lose an exponential factor. Our proof, however, has some interest even in the number-theoretic setting because it is somewhat simpler than and genuinely different from the original one.

# Dense Subsets of Pseudorandom Sets

The Green-Tao theorem states that the primes contain arbitrarily long arithmetic progressions; its proof can be, somewhat inaccurately, broken up into the following two steps:

Thm1: Every constant-density subset of a pseudorandom set of integers contains arbitrarily long arithmetic progressions.

Thm2: The primes have constant density inside a pseudorandom set.

Of those, the main contribution of the paper is the first theorem, a “relative” version of Szemeredi’s theorem. In turn, its proof can be (even more inaccurately) broken up as

Thm 1.1: For every constant density subset D of a pseudorandom set there is a “model” set M that has constant density among the integers and is indistinguishable from D.

Thm 1.2 (Szemeredi) Every constant density subset of the integers contains arbitrarily long arithmetic progressions, and many of them.

Thm 1.3 A set with many long arithmetic progressions cannot be indistinguishable from a set with none.

Following this scheme is, of course, easier said than done. One wants to work with a definition of pseudorandomness that is weak enough that (2) is provable, but strong enough that the notion of indistinguishability implied by (1.1) is in turn strong enough that (1.3) holds. From now on I will focus on (1.1), which is a key step in the proof, though not the hardest.

Recently, Tao and Ziegler proved that the primes contain arbitrarily long “polynomial progressions” (progressions where the increments are given by polynomials rather than linear functions, as in the case of arithmetic progressions). Their paper contains a very clean formulation of (1.1), which I will now (accurately, this time) describe. (It is Theorem 7.1 in the paper. The language I use below is very different but equivalent.)

We fix a finite universe ; this could be in complexity-theoretic applications or in number-theoretic applications. Instead of working with subsets of , it will be more convenient to refer to probability distributions over ; if is a set, then is the uniform distribution over . We also fix a family of “easy” function . In a complexity-theoretic applications, this could be the set of boolean functions computed by circuits of bounded size. We think of two distributions as being -indistinguishable according to if for every function we have

and we think of a distribution as pseudorandom if it is indistinguishable from the uniform distribution . (This is all standard in cryptography and complexity theory.)

Now let’s define the natural analog of “dense subset” for distributions. We say that a distribution is -dense in if for every we have

Note that if and for some sets , then is -dense in if and only if and .

So we want to prove the following:

Theorem (Green, Tao, Ziegler)

Fix a family of tests and an ; then there is a “slightly larger” family and an such that if is an -pseudorandom distribution according to and is -dense in , then there is a distribution that is -dense in and that is -indistinguishable from according to .

[The reader may want to go back to (1.1) and check that this is a meaningful formalization of it, up to working with arbitrary distributions rather than sets. This is in fact the “inaccuracy” that I referred to above.]

In a complexity-theoretic setting, we would like to say that if is defined as all functions computable by circuits of size at most , then should be and should contain only functions computable by circuits of size . Unfortunately, if one follows the proof and makes some simplifications asuming contains only boolean functions, one sees that contains functions of the form , where , , and could be arbitrary and, in general, have circuit complexity exponential in and . Alternatively one may approximate as a low-degree polynomial and take the “most distinguishing monomial.” This will give a version of the Theorem (which leads to the actual statement of Thm 7.1 in the Tao-Ziegler paper) where contains only functions of the form , but then will be exponentially small in and . This means that one cannot apply the theorem to “cryptographically strong” notions of pseudorandomness and indistinguishability, and in general to any setting where and are super-logarithmic (not to mention super-linear).

This seems like an unavoidable consequence of the “finitary ergodic theoretic” technique of iterative partitioning and energy increment used in the proof, which always yields at least a singly exponential complexity.

Omer Reingold, Madhur Tulsiani, Salil Vadhan and I have recently come up with a different proof where both and the complexity of are polynomial. This gives, for example, a new characterization of the notion of pseudoentropy. Our proof is quite in the spirit of Nisan’s proof of Impagliazzo’s hard-core set theorem, and it is relatively simple. We can also deduce a version of the theorem where, as in Green-Tao-Ziegler, contains only bounded products of functions in . In doing so, however, we too incur an exponential loss, but the proof is somewhat simpler and demonstrates the applicability of complexity-theoretic techniques in arithmetic combinatorics.

Since we can use (ideas from) a proof of the hard core set theorem to prove the Green-Tao-Ziegler result, one may wonder whether one can use the “finitary ergodic theory” techniques of iterative partitioning and energy increment to prove the hard-core set theorem. Indeed, we do this too. In our proof, the reduction loses a factor that is exponential in certain parameters (while other proofs are polynomial), but one also gets a more “constructive” result.

If readers can stomach it, a forthcoming post will describe the complexity-theory-style proof of the Green-Tao-Ziegler result as well as the ergodic-theory-style proof of the Impagliazzo hard core set theorem.

# Best Tutorials Ever

FOCS 2007 started yesterday in Providence with a series of tutorials.

Terry Tao gave a talk similar to the one he gave in Madrid, discussing the duality between pseudorandomness and efficiency which is a way to give a unified view of techniques coming from analysis, combinatorics and ergodic theory.

In typical such results, one has a set $F$ of “simple” functions (for example linear, or low-degree polynomials, or, in conceivable complexity-theoretic applications, functions of low circuit complexity) and one wants to write an arbitrary function $g$ as

$ g(x) = g_{pr} (x) + g_{str} (x) + g_{err} (x) $

where $g_{pr}$ is *pseudorandom* with respect to the “distinguishers” in $F$, $g_{str}$ is a “simple combination” of functions from $\cal F$, and $g_{err}$ accounts for a possible small approximation error. There are a number of ways to instantiate this general template, as can be seen on the accompanying notes, and it is nice to see how even the Szemeredi regularity lemma can be fit into this template. (The “functions” are adjacency matrices of graphs, and the “efficient” functions are complete bipartite subgraphs.)

Dan Boneh spoke on pairing-based cryptography, an idea that has grown into a whole, rich, area, with specialized conferences and, according to Google Scholar, 1,200+ papers published so far. In this setting one has a group $G$ (for example points on an elliptic curve) such that there is a mapping $e: G X G \rightarrow G_T$ that takes pairs of elements of $G$ into an element of another group $G_T$ satisfying a bilinearity condition. (Such a mapping is a “pairing,” hence the name of the area.) Although such mapping can lead to attacks on the discrete log problem in $G$, if the mapping is chosen carefully one may still assume intractability of discrete log in $G$, and the pairing can be very useful in constructing cryptographic protocols and proving their security. In particular, one can get “identity-based encryption,” a type of public key cryptography where a user’s public key can be her own name (or email address, or any deterministically chosen name), which in turn can be used as a primitive in other applications.

Dan Spielman spoke on spectral graph theory, focusing on results and problems that aren’t quite studied enough by theoreticians. He showed some remarkable of example of graph drawings obtained by simply plotting a vertex $i$ to the point $(v(i),w(i))$, where $v$ and $w$ are the second largest and third largest eigenvalues of the laplacian of the adjacency matrix. The sparse cut promised by Cheeger inequality is, in such a drawing, just the cut given by a vertical line across the drawing, and there are nice algebraic explanations for why the drawing looks intuitively “nice” for many graphs but not for all. Spectral partitioning has been very successful for image segmentation problems, but it has some drawbacks and it would be nice to find theoretically justified algorithms that would do better.

Typically, I don’t go to an Italian restaurant in the US unless I have been there before and liked it, a rule that runs into certain circularity problems. I was happy that yesterday I made an exception to go to Al Forno, which proved to be truly exceptional.

# The Mozart of Math

# What Is Good Mathematics?

A very interesting “opinion piece” by Terry Tao. I liked the list of five “hypothetical” ways in which a mathematical field can lose its ways. And the story of Szemeredi’s theorem, used as a case study of good mathematics, is so beautiful that it’s always good to hear it told.

# Pseudorandomness and more pseudorandomness

The first talk of the morning is by Terry Tao. Tim Gowers, in his essay The Two Cultures of Mathematics explains the difference between depth and generality in combinatorics versus other areas of pure math. In combinatorics, one is unlikely to find deep definitions (have you ever tried to understand the *statements* of the conjectures in the Langlands program?) and very general theorems. In combinatorics, it is more usual to find generality in the form of *principles* that the practictioners of the subject are aware of and that are exemplified in the proofs of certain theorems. (I hope I have not distorted and oversimplified Gowers’s point too much.)

Tao’s talk very explicitly points out the principle that emerges from his work on arithmetic progressions. The idea is that one can often define notions of *pseudorandomness* and *structure* such that one has a theorem of the following kind:

- Dichotomy: if an object is not pseudorandom, then it has large stuctured component.

For example, in the setting of looking for arithmetic progressions of length 3, a subset of {1,…,N} is “structured” if it has a large Fourier coefficient and it is pseudorandom if it has approximately the same number of length-3 progressions of a random subset of {1,…,N} with the same density.

By iterative arguments, one then has theorems of the form

- Decomposition: every object can be written as a “sum” of a pseudorandom part and a structured part

This is very important in the proof of the Green-Tao theorem on arithmetic progressions in the primes, but it can be seen, implicitly, in the Szemeredi regularity Lemma and in all proofs of Szemeredi’s theorem.

The proof typically goes by: if our object is pseudorandom, we are done; otherwise we find a large structured “piece.” We “subtract” it. If what remains is pseudorandom, we are fine, otherwise we subtract another structured part, and so on.

The advantage of this type of decomposition (whether it is done explicitely or it is implicit in the proof) is that one can use techniques from analysis and probability to analyse the pseudorandom part and (depending on the definition of “structure”) techniques from analysis and geometry to analyse the structured part. This is how the proof of the Green-Tao theorem ends up using analysis, ergodic theory, combinatorics, and so on.

Often, one can use geometry and algebra only provided that the object is “exactly” structured, while results of the above type may only give “approximate” structure. To bridge this gap one employs results of the type:

- Rigidity: an object that is “approximately structured” is “close” to an object that is structured

These are results in the spirit of property testing. (And he mentions property testing in his talk.)

This is all quite abstract, but reading papers in the area (or approaching new problems) with this point of view in mind is very helpful. He also proceeds to summarize the proof of the Green-Tao theorem in terms of the above perspective.

By the way, earlier in the talk, about the general importance of understanding pseudorandomness, Tao talks mentions expander graphs and the P versus BPP question. (As a context, not as specific problems that he is going to work on. For the time being, we are safe.)

Overall, the talk is a model of clarity and insight.

After the break, Avi Wigderson gives his talk on *P, NP and Mathematics*. Yesterday, Avi told me that, of all the material that he cover in his 50-page proceedings paper, he had decided to talk about pseudorandomness and derandomization, from the perspective that randomness seems to help in computation, but complexity theory suggests that it does not. I despaired, because this is exactly my talk, but luckily Avi’s talk has a somewhate complementary emphasis on topics. (So I won’t have to spend the next two days remaking my slides and I can go see the Prado tomorrow.)

Avi explains the “hardness versus randomness” principle in terms that could not be clearer, and one can see heads nodding as he goes along (nodding in agreement, not nodding off). He ends up with the following question about the complexity of the permanent: is there a linear transformation L of nXn matrices into kXk matrices such that, (i) for every matrix M, the determinant of L(M) equals the permanent of M and (ii) k is only polynomial in n? (The answer is supposed to be NO.) From known results one can derive that k need not be more than exp(O(n)) and a Omega(n^{2}) lower bound is also known.

This is to say that one can work on complexity theory without having to learn about such things as algorithms, Turing machines and so on, and that purely algebraic questions are completely open and very related to complexity questions.