Replacing madness by mere disappointment

Tim Gowers is going to write about his idea for circuit lower bounds in the form of a seven-part dialog between three characters.

The first installment is out, and it already features a gentle and beautiful introduction to the models of Boolean circuits and Boolean formulas, and to the natural proofs impossibility result. He also brings up the very interesting meta-question (which sounds rather new to me) of whether a “random easy function” is pseudorandom.

(The title of the post is how a character in the dialog refers to impossibility results such as natural proofs.)

Dense Subsets of Pseudorandom Sets: The Paper(s)

Green, Tao and Ziegler, in their works on patterns in the primes, prove a general result of the following form: if X is a set, R is a (possibly very sparse) “pseudorandom” subset of X, and D is a dense subset of R, then D may be “modeled” by a large set M which has the same density in X as D has in R.
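In slightly more formal, but still rough, terms, the statement has the following shape; this is my paraphrase, with the exact notion of pseudorandomness, the families of distinguishing functions, and the quantitative dependencies all suppressed:

```latex
% Rough shape of the dense model theorem (a paraphrase, not the exact
% statement from the Green-Tao-Ziegler papers).
\textbf{Theorem (dense model, informal).}
Let $R \subseteq X$ be $\epsilon$-pseudorandom with respect to a suitable
family $\mathcal F$ of bounded ``distinguishers,'' and let $D \subseteq R$
satisfy $|D| \ge \delta |R|$. Then there is a ``model'' set $M \subseteq X$
with $|M| \ge (\delta - \epsilon)|X|$ such that, for every $F$ in a related
family $\mathcal F'$,
\[
  \Bigl|\; \mathop{\mathbb E}_{x \in D} F(x) \;-\; \mathop{\mathbb E}_{x \in M} F(x) \;\Bigr| \;\le\; \epsilon .
\]
```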

They use this result with X being the integers in a large interval \{ 1,\ldots, N\}, R being the “almost-primes” in X (integers with no small factor), and D being the primes in X. Since the almost-primes can be proved to be “pseudorandom” in a fairly strong sense, and since the density of the primes in the almost-primes is at least an absolute constant, it follows that the primes are “indistinguishable” from a large set M containing a constant fraction of all integers. Since such large sets are known to contain arbitrarily long arithmetic progressions, as proved by Szemeredi, Green and Tao are able to prove that the primes too must contain arbitrarily long arithmetic progressions. Such large sets are also known to contain arbitrarily long “polynomial progressions,” as proved by Bergelson and Leibman, and this allows Tao and Ziegler to argue that the primes too must contain arbitrarily long polynomial progressions.

(The above account is not completely accurate, but it is not lying too much.)

As announced last October here, and here, Omer Reingold, Madhur Tulsiani, Salil Vadhan and I found a new proof of this “dense model” theorem, which uses the min-max theorem of game theory (or, depending on the language that you prefer to use, the duality of linear programming or the Hahn-Banach theorem) and was inspired by Nisan’s proof of the Impagliazzo hard-core set theorem. In complexity-theoretic applications of the theorem, our reduction has polynomial complexity, while the previous work incurred an exponential loss.

As discussed here and here, we also show how to use the Green-Tao-Ziegler techniques of “iterative partitioning” to give a different proof of Impagliazzo’s theorem.

After long procrastination, we recently wrote up a paper about these results.

In the Fall, we received some feedback from additive combinatorialists that while our proof of the Green-Tao-Ziegler result was technically simpler than the original one, the language we used was hard to follow. (That’s easy to believe, because it took us a while to understand the language in which the original proof was written.) We then wrote an expository note on the proof in the analyst’s language. When we were about to release the paper and the note, we were contacted by Tim Gowers, who, last Summer, had independently discovered a proof of the Green-Tao-Ziegler results via the Hahn-Banach theorem, essentially with the same argument. (He also found other applications of the technique in additive combinatorics. The issue of polynomial complexity, which does not arise in his applications, is not considered in his work.)

Gowers announced his results in April at a talk at the Fields Institute in Toronto. (Audio and slides are available online.)

Gowers’s paper already contains the proof presented in the analyst’s language, making our expository note not so useful any more; we have still posted it anyway because, by explaining how one translates from one notation to the other, it can be a short starting point for the computer scientist who is interested in trying to read Gowers’s paper, or for the combinatorialist who is interested in trying to read our paper.

The unreasonable effectiveness of additive combinatorics in computer science

As I have written several times on these pages, techniques from additive combinatorics seem to be very well suited to attacking problems in computer science, and a good number of applications have already been found. For example, “sum-product theorems” originally developed in a combinatorial approach to the Kakeya problem have been extremely valuable in recent constructions of randomness extractors. The central theorem of additive combinatorics, Szemeredi’s theorem, now has four quite different proofs, one based on graph theory and Ramsey theory, one based on analytical methods, one based on ergodic theory and one based on hypergraph theory. The first proof introduced the Szemeredi regularity lemma, which is a fixture of algorithmic work on property testing. The analytical proof of Gowers introduced the notion of Gowers uniformity that, so far, has found application in PCP constructions, communication complexity, and pseudorandomness. There is also work in progress on complexity-theoretic applications of some of the ergodic-theoretic techniques.

Why is it the case that techniques developed to study the presence of arithmetic progressions in certain sets are so useful in the study of such unrelated notions as sub-linear time algorithms, PCP systems, pseudorandom generators, and multi-party protocols? This remains, in part, a mystery. A unifying theme in the recent advances in additive combinatorics is the notion that every large combinatorial object can be “decomposed” into a “pseudorandom” part and a “small-description” part, and that many questions that we might be interested in are easy to answer, at least approximately, on pseudorandom and on small-description objects. Since computer scientists almost always deal with worst-case scenarios, and are typically comfortable with approximations, it is reasonable that we can take advantage of techniques that reduce the analysis of arbitrary worst cases to the analysis of much simpler scenarios.

Whatever the reason for their effectiveness, it is worthwhile for any theoretical computer scientist to learn more about this fascinating area of math. One of the tutorials in FOCS 2007 will be on additive combinatorics, with a celebrity speaker. More modestly, following Random-Approx 2007, in Princeton, there will be a course on additive combinatorics for (and by) computer scientists. (If you want to go, you have to register by August 1 and reserve the hotel by this weekend.)

Analytical approaches to Szemeredi’s Theorem: general case

After having discussed the notion of Gowers uniformity, the statement of Szemeredi’s theorem, and the proof for progressions of length 3, let me try to say something about the case of progressions of length 4 or more, and the several related open questions.

As already discussed, Gowers’s proof is based on an “algorithm” that, given a subset A of Z_N of size dN, does one of the following two things:

  1. It immediately finds a progression of length k in A; or
  2. it constructs a subset A’ of Z_{N’} such that: (i) if there is a progression of length k in A’, then there is also such a progression in A; (ii) A’ has size at least (d+poly(d))*N’; and (iii) N’ is at least a constant root of N.

As in the proof for the case of progressions of length 3, once we have such an algorithm we are able to find a progression of length k in A, provided that, initially, N is at least exp(exp(poly(1/d))).
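Here is the back-of-the-envelope accounting behind that bound (my own rough version, with all constants left unspecified):

```latex
% Why the iteration forces $N \ge \exp(\exp(\mathrm{poly}(1/d)))$ (rough sketch).
Each application of case (2) increases the density from $d$ to $d + \mathrm{poly}(d)$,
so case (2) can occur at most
\[
  t \;=\; O\!\left(\frac{1}{\mathrm{poly}(d)}\right) \;=\; \mathrm{poly}(1/d)
\]
times before the density would exceed $1$, at which point case (1) must apply.
Each application also replaces $N$ by roughly $N^{1/C}$ for a constant $C$, so after
$t$ steps the interval has length about $N^{C^{-t}}$; for this to remain large we need
\[
  \log N \;\ge\; C^{\,t} \;=\; \exp(\mathrm{poly}(1/d)),
  \qquad\text{that is,}\qquad
  N \;\ge\; \exp(\exp(\mathrm{poly}(1/d))).
\]
```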

How does the “algorithm” work? Let us, again, identify the set A with its characteristic function A: Z_N -> {0,1}, and consider the expression

(*) E_{x,y} A(x)*A(x+y)*A(x+2y)*…*A(x+(k-1)*y)

It is possible to show that (*) equals d^k (which is what we would expect if A were a “pseudorandom” set) plus or minus an error term that depends on U^{k-1}(A-d), where U^{k-1} is the dimension-(k-1) Gowers uniformity as defined earlier. A result of this kind (relating Gowers uniformity and the number of arithmetic progressions) is called a “generalized von Neumann theorem” in the literature, for reasons that, unfortunately, I do not know. The proof is not difficult at all: it is just a series of applications of the Cauchy-Schwarz inequality. The genius is in the choice of the right definition. (And the technical difficulties are in what comes next.)

This means that if the dimension-(k-1) Gowers uniformity of the function f(x):=A(x)-d is small, then A contains a lot of length-k arithmetic progressions, and we are in case (1).
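As a concrete, if computationally naive, illustration of the counting expression (*), here is a minimal Monte Carlo sketch; the function name and the sampling parameters are my own, A is assumed to be given as a Python set of residues mod N, and, just like the expectation above, the average includes the “trivial” progressions with y=0.

```python
import random

def ap_density(A, N, k, samples=100000):
    """Monte Carlo estimate of (*) = E_{x,y} A(x) A(x+y) ... A(x+(k-1)y),
    where A is a subset of Z_N given as a Python set of residues mod N."""
    hits = 0
    for _ in range(samples):
        x = random.randrange(N)
        y = random.randrange(N)
        # The product A(x) A(x+y) ... A(x+(k-1)y) equals 1 exactly when the
        # whole progression x, x+y, ..., x+(k-1)y lies inside A.
        if all((x + j * y) % N in A for j in range(k)):
            hits += 1
    return hits / samples
```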

What remains to be proved is that if the set A is such that f(x):=A(x)-d has noticeably large dimension-(k-1) Gowers uniformity, then we can find a set A’ in Z_{N’} as in case (2) above. Gowers’s paper is approximately 130 pages long, and about 100 of those pages are devoted to proving just that.

Obviously, I have no idea what goes on in those 100 pages, but (based on later work by Green and Tao) I can guess how it would go if we were trying to prove Szemeredi’s theorem not in Z_N but in F_p^n, with p prime and larger than k. Translated to such a setting (I think), Gowers’s argument would proceed by showing that if U^{k-1}(A-d) is at least eps, then there is an affine subspace V of F_p^n of dimension n/O(1) such that A restricted to V has density at least d+poly(eps). This, in turn, would be proved by showing:

  1. If f has noticeably large dimension-(k-1) Gowers uniformity, then there is an affine subspace V of dimension n/O(1) and a polynomial phase function g of degree (k-2) such that, restricted to V, f and g are noticeably correlated.
  2. If f is a function that has noticeable correlation with a low-degree polynomial phase function, then there is a subspace of dimension n/O(1) such that f is correlated to a linear function when restricted to that subspace.

Let’s make things simple by restricting ourselves to Boolean functions (even though this p=2 case has no direct application to results about arithmetic progressions).

Then, part (1) is saying that if f: {0,1}^n -> {-1,1} has noticeably large dimension-k Gowers uniformity, then there is a polynomial p() of degree (k-1) over Z_2, and an affine subspace V of dimension n/O(1), such that f(x) and (-1)^{p(x)} agree on noticeably more than 1/2 of the inputs x in V. Part (2) is saying that if f: {0,1}^n -> {-1,1} and (-1)^{p(x)} agree on noticeably more than 1/2 of the inputs, where p is a low-degree polynomial, then there is an affine subspace V of dimension n/O(1) such that, restricted to V, f has agreement with a linear function on noticeably more than 1/2 of the inputs.

The two results together imply that if f has noticeably large Gowers uniformity, then there is a large subspace on which f is correlated with a linear function. (To reach the conclusion, you need to apply part (1), then do a change of variables so that the V of part (1) looks like {0,1}^{n/O(1)}, and then apply part (2).)

Actually, what I just wrote may be an open question, but it should be provable along the lines of Gowers’s proof, and it should be much easier to prove.

For the case of progressions of length 4 in F_p^n, for a small prime p>4, Green and Tao prove that if f is a bounded function with noticeably large dimension-3 Gowers uniformity, then f is correlated with a degree-2 polynomial over all of F_p^n, and the correlation is especially good on a subspace of dimension n-O(1) (as opposed to n/O(1)). This improvement, and several additional ideas, lead to an improved quantitative version of Szemeredi’s theorem for progressions of length 4 in F_p^n, and they have also announced a similar improved result for progressions of length 4 over the integers.

Their proof relies on the analysis of a certain “linearity test in the highly noisy case,” which is also used (and, I think, proved for the first time) in Gowers’s paper. In the simpler Boolean case, the result is:

Let F: {0,1}^n -> {0,1}^m, and suppose that

Pr_{x,y}[F(x)+F(y) = F(x+y)] > eps

where all operations are bitwise mod 2. Then there is an eps’ (depending only on eps, and independent of n and m) and a matrix M such that the functions x -> F(x) and x -> Mx have agreement at least eps’.

(Update: as Alex explains in his comment below, the above sentence is quite misleading. Gowers proves quite a different statement in Z_N, a setting in which an analogous “highly noisy linearity test” statement is provably false, and where it is difficult to even get the statement of the right analog. The Boolean statement above is proved for the first time in Alex’s paper, with a proof whose outline is similar to Gowers’s; Green and Tao prove the highly noisy linearity test result in other prime fields, also following Gowers’s argument.)

In the known proof, which uses difficult results from additive combinatorics, eps’ is exponentially small in eps. Here, two major open questions are: (i) find a simple proof, and (ii) find a proof where eps’ = poly(eps).
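To fix ideas about the hypothesis of the Boolean statement above, here is a minimal sampling sketch of the quantity Pr_{x,y}[F(x)+F(y)=F(x+y)]; the function name, the numpy encoding and the parameters are my own, and of course the hard part of the theorem, actually finding the matrix M, is not addressed at all.

```python
import numpy as np

def blr_acceptance(F, n, trials=100000, seed=0):
    """Estimate Pr_{x,y}[F(x) + F(y) = F(x+y)] for F: {0,1}^n -> {0,1}^m,
    where x, y are 0/1 numpy arrays, '+' is bitwise XOR, and F is any
    Python callable returning a 0/1 numpy array."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        x = rng.integers(0, 2, n)
        y = rng.integers(0, 2, n)
        if np.array_equal((F(x) + F(y)) % 2, F((x + y) % 2)):
            hits += 1
    return hits / trials
```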

More ambitiously, there could be a simple proof that (say, in the Boolean case) if f has noticeable Gowers uniformity, then f is noticeably correlated with a linear function when restricted to a large subspace. The experts in the area would probably be able to turn a proof in the Boolean case into a proof that applies to the setting where we have Z_N instead of {0,1}^n, Bohr sets instead of subspaces, and things I do not understand instead of polynomials. Even so, if the proof in the Boolean setting were simple enough, the whole thing could be significantly simpler than Gowers’s proof, and there could be room for quantitative improvements.

Gowers uniformity

One of the three questions most frequently asked to me at STOC was why I did not define Gowers uniformity in my talk on this joint work with Alex Samorodnitsky. Is it because the definition is very complicated? Not at all: it is just that the definition seems very arbitrary unless one has the time to tell some stories about it and to look at a few equivalent versions of the definition and some of its basic properties.

My favorite source is this paper by Green and Tao; the first 6-7 sections are a wonderful review of what is known. Another great source is this survey paper by Green, which presents several open questions that could be of interest to us, especially the question of how large a subset of F_3^n can be if no three of its points are on a line. (The big question is whether the answer is (3-o(1))^n or not.) There is also a question related to “linearity testing,” and I will probably talk about it in another post. Green has also written a more recent survey paper that is more limited in scope but has more technical details.

In this post, I will tell a story about Gowers uniformity in the computer science language of linearity testing and low-degree testing for Boolean functions, which may be the gentlest introduction for many readers. I will almost exclusively refer to Boolean functions, but the theory applies more generally to bounded complex-valued functions whose domain is an Abelian group. (Some definitions/results, however, are more simply stated in the special case of Boolean functions.)

Umesh often complains that, during technical talks, the suspense kills him, meaning that the speaker goes on and on with technical definitions and results, without giving any indication of where he or she is going.

To spoil all suspense: the eventual goal is to have a “higher degree” Fourier analysis that allows us to talk about correlations of a given function with low-degree polynomials. (In contrast with Fourier analysis that, in the Boolean case, talks about correlations of a given function with linear functions.) For every fixed k, we will define a “dimension-k” norm for Boolean functions; such norms are conjectured to “approximate” the correlation of the function with the closest degree-(k-1) polynomial. The conjecture is a theorem for k=2 (by Fourier analysis) and for k=3 (by a tour de force), and it is open otherwise. What is known, however, is that functions with small norm have several useful “pseudorandomness” properties, and that functions with noticeably large norm have useful “structure.” (Low-degree-polynomial-like structure.) Most proofs that make use of Gowers uniformity have a two-part structure: if the norm is small, we get what we want from the pseudorandomness; if the norm is large, we get what we want from the structure. In our work on PCP, for example, when we analyse the soundness, we can argue that if the PCP proof is “pseudorandom” then the verifier is about as likely to accept it as it is to accept a random proof (hence, very low probability), but a PCP proof with “structure” can be “decoded” to a valid witness (hence, this cannot happen in the soundness case).

Let’s get started with the definition. If

f: {0,1}^n -> {0,1}

is a Boolean function, and y is an element of {0,1}^n, let us define

f_y(x) := f(x) - f(x+y)

where all operations are mod 2 (or bit-wise mod 2). We think of f_y as a sort of “derivative” of f. In particular, if f is a constant then f_y is identically 0 for every y, and if f is a degree-d polynomial then f_y is a degree-(d-1) polynomial.

We also define f_{y,z} as (f_y)_z, and so on.
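To make the “derivative” concrete, here is a minimal sketch, with f represented as a Python function on n-bit 0/1 tuples; the helper names are my own.

```python
def derivative(f, y):
    """The 'derivative' f_y(x) = f(x) - f(x+y) mod 2, where f maps 0/1
    tuples to {0,1} and '+' on inputs means bitwise XOR."""
    def f_y(x):
        x_plus_y = tuple(a ^ b for a, b in zip(x, y))
        return (f(x) - f(x_plus_y)) % 2
    return f_y

def iterated_derivative(f, directions):
    """f_{y_1,...,y_k} = (...((f_{y_1})_{y_2})...)_{y_k}."""
    for y in directions:
        f = derivative(f, y)
    return f
```

For example, if f(x) = x_1 x_2 + x_3 mod 2 (a degree-2 polynomial), then every f_y is an affine function, matching the degree-reduction property just described.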

Now, consider the following test of “non-constancy”: pick x and y at random, and check whether f_y(x)=0. If f is a constant, then the test accepts with probability 1; it is also possible to argue that the test always accepts with probability at least 1/2, and that it accepts with probability 1/2 + eps if and only if f has agreement 1/2 + eps’ with a constant function.

This may not be too exciting, but consider the next “logical” step: pick x, y, z at random, and check whether f_{y,z}(x)=0. If f is a degree-1 polynomial (an affine function), then the test accepts with probability 1. Also, it can be shown that the test accepts with probability at least 1/2. The interesting thing, which is easy to prove using Fourier analysis, is that the test accepts with probability noticeably larger than 1/2 if and only if f has agreement noticeably larger than 1/2 with a degree-1 function. Indeed, the test checks whether

f(x) - f(x+y) - f(x+z) + f(x+y+z) = 0

and, up to a change of variables, this is the same as checking if

f(x) + f(y) + f(z) = f(x+y+z)

a slight variant of the Blum-Luby-Rubinfeld linearity test.
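In terms of the sketch above, this degree-1 test is just the two-fold derivative f_{y,z}; here is a quick sanity check (with a hypothetical affine function of my choosing) that such a function always passes:

```python
import random

f = lambda x: x[0] ^ x[2] ^ 1   # an affine function on 3-bit inputs
for _ in range(1000):
    x, y, z = (tuple(random.getrandbits(1) for _ in range(3)) for _ in range(3))
    assert iterated_derivative(f, [y, z])(x) == 0   # always 0 for an affine f
```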

So you see where this is going: pick x, x_1, …, x_k at random and check whether f_{x_1,…,x_k}(x) = 0. If f is a degree-(k-1) polynomial, then the test accepts with probability 1.

Alon, Kaufman, Krivelevich, Litsyn and Ron proved in Random’03 that if the test accepts with probability 1-eps, then f has agreement 1-O(2^k eps) with a degree-(k-1) polynomial. Samorodnitsky, in a manuscript that predates our PCP work, proves the more challenging “low-end” result that, for k=3, if the test accepts with probability 1/2 + eps, then the function has agreement 1/2 + eps’ with a degree-2 polynomial.

The low-end result is open for k=4! Furthermore, for k=3, eps’ is exponentially small in eps. It is an open question to have a polynomial dependency.

So, what about these Gowers uniformity norms? Take the probability that the test with x_1, …, x_k accepts, subtract 1/2 from the probability, and multiply by 2, so you get a number between 0 and 1. This is the dimension-k Gowers uniformity of f; Alex and I use the notation U^k(f). The 2^k-th root of the number I just described is the dimension-k Gowers uniformity norm of f, denoted ||f|| with subscript U^k (sorry, HTML does not allow subscripts of subscripts).
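Putting the pieces together in code, a sketch only, with my own function name and Monte Carlo parameters, and reusing iterated_derivative from the earlier sketch:

```python
import random

def gowers_uniformity_via_test(f, n, k, samples=100000):
    """Estimate the dimension-k Gowers uniformity of a {0,1}-valued f on
    n-bit tuples: run the test "pick x, x_1, ..., x_k at random and accept
    iff f_{x_1,...,x_k}(x) = 0", then return 2*(acceptance probability - 1/2)."""
    def random_point():
        return tuple(random.getrandbits(1) for _ in range(n))

    accepted = 0
    for _ in range(samples):
        x = random_point()
        directions = [random_point() for _ in range(k)]
        if iterated_derivative(f, directions)(x) == 0:
            accepted += 1
    return 2 * (accepted / samples - 0.5)
```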

What’s with this subtracting 1/2 and multiplying by 2 business? The analytical study of Boolean functions is often cleaner if we think of them as functions

f: {0,1}^n -> {-1,1}

where bit b is replaced by the value (-1)^b in the output. With this notation, we can define

f_y(x) := f(x) * f(x+y)

and now the dimension-k uniformity can be given the following equivalent, but much cleaner, definition:

U^k(f) := E_{x,x_1,…,x_k} f_{x_1,…,x_k}(x)

Now the advantage is that the definition can be applied to any real-valued function (which is often useful), not just to Boolean functions. The 2^k-th root of U^k(f) is a norm on the space of real-valued functions with domain {0,1}^n. The definition extends to functions f: G -> R, where G is any Abelian group (we just need to re-interpret the + in the definition) and R denotes the real numbers. The definition can also be extended to complex-valued functions, with a small change to the definition of f_y.
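In the -1/+1 notation, the same quantity can be estimated directly from the product definition; again this is only a sketch, with names and parameters of my own choosing, where points of {0,1}^n are encoded as n-bit integers and “+” means bitwise XOR.

```python
import itertools
import random

def gowers_uniformity_pm1(f, n, k, samples=100000):
    """Estimate U^k(f) = E_{x,x_1,...,x_k} f_{x_1,...,x_k}(x) for a
    {-1,1}-valued f on {0,1}^n, with f_y(x) = f(x) * f(x+y) and points
    encoded as n-bit integers ('+' is bitwise XOR)."""
    total = 0.0
    for _ in range(samples):
        x = random.getrandbits(n)
        ys = [random.getrandbits(n) for _ in range(k)]
        prod = 1
        # The k-fold multiplicative derivative unfolds into the product of f
        # over the 2^k points x + (sum of any subset of {x_1, ..., x_k}).
        for subset in itertools.product([0, 1], repeat=k):
            point = x
            for pick, y in zip(subset, ys):
                if pick:
                    point ^= y
            prod *= f(point)
        total += prod
    return total / samples
```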

With this notation, Alex proves that, for Boolean f, U^3(f) is noticeably large if and only if f has noticeable correlation with a degree-2 polynomial. Green and Tao, in the 72-page paper I referenced above, prove an analogous result for bounded functions f: G -> C, where G is an Abelian group of order not divisible by 2 or by 3, and C denotes the complex numbers. (If the group contains a large cyclic subgroup, then it is quite messy to even state the theorem. If G is of the form F_q^n, where n is large compared with q, then the statement is simply about noticeable norm versus correlation with polynomials.)

And, to summarize again the hundreds of pages of mathematical knowledge we have about these norms: if the norm is small, the function is kind of pseudorandom; if the norm is not small, then the function is kind of “structured,” in a way that sort of resembles a low-degree polynomial. It is open to turn “sort of resembles” into “is correlated with.” (And it cannot be done if the domain of the function is a group with a large cyclic subgroup; in that case, there are more complicated conjectures.)