# The Next Viral Videos

Back in August, Boaz Barak and Moses Charikar organized a two-day course on additive combinatorics for computer scientists in Princeton. Boaz and Avi Wigderson spoke on sum-product theorems and their applications, and I spoke on techniques in the proofs of Szemeredi’s theorem and their applications. As an Australian model might say, that’s interesting!

Videos of the talks are now online. The quality of the audio and video is quite good; you’ll have to decide for yourself on the quality of the lectures. The schedule of the event was grueling, and in my last two lectures (on Gowers uniformity and applications) I am not very lucid. In earlier lectures, however, I am merely sleep deprived — I can be seen falling asleep in front of the board a few times. Boaz’s and Avi’s lectures, however, are flawless.

# Best Tutorials Ever

FOCS 2007 started yesterday in Providence with a series of tutorials.

Terry Tao gave a talk similar to the one he gave in Madrid, discussing the duality between pseudorandomness and efficiency, which gives a unified view of techniques coming from analysis, combinatorics and ergodic theory.

In typical such results, one has a set $F$ of “simple” functions (for example linear, or low-degree polynomials, or, in conceivable complexity-theoretic applications, functions of low circuit complexity) and one wants to write an arbitrary function $g$ as

$g(x) = g_{pr} (x) + g_{str} (x) + g_{err} (x)$

where $g_{pr}$ is pseudorandom with respect to the “distinguishers” in $F$, $g_{str}$ is a “simple combination” of functions from $F$, and $g_{err}$ accounts for a possible small approximation error. There are a number of ways to instantiate this general template, as can be seen in the accompanying notes, and it is nice to see how even the Szemeredi regularity lemma fits into this template. (The “functions” are adjacency matrices of graphs, and the “efficient” functions are complete bipartite subgraphs.)
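As a toy instantiation of this template (my own illustration, not an example from the talk), take functions on $Z_N$ and let the simple family $F$ be the characters. The structured part collects the few Fourier coefficients above an arbitrary threshold, and what is left over is pseudorandom against every character; since the characters form an exact basis, the error part $g_{err}$ happens to be zero in this toy split, whereas a genuine efficiency constraint would leave a nonzero error term.

```python
import cmath

N = 16

def dft(f):
    # Fourier coefficient hat-f(a) = E_x f(x) * exp(-2*pi*i*a*x/N)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * a * x / N) for x in range(N)) / N
            for a in range(N)]

def decompose(f, threshold):
    """Split f = f_str + f_pr: f_str keeps the few large Fourier coefficients
    (a "simple combination" of characters), and f_pr is what remains, having
    only small correlation with every character "distinguisher"."""
    coeffs = dft(f)
    f_str = [sum(c * cmath.exp(2j * cmath.pi * a * x / N)
                 for a, c in enumerate(coeffs) if abs(c) >= threshold).real
             for x in range(N)]
    f_pr = [f[x] - f_str[x] for x in range(N)]
    return f_str, f_pr

# indicator of an interval: one big structured chunk plus small oscillations
f = [1.0 if x < N // 2 else 0.0 for x in range(N)]
f_str, f_pr = decompose(f, threshold=0.1)
```

Raising the threshold makes the structured part simpler and pushes more of the function into the pseudorandom part, which is exactly the tradeoff the template describes.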

Dan Boneh spoke on pairing-based cryptography, an idea that has grown into a whole, rich, area, with specialized conferences and, according to Google Scholar, 1,200+ papers published so far. In this setting one has a group $G$ (for example points on an elliptic curve) such that there is a mapping $e: G \times G \rightarrow G_T$ that takes pairs of elements of $G$ into an element of another group $G_T$ satisfying a bilinearity condition. (Such a mapping is a “pairing,” hence the name of the area.) Although such a mapping can lead to attacks on the discrete log problem in $G$, if the mapping is chosen carefully one may still assume intractability of discrete log in $G$, and the pairing can be very useful in constructing cryptographic protocols and proving their security. In particular, one can get “identity-based encryption,” a type of public key cryptography where a user’s public key can be her own name (or email address, or any deterministically chosen name), which in turn can be used as a primitive in other applications.

Dan Spielman spoke on spectral graph theory, focusing on results and problems that aren’t quite studied enough by theoreticians. He showed some remarkable examples of graph drawings obtained by simply plotting vertex $i$ at the point $(v(i),w(i))$, where $v$ and $w$ are eigenvectors corresponding to the second and third smallest eigenvalues of the Laplacian of the graph. The sparse cut promised by Cheeger’s inequality is, in such a drawing, just the cut given by a vertical line across the drawing, and there are nice algebraic explanations for why the drawing looks intuitively “nice” for many graphs but not for all. Spectral partitioning has been very successful for image segmentation problems, but it has some drawbacks and it would be nice to find theoretically justified algorithms that would do better.
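For intuition on why such drawings can look nice, here is a toy check of my own (not from the tutorial): for the $n$-cycle the relevant Laplacian eigenvectors have the closed form $\cos(2\pi j/n)$ and $\sin(2\pi j/n)$, so the spectral drawing places the vertices on a regular $n$-gon — exactly the picture you would draw by hand.

```python
import math

n = 12
# Laplacian of the n-cycle: degree 2 on the diagonal, -1 between neighbors
L = [[0.0] * n for _ in range(n)]
for j in range(n):
    L[j][j] = 2.0
    L[j][(j + 1) % n] -= 1.0
    L[j][(j - 1) % n] -= 1.0

# eigenvectors for the second/third smallest eigenvalue 2 - 2*cos(2*pi/n)
theta = 2 * math.pi / n
v = [math.cos(theta * j) for j in range(n)]  # "x-coordinate" of vertex j
w = [math.sin(theta * j) for j in range(n)]  # "y-coordinate" of vertex j
lam = 2 - 2 * math.cos(theta)

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]

# plotting vertex j at (v[j], w[j]) draws the cycle as a regular n-gon
```

A vertical line through this drawing cuts the cycle into two arcs, which is indeed its sparsest cut.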

Typically, I don’t go to an Italian restaurant in the US unless I have been there before and liked it, a rule that runs into certain circularity problems. I was happy that yesterday I made an exception to go to Al Forno, which proved to be truly exceptional.

# The unreasonable effectiveness of additive combinatorics in computer science

As I have written several times on these pages, techniques from additive combinatorics seem to be very well suited to attack problems in computer science, and a good number of applications have already been found. For example, “sum-product theorems” originally developed in a combinatorial approach to the Kakeya problem have been extremely valuable in recent constructions of randomness extractors. The central theorem of additive combinatorics, Szemeredi’s theorem, now has four quite different proofs, one based on graph theory and Ramsey theory, one based on analytical methods, one based on ergodic theory and one based on hypergraph theory. The first proof introduced the Szemeredi regularity lemma, which is a fixture of algorithmic work on property testing. The analytical proof of Gowers introduced the notion of Gowers uniformity that, so far, has found application in PCP constructions, communication complexity, and pseudorandomness. There is also work in progress on complexity-theoretic applications of some of the ergodic-theoretic techniques.

Why is it the case that techniques developed to study the presence of arithmetic progressions in certain sets are so useful to study such unrelated notions as sub-linear time algorithms, PCP systems, pseudorandom generators, and multi-party protocols? This remains, in part, a mystery. A unifying theme in the recent advances in additive combinatorics is the notion that every large combinatorial object can be “decomposed” into a “pseudorandom” part and a “small-description” part, and that many questions that we might be interested in are easy to answer, at least approximately, on pseudorandom and on small-description objects. Since computer scientists almost always deal with worst-case scenarios, and are typically comfortable with approximations, it is reasonable that we can take advantage of techniques that reduce the analysis of arbitrary worst cases to the analysis of much simpler scenarios.

Whatever the reason for their effectiveness, it is worthwhile for any theoretical computer scientist to learn more about this fascinating area of math. One of the tutorials in FOCS 2007 will be on additive combinatorics, with a celebrity speaker. More modestly, following Random-Approx 2007, in Princeton, there will be a course on additive combinatorics for (and by) computer scientists. (If you want to go, you have to register by August 1 and reserve the hotel by this weekend.)

# Pseudorandomness and more pseudorandomness

The first talk of the morning is by Terry Tao. Tim Gowers, in his essay The Two Cultures of Mathematics, explains the difference between depth and generality in combinatorics versus other areas of pure math. In combinatorics, one is unlikely to find deep definitions (have you ever tried to understand the statements of the conjectures in the Langlands program?) and very general theorems. There, it is more usual to find generality in the form of principles that the practitioners of the subject are aware of and that are exemplified in the proofs of certain theorems. (I hope I have not distorted and oversimplified Gowers’s point too much.)

Tao’s talk very explicitly points out the principle that emerges from his work on arithmetic progressions. The idea is that one can often define notions of pseudorandomness and structure such that one has a theorem of the following kind:

• Dichotomy: if an object is not pseudorandom, then it has a large structured component.

For example, in the setting of looking for arithmetic progressions of length 3, a subset of {1,…,N} is “structured” if it has a large Fourier coefficient, and it is pseudorandom if it has approximately the same number of length-3 progressions as a random subset of {1,…,N} with the same density.

By iterative arguments, one then has theorems of the form

• Decomposition: every object can be written as a “sum” of a pseudorandom part and a structured part

This is very important in the proof of the Green-Tao theorem on arithmetic progressions in the primes, but it can be seen, implicitly, in the Szemeredi Regularity Lemma and in all proofs of Szemeredi’s theorem.

The proof typically goes by: if our object is pseudorandom, we are done; otherwise we find a large structured “piece.” We “subtract” it. If what remains is pseudorandom, we are fine, otherwise we subtract another structured part, and so on.

The advantage of this type of decomposition (whether it is done explicitly or it is implicit in the proof) is that one can use techniques from analysis and probability to analyse the pseudorandom part and (depending on the definition of “structure”) techniques from analysis and geometry to analyse the structured part. This is how the proof of the Green-Tao theorem ends up using analysis, ergodic theory, combinatorics, and so on.

Often, one can use geometry and algebra only provided that the object is “exactly” structured, while results of the above type may only give “approximate” structure. To bridge this gap one employs results of the type:

• Rigidity: an object that is “approximately structured” is “close” to an object that is structured

These are results in the spirit of property testing. (And he mentions property testing in his talk.)

This is all quite abstract, but reading papers in the area (or approaching new problems) with this point of view in mind is very helpful. He also proceeds to summarize the proof of the Green-Tao theorem in terms of the above perspective.

By the way, earlier in the talk, about the general importance of understanding pseudorandomness, Tao mentions expander graphs and the P versus BPP question. (As a context, not as specific problems that he is going to work on. For the time being, we are safe.)

Overall, the talk is a model of clarity and insight.

After the break, Avi Wigderson gives his talk on P, NP and Mathematics. Yesterday, Avi told me that, of all the material covered in his 50-page proceedings paper, he had decided to talk about pseudorandomness and derandomization, from the perspective that randomness seems to help in computation, but complexity theory suggests that it does not. I despaired, because this is exactly my talk, but luckily Avi’s talk has a somewhat complementary emphasis on topics. (So I won’t have to spend the next two days remaking my slides and I can go see the Prado tomorrow.)

Avi explains the “hardness versus randomness” principle in terms that could not be clearer, and one can see heads nodding as he goes along (nodding in agreement, not nodding off). He ends up with the following question about the complexity of the permanent: is there a linear transformation L of n x n matrices into k x k matrices such that (i) for every matrix M the determinant of L(M) equals the permanent of M, and (ii) k is only polynomial in n? (The answer is supposed to be NO.) From known results one can derive that k need not be more than exp(O(n)), and an Omega(n^2) lower bound is also known.
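To make the question concrete, here are brute-force permanent and determinant straight from their definitions, together with the classical observation (going back to Pólya) that for 2 x 2 matrices flipping one sign turns the permanent into a determinant, so k = n suffices for n = 2; it is this kind of sign gadget that is conjectured to require super-polynomial k for larger n. The code is my own illustration, not something from the talk.

```python
from itertools import permutations
from math import prod

def sign(p):
    # parity of a permutation, computed via its inversion count
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det(M):
    n = len(M)
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def per(M):
    # same sum over permutations, but without the signs
    n = len(M)
    return sum(prod(M[i][p[i]] for i in range(n)) for p in permutations(range(n)))

a, b, c, d = 1, 2, 3, 4
# for 2x2 matrices, negating one entry maps the permanent to a determinant
assert per([[a, b], [c, d]]) == det([[a, -b], [c, d]])
```

The definitions differ only in the signs, which is what makes the conjectured exponential gap between them so striking.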

This is to say that one can work on complexity theory without having to learn about such things as algorithms, Turing machines and so on, and that purely algebraic questions are completely open and very related to complexity questions.

# Property testing and Szemeredi’s Theorem

After discussing Szemeredi’s Theorem and the analytical approaches of Roth and Gowers, let’s see some ideas and open problems in the combinatorial approach.

The following result is a good starting point.

Triangle Removal Lemma. For every $\delta >0$ there is a constant $\epsilon(\delta) > 0$ such that if $G$ is an $n$-vertex graph with at most $\epsilon(\delta)n^3$ triangles, then it is possible to make $G$ triangle-free by removing at most $\delta n^2$ edges.

This result follows easily from the Szemeredi Regularity Lemma, and it is a prototype of several results in the field of property testing.

Indeed, consider the problem of distinguishing triangle-free graphs from graphs that are not even close to being triangle-free. For the purpose of this example, we think of a graph as “close to being triangle-free” if it can be made triangle-free by removing at most $\delta n^2$ edges, for some small constant $\delta > 0$. Then the Lemma tells us that in a not-even-close graph we have at least $\epsilon(\delta) n^3$ triangles, and so a sample of $O(1/\epsilon(\delta))$ random triples of vertices is likely to contain a triangle. So here is an algorithm for the problem: pick $O(1/\epsilon(\delta))$ triples of vertices at random and see if any of them forms a triangle. We note that:

• The algorithm satisfies our requirement;
• The running time is a constant independent of $n$ and dependent only on $\delta$;
• This all makes sense only if $1/\epsilon(\delta)$ grows moderately as a function of $1/\delta$.

Let us now see that the Triangle Removal Lemma proves Szemeredi’s Theorem for progressions of length 3. As we discussed earlier, it is enough to prove the Theorem in groups of the type ${\mathbf Z}_N$, with $N$ prime; we will do better and prove the Theorem in any abelian group. So let $G$ be an abelian group of size $N$, and let $A$ be a subset of $G$ of size $\delta N$. Construct a tri-partite graph $(X,Y,Z,E)$ where $X,Y,Z$ are sets of vertices, each of size $N$, and each a copy of the group $G$. The set of edges is defined as follows:

1. for $x \in X$ and $y \in Y$, $(x,y)$ is an edge if there is an $a \in A$ such that $x+a=y$
2. for $x \in X$ and $z \in Z$, $(x,z)$ is an edge if there is a $b\in A$ such that $x + b + b = z$
3. for $y \in Y$ and $z \in Z$, $(y,z)$ is an edge if there is a $c \in A$ such that $y+c=z$

Now we notice that if $x,y,z$ is a triangle, then there are $a,b,c$ in $A$ such that (after some cancellations) $a+c = b+b$, which means that $a,b,c$ is an arithmetic progression of length 3 in the group $G$. In fact, we see that the number of triangles in the graph is precisely $N$ times the number of triples $(a,b,c)$ in arithmetic progression in $A$.

Consider now the $N \cdot |A| = \delta N^2$ triangles corresponding to the “trivial” progressions of the form $a,a,a$. (These are the triangles $x,x+a,x+a+a$.) We can see that these triangles are edge-disjoint, and so in order just to remove such triangles we have to remove from the graph at least $\delta N^2$ edges. So the Triangle Removal Lemma implies that there are at least $\epsilon(\delta)N^3$ triangles in the graph, and so at least $\epsilon(\delta)N^2$ triples in arithmetic progression in $A$. If $N > \delta/\epsilon(\delta)$, then some of those arithmetic progressions must be non-trivial, and we have the length-3 case of Szemeredi’s theorem.
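The correspondence between triangles and progressions can be checked mechanically on a tiny example. The sketch below (the group and the subset are arbitrary choices of mine) builds the three edge sets for $G = {\mathbf Z}_7$, whose odd order guarantees that $b$ is determined by $x$ and $z$, and verifies that the triangle count is exactly $N$ times the number of progression triples, trivial ones included.

```python
N = 7            # the abelian group Z_7; odd order, so b is determined by x and z
A = {1, 2, 4}

# the three bipartite edge sets of the construction above
xy = {(x, (x + a) % N) for x in range(N) for a in A}
xz = {(x, (x + 2 * b) % N) for x in range(N) for b in A}
yz = {(y, (y + c) % N) for y in range(N) for c in A}

# brute-force triangle count in the tripartite graph
triangles = sum(1 for x in range(N) for y in range(N) for z in range(N)
                if (x, y) in xy and (x, z) in xz and (y, z) in yz)

# triples (a, b, c) in A with a + c = 2b, i.e. arithmetic progressions,
# counting the trivial progressions a, a, a as well
aps = sum(1 for a in A for b in A for c in A if (a + c) % N == (2 * b) % N)
```

With this particular $A$ only the trivial progressions survive, which matches the warning in the text: one needs $N$ large relative to $\delta/\epsilon(\delta)$ before non-trivial progressions are forced to appear.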

We see that both for the “property testing” application and for the Szemeredi Theorem application it is important to have good quantitative bounds on $\epsilon(\delta)$. We know that, in Szemeredi’s theorem, $N$ must be super-polynomial in $1/\delta$, and so, in the Triangle Removal Lemma, $1/\epsilon(\delta)$ must be super-polynomial in $1/\delta$.

The known bounds, however, are quite unreasonable: we only know how to bound $1/\epsilon(\delta)$ by a tower of exponentials whose height is $poly(1/\delta)$. Unfortunately, such unreasonable bounds are unavoidable in any proof of the Triangle Removal Lemma based on Szemeredi’s Regularity Lemma. This is because, as proved by Gowers, such tower-of-exponentials bounds are necessary in the Regularity Lemma.

Can we find a different proof of the Triangle Removal Lemma in which $1/\epsilon(\delta)$ is only singly or doubly exponential in $1/\delta$? That would be wonderful, both for property testing (because the proof would probably apply to other sub-graph homomorphism problems) and for additive combinatorics. Over the last year, I have found at least three such proofs. Unfortunately they were all fatally flawed.

Or, is there a more-than-exponential lower bound for the Triangle Removal Lemma? This would also be a major result: it would show that several results in property testing for dense graphs have no hope of being practical, and that there is a “separation” between the quantitative version of Szemeredi’s Theorem provable with analytical methods versus what can be proved with the above reduction. Besides, it’s not often that we have super-exponential lower bounds for natural problems, with Gowers’s result being a rare exception.

By the way, what about Szemeredi’s Theorem for longer sequences? For sequences of length $k$ one needs a “$k$-clique removal lemma” for $(k-1)$-uniform hypergraphs, which in turn can be derived from the proper generalization of the Regularity Lemma to hypergraphs. This turns out to be quite complicated, and it has been accomplished only very recently in independent work by Nagle, Rödl and Schacht; and by Gowers. An alternative proof has been given by Tao. The interested reader can find more in expository papers by Gowers and Tao.

What about Szemeredi’s own proof? It does use the Regularity Lemma, which was conceived and proved specifically for this application, and it involves a reduction to a graph-theoretic problem. I have to admit that I don’t know much else.

# Polynomials and subspaces

My latest post on Szemeredi’s theorem was long and rambling, mostly because I had no idea what I was talking about. There are, however, a couple of neat questions, and I am afraid they got lost in the rambling, so let me state them again.

1. It is either known or probably derivable from known proof techniques that

If f:{0,1}^n -> {0,1} has agreement 1/2+eps with a degree-k polynomial, then there is an affine subspace V of {0,1}^n, of dimension n/c, and a linear function L, such that f and L have agreement 1/2 + eps’ when restricted to V. Here c=c(k) is a constant that depends only on k and eps’=eps'(eps,k) is a constant that depends only on eps and on k.

This is somewhat surprising, even if you look at the special case of a function f that is a polynomial (as opposed to being correlated with a polynomial).

Is there a simple proof of this result? Is the result true if you replace “subspace” by “restriction”? (Meaning, you are only allowed to fix all but a constant fraction of the variables.) Shouldn’t this be useful in complexity theory somewhere?

[Update 7/14/06: the degree-2 case indeed has a very simple proof. A more general result is proved here, in section 7. For higher degree, the result is actually false, in the sense that it is impossible to find a subspace of dimension constant*n. Avi Wigderson and I have found a counterexample already for degree 3, and Ben Green has sharpened our lower bound. The best one can do for degree-k polynomials is to find a subspace of dimension about n^{1/(k-1)}.]

2. If f has agreement 1/2+eps with a linear function L in a subspace V of dimension n-t, then there is a linear function L’ such that f and L’ agree on a fraction at least 1/2+eps/2^t of all of {0,1}^n.

This is fairly easy to see: among the 2^t linear functions that extend L from V to all of {0,1}^n, the average agreement with f outside V is exactly 1/2, so some extension L’ has overall agreement at least (1/2+eps)/2^t + (1/2)(1-1/2^t) = 1/2 + eps/2^t.
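The averaging bound can be confirmed by brute force on a small example (the parameters are mine): take n=4, a subspace V of codimension t=2, and a function f that agrees perfectly with a linear function on V (so eps = 1/2) and is adversarial elsewhere; some global linear function still achieves agreement 1/2 + eps/2^t.

```python
from itertools import product

n, t = 4, 2
cube = list(product((0, 1), repeat=n))
V = [x for x in cube if x[2] == 0 and x[3] == 0]  # subspace of codimension t

def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x)) % 2

# f agrees perfectly with the linear function x1 on V (so eps = 1/2),
# and is adversarial, say constant, off V
f = {x: (x[0] if x in V else 1) for x in cube}

# best agreement of f with any of the 2^n linear functions over all of {0,1}^n
best = max(sum(1 for x in cube if f[x] == dot(a, x)) for a in cube) / len(cube)
```

The guarantee holds no matter how f is defined off V, since it only uses the averaging over extensions.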

3. This is a fascinating challenge: find a simple proof (or any proof that fits in fewer than, say, 20 pages) that if f:{0,1}^n->{-1,1} has dimension-k Gowers uniformity at least eps, then there is a linear subspace V of dimension n/c and a linear function L, such that f and L have agreement at least 1/2+eps’ on V. (Where c=c(k) and eps’=eps'(eps,k).)

Chances are that such a proof could be turned into a simpler proof of Gowers’s quantitative version of Szemeredi’s theorem. (A proof of the above result for boolean functions can probably be “translated back” from Gowers’s paper, and it would probably be at least 50-60 pages. As I understand it, such a proof would first argue that f is correlated with a polynomial in a large subspace, and then proceed with (1) above. I wonder: can one prove such a statement without mentioning polynomials at all?)

4. Unfortunately the conclusion cannot be strengthened to “… there is a linear subspace V of dimension n-o(n)…”. Consider

f(x1,…,xn) := x1*x2 + x3*x4 + … + x_{n-1}*x_n mod 2

Its dimension-3 Gowers uniformity is 1, but, if you could find a subspace V of dimension n-o(n) on which f agrees with a linear function on a 1/2+eps fraction of inputs, then you could also find a linear function that agrees with f on a 1/2+eps/2^{o(n)} fraction of all inputs. But we know that no linear function agrees with f on more than a 1/2+1/2^{n/2} fraction of inputs.
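Both claims about this example can be verified exhaustively for small n. The sketch below takes n=4, so f is the inner-product function x1*x2 + x3*x4; it computes the dimension-3 Gowers uniformity as the plain average over parallelepipeds (the normalization is my choice; the value 1 is the same under the usual root) and finds the best agreement with a linear function, which comes out to exactly 1/2 + 2^{-n/2}/2, consistent with the 1/2 + 1/2^{n/2} bound above.

```python
from itertools import product

n = 4
cube = list(product((0, 1), repeat=n))

def f(x):
    # (-1)^(x1*x2 + x3*x4): a degree-2 polynomial phase over Z_2
    return (-1) ** ((x[0] & x[1]) ^ (x[2] & x[3]))

def xor(x, y):
    return tuple(a ^ b for a, b in zip(x, y))

fv = {x: f(x) for x in cube}

# dimension-3 Gowers uniformity: average over x, h1, h2, h3 of the product
# of f on the 8 vertices x + (sum of a subset of {h1, h2, h3})
total = 0
for x in cube:
    for h1 in cube:
        for h2 in cube:
            xh = [x, xor(x, h1), xor(x, h2), xor(xor(x, h1), h2)]
            for h3 in cube:
                p = 1
                for y in xh:
                    p *= fv[y] * fv[xor(y, h3)]
                total += p
u3 = total / len(cube) ** 4

# best agreement of f with a linear function (-1)^(a.x)
best = max(sum(1 for x in cube if fv[x] == (-1) ** (sum(ai * xi for ai, xi in zip(a, x)) % 2))
           for a in cube) / len(cube)
```

The uniformity comes out to 1 because the third finite difference of a quadratic vanishes, so every parallelepiped product equals 1.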

# Analytical approaches to Szemeredi’s Theorem: general case

After having discussed the notion of Gowers uniformity, the statement of Szemeredi’s theorem, and the proof for progressions of length 3, let me try to say something about the case of progressions of length 4 or more, and the several related open questions.

As already discussed, Gowers’s proof is based on an “algorithm” that, given a subset A of Z_N of size dN, does one of the following two things:

1. It immediately finds a progression of length k in A; or
2. it constructs a subset A’ of Z_{N’} such that: (i) if there is a progression of length k in A’, then there is also such a progression in A; (ii) A’ has size at least (d+poly(d))*N’; and (iii) N’ is at least a constant root of N.

As in the proof for the case of progressions of length 3, once we have such an algorithm we are able to find a progression of length k in A, provided that, initially, N is at least exp(exp(poly(1/d))).

How does the “algorithm” work? Let us, again, identify the set A with its characteristic function A: Z_N -> {0,1}, and consider the expression

(*) E_{x,y} A(x)*A(x+y)*A(x+2y)*…*A(x+(k-1)*y)

It is possible to show that (*) equals d^k (which is what we would expect if A were a “pseudorandom” set) plus or minus an error term that depends on U_{k-1}(A-d), where U_{k-1} is the dimension-(k-1) Gowers uniformity as defined earlier. A result of this kind (relating Gowers uniformity and number of arithmetic progressions) is called a “generalized Von Neumann theorem” in the literature, for reasons that, unfortunately, I do not know. The proof is not difficult at all; it is just a series of applications of Cauchy-Schwarz. The genius is in the choice of the right definition. (And the technical difficulties are in what comes next.)

This means that if the dimension-(k-1) Gowers uniformity of the function f(x):=A(x)-d is small, then A contains a lot of length-k arithmetic progressions, and we are in case (1).

What remains to prove is that if the set A is such that f(x):=A(x)-d has noticeably large dimension-(k-1) Gowers uniformity then we can find a set A’ in ZN’ as in case (2) above. Gowers’s paper is approximately 130 pages long, and about 100 pages are devoted to proving just that.

Obviously, I have no idea what goes on in those 100 pages, but (based on later work by Green and Tao) I can guess how it would be if we were trying to prove Szemeredi’s theorem not in Z_N but in F_p^n with p prime and larger than k. Translated to such a setting (I think), Gowers’s argument would proceed by showing that if U_{k-1}(A-d) is at least eps, then there is an affine subspace V of F_p^n of dimension n/O(1) such that A restricted to V has density at least d+poly(eps). This, in turn, would be proved by showing

1. If f has noticeably large dimension-(k-1) Gowers uniformity, then there is an affine subspace V of dimension n/O(1) and a polynomial phase function g of degree (k-2) such that, restricted to V, f and g are noticeably correlated
2. If f is a function that has noticeable correlation with a low degree polynomial phase function, then there is a subspace of dimension n/O(1) such that f is correlated to a linear function when restricted to that subspace.

Let’s make things simple by restricting ourselves to Boolean functions (even though this p=2 case has no direct application to results about arithmetic progressions).

Then, part (1) is saying that if f:{0,1}^n->{-1,1} has noticeably large dimension-k Gowers uniformity, then there is a polynomial p() of degree (k-1) over Z_2, and an affine subspace V of dimension n/O(1), such that f(x) and (-1)^{p(x)} agree on noticeably more than 1/2 of the inputs x. Part (2) is saying that if f:{0,1}^n->{-1,1} and (-1)^{p(x)} agree on noticeably more than 1/2 of the inputs, where p is a low-degree polynomial, then there is an affine subspace V of dimension n/O(1) such that, restricted to V, f has agreement with a linear function on noticeably more than 1/2 of the inputs.

The two results together imply that if f has noticeably large Gowers uniformity then there is a large subspace in which f is correlated with a linear function. (To reach the conclusion, you need to apply part (1), then do a change of variables so that now the V of part (1) looks like {0,1}^{n/O(1)}, and then apply part (2).)

Actually, what I just wrote may be an open question, but it should be provable along the lines of Gowers’s proof, and it should be much easier to prove.

For the case of progressions of length 4 in F_p^n, for small prime p>4, Green and Tao prove that if f is a bounded function of noticeably large dimension-3 Gowers uniformity, then f is correlated to a degree-2 polynomial over all of F_p^n, and the correlation is especially good on a subspace of dimension n-O(1). (As opposed to n/O(1).) This improvement, and several additional ideas, lead to an improved quantitative version of Szemeredi’s theorem for progressions of length 4 in F_p^n, and they also announced a similar improved result for progressions of length 4 over the integers.

Their proof relies on the analysis of a certain “linearity test in the highly noisy case,” which is also used (and, I think, proved for the first time) in Gowers’s paper. In the simpler Boolean case, the result is

Let F:{0,1}^n -> {0,1}^m, and suppose that

Pr_{x,y}[F(x)+F(y) = F(x+y)] > eps

where all operations are bitwise mod 2. Then there is an eps’ (depending only on eps, but independent of n,m) and a matrix M such that the functions x -> F(x) and x -> Mx have agreement at least eps’.

(Update: as Alex explains in his comment below, the above sentence is quite misleading. Gowers proves quite a different statement in Z_N, a setting in which an analogous “highly noisy linearity test” statement is provably false, and where it is difficult to even get the statement of the right analog. The Boolean statement above is proved for the first time in Alex’s paper, with a proof whose outline is similar to Gowers’s; Green and Tao prove the highly noisy linearity test result in other prime fields, also following Gowers’s argument.)

In the known proof, which uses difficult results from additive combinatorics, eps’ is exponentially small in eps. Here two major open questions are: (i) find a simple proof and (ii) find a proof where eps’ = poly(eps).

More ambitiously, there could be a simple proof that (say, in the Boolean case) if f has noticeable Gowers uniformity, then f is noticeably correlated with a linear function when restricted to a large subspace. The experts in the area would probably be able to turn a proof in the Boolean case into a proof that applies to the setting where we have Z_N instead of {0,1}^n, Bohr sets instead of subspaces, and things I do not understand instead of polynomials. Even so, if the proof in the Boolean setting were simple enough, the whole thing could be significantly simpler than Gowers’s proof, and there could be room for quantitative improvements.

# Analytical approaches to Szemeredi’s Theorem: k=3

Here is the idea of Roth’s proof of the k=3 case of Szemeredi’s Theorem. (See yesterday’s post if you have no idea what I am talking about.) We have a subset A of Z_N of size dN and we want to find a length-3 arithmetic progression in A. A first observation is that if d > 2/3 then we are done, because if we pick a,b at random there is positive probability that a, a+b, a+b+b are all in A. (Use the union bound.) The proof will work by “induction” on d, showing that the truth of the theorem for a smaller value of d can be derived from its truth for a larger value of d. Of course d is a continuous parameter, so it will not really be induction, but that’s how we should think of it.

The main result is an “algorithm” that given A subset of ZN of size dN does one of the following two things:

1. It immediately finds a length-3 progression in A; or
2. it constructs a subset A’ of Z_{N’} such that (i) if A’ has a length-3 progression then so does A; (ii) A’ has size at least about (d+d^2)*N’; and (iii) N’ is about sqrt(N).

So, given A, we run the algorithm, and we either immediately find the progression, or we get A’. In the latter case, we run the algorithm on A’, and we either get a progression in A’ (implying a progression in A) or we get a new set A”, and so on. But how many times do we have to keep doing this? At each step, the density of the set increases, and if it ever goes above 2/3 then we are also done. So, how many times can you do the d -> d + d^2 transformation until you get to 2/3? Certainly no more than O(1/d^2) times, but actually O(1/d) is enough: it takes about 1/d steps of size d^2 to double the density, then about 1/(2d) steps to double it again, and so on. This means that we will always find a length-3 progression in A, provided that N is large enough that we can take its square root O(1/d) times. So if N is doubly exponential in 1/d we have our proof.
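The O(1/d) count is easy to confirm numerically; the loop below is nothing more than an illustration of the accounting, iterating d -> d + d^2 and checking that the number of steps stays within a small multiple of 1/d.

```python
def increments_needed(d, target=2/3):
    """Number of d -> d + d^2 density-increment steps before exceeding target."""
    steps = 0
    while d <= target:
        d += d * d
        steps += 1
    return steps

# the count grows like 1/d, not 1/d^2: roughly 1/d steps double the density,
# then roughly 1/(2d) more steps double it again, and so on
counts = {d: increments_needed(d) for d in (0.1, 0.05, 0.01)}
```

The step count closely tracks the integral of dx/x^2 between d and 2/3, which is just under 1/d.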

Gowers’s proof for the general case has the same outline. Now we want to find a length-k progression in A. If the density of A is more than (k-1)/k, we are done. Gowers describes an algorithm that, given A, either finds a length-k progression or constructs a set A’ in Z_{N’} such that if A’ has a length-k progression then so does A. Furthermore, A’ has density d+poly(d) and N’ is a constant root of N.

How do these arguments work? Let us look at Roth’s proof in the case in which A is a subset of F_p^n with p prime; for concreteness, take p=3. Given A, consider its characteristic function (that we denote again A) A: F_3^n -> {0,1} and compute its Fourier transform.

Consider now the probability that, when picking a,b at random, the three points a,a+b,a+b+b are all in A. This is the same as

(*) E_{a,b} A(a)*A(a+b)*A(a+b+b)

which can be expressed really cleanly in terms of the Fourier coefficients of A. In particular, the expression (*) is d^3, which is the number you would get if you were looking at 3 independent points, plus or minus an error term that depends on the largest Fourier coefficient of A. If all Fourier coefficients are less than, say, d^2/2, then expression (*) is at least d^3/2, and we have plenty of length-3 progressions in A; this is case (1) of the proof. In this case, we should think of A as a “pseudorandom set against adversaries that look for random length-3 progressions.”
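The clean Fourier expression can be checked directly: expanding each factor of (*) in characters, the only surviving terms have all three frequencies equal, so (*) equals the sum of the cubed Fourier coefficients, with the zero frequency contributing exactly d^3 and the rest forming the error term. A brute-force verification in F_3^2 (the subset A is an arbitrary choice of mine):

```python
import cmath
from itertools import product

n, p = 2, 3
group = list(product(range(p), repeat=n))
w = cmath.exp(2j * cmath.pi / p)

A = {(0, 0), (0, 1), (1, 1), (2, 0)}  # an arbitrary subset of F_3^2
ind = {x: (1 if x in A else 0) for x in group}

def add(x, y):
    return tuple((a + b) % p for a, b in zip(x, y))

def hat(alpha):
    # Fourier coefficient hat-A(alpha) = E_x A(x) w^{-alpha.x}
    return sum(ind[x] * w ** (-sum(ai * xi for ai, xi in zip(alpha, x)))
               for x in group) / len(group)

# left side: density of triples (a, a+b, a+2b) landing in A
lhs = sum(ind[a] * ind[add(a, b)] * ind[add(add(a, b), b)]
          for a in group for b in group) / len(group) ** 2

# right side: sum of cubed Fourier coefficients; the alpha = 0 term is d^3
rhs = sum(hat(alpha) ** 3 for alpha in group)
```

The imaginary parts cancel because the coefficients come in conjugate pairs, so both sides are real as they should be.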

What happens if A has a large Fourier coefficient? This means that A is correlated with a linear function, and so there is an affine subspace V in F_3^n of dimension n-1 such that A is denser in V than in the rest of F_3^n. Now, map V to F_3^{n-1} via an invertible affine map, and let A’ be the image of A under this map. If there is a length-3 progression in A’, that’s just 3 points on a line; but then it means that also in A we had 3 points on a line. The set A’ has density about d+d^2 in F_3^{n-1}, and so we have part (2) of the argument.

As a bonus, we only lost a constant fraction of our group size at each step, so d can be as small as about 1/log N, where N=3^n is the size of the initial group containing A.

Note the “win-win” structure of the argument. If A has small Fourier coefficients, then it is “pseudorandom”, and we get what we want from the pseudorandomness. Otherwise, A has some “linear structure,” and we can take advantage of it to reduce to a simpler instance of our original problem.

This “either we get what we want or we reduce to a simpler case” proof strategy has been used in some constructions of extractors, but it seems such a good idea that it may apply more widely, and one should keep it in mind.

What about lower bounds? Excellent question. There is a very simple construction due to Behrend of a subset of {1,…,N} of size N/exp(sqrt(log N)) that has no length-3 progression, so N must be super-polynomial in 1/d when we prove the theorem over the integers (or over Z_N with N prime; the two cases are essentially equivalent). Note that if you try to find a large set with no length-3 progression using the probabilistic method, you will not go very far. This is a better-than-random explicit construction, a rarity in extremal combinatorics. By the way, Behrend’s construction is very simple (here is a half-page description), it is 60 years old, and nobody has been able to improve it ever since.
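Behrend’s construction is indeed simple enough to fit in a few lines: write numbers in base 2d with digits below d, so that a+c = 2b entails no carries and hence a digit-wise equation, and keep only the digit vectors lying on a single sphere, where x+z = 2y forces x = y = z by strict convexity. The parameters below are tiny and purely for illustration; choosing d and k as suitable functions of N gives the N/exp(sqrt(log N)) bound.

```python
from itertools import product

def behrend_set(d, k):
    """Integers with k digits in {0,...,d-1} in base 2d, with digit vectors
    lying on a single sphere. If a + c = 2b then, the digits being small,
    there are no carries, so the digit vectors satisfy x + z = 2y in Z^k;
    on a sphere this forces x = y = z, hence no non-trivial 3-progression."""
    by_norm = {}
    for digits in product(range(d), repeat=k):
        r = sum(t * t for t in digits)
        n = sum(t * (2 * d) ** i for i, t in enumerate(digits))
        by_norm.setdefault(r, []).append(n)
    return max(by_norm.values(), key=len)  # keep the most popular sphere

S = set(behrend_set(d=4, k=3))
```

By averaging, the most popular sphere contains at least a 1/(k*d^2) fraction of all d^k digit vectors, which is where the size guarantee comes from.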

What about lower bounds for $\mathbb{F}_3^n$? There is no known construction of a large subset avoiding length-3 progressions. The best lower bounds are of the form $c^n$, with $c<3$. Is there a $(3-o(1))^n$ lower bound? This is a major open question. If there were a $(2.999)^n$ upper bound, it would be a stunning result, because it would show that the business of translating proofs from finite fields to the integers using Bohr sets cannot be applied as a black box: it works for some proofs and fails for others.
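For very small $n$ one can find the largest progression-free (“cap”) set by exhaustive search; here is a brute-force sketch of my own. In $\mathbb{F}_3^n$, three distinct points $x,y,z$ lie on a line exactly when $x+y+z=0$, which makes the check one line of code.

```python
import itertools

def is_cap(points):
    # Distinct x, y, z in F_3^n are collinear iff x + y + z = 0 (mod 3).
    s = set(points)
    for x, y in itertools.combinations(s, 2):
        z = tuple((-a - b) % 3 for a, b in zip(x, y))
        if z in s and z != x and z != y:
            return False
    return True

# Exhaustive search over all 2^9 subsets of F_3^2.
points = list(itertools.product(range(3), repeat=2))
best = max(
    (c for r in range(len(points) + 1)
       for c in itertools.combinations(points, r) if is_cap(c)),
    key=len,
)
```

Already for $n=2$ the answer is 4 out of 9 points, well below $3^n$; the exhaustive search becomes hopeless very quickly, which is part of why the asymptotics are so hard.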

# Szemeredi’s theorem

Szemeredi’s theorem on arithmetic progressions is one of the great triumphs of the “Hungarian approach” to mathematics: pose very difficult problems, and let deep results, connections between different areas of math, and applications, come out as byproducts of the search for a solution.

(Tim Gowers’s essay The two cultures of mathematics discusses brilliantly this “problem solving” way of doing math, as distinct from the “theory building” way.)

Some of the math surrounding Szemeredi’s theorem has found applications in property testing, in PCP constructions, and in extractor constructions, and more future applications are likely. Besides, it is truly beautiful (and somewhat accessible) math. It can be a lot of fun for a computer scientist to delve into this literature.

In telling the story of Szemeredi’s theorem, one usually starts from Van der Waerden’s 1927 theorem that, no matter how you color the integers with a finite number of colors, there are arbitrarily long monochromatic arithmetic progressions. (An arithmetic progression is a sequence of the form $a, a+b, a+2b, \ldots, a+kb, \ldots$.) In a more quantitative form, the theorem says that for all constants $c,k$ there is an $N(c,k)$ such that for every $N>N(c,k)$, no matter how we color the integers $\{1,\ldots,N\}$ with $c$ colors, there will be a monochromatic arithmetic progression of length $k$.

In 1936, Erdos and Turan conjectured that the coloring in Van der Waerden’s theorem was just a “distraction,” and that the “true reason” why the theorem is true is that every dense enough set of integers (in particular, the largest color class in Van der Waerden’s theorem) must contain long arithmetic progressions. Specifically, they conjectured that for every density 0 < d <1 and every integer k there is an N(d,k) such that every subset A of {1,…,N} of cardinality dN contains a length-k arithmetic progression, provided N> N(d,k).

This conjecture became one of the great unsolved problems in Ramsey theory. In 1953, Roth proved the k=3 case of the Erdos-Turan conjecture. In Roth’s proof, N(d,3) is doubly exponential in 1/d. In other words, Roth proves that a subset of {1,…,N} containing at least about N/log log N elements must contain a length-3 progression. The proof is analytical, and uses Fourier analysis. (Or the “circle method” as number theorists say.)

It was only in 1974 that Szemeredi proved the Erdos-Turan conjecture in the general case, and this is the result that is now known as Szemeredi’s Theorem. Szemeredi’s proof is combinatorial, and works via a reduction to a graph-theoretic problem. The Szemeredi regularity Lemma, which has several applications in computer science, is part of the proof. The value of even $N(d,4)$ is a tower of exponentials in Szemeredi’s proof; one only gets that a subset of $\{1,\ldots,N\}$ of size about $N/\log^* \log^* N$ must contain a length-4 progression.

In 1977 Furstenberg found a completely different proof. He proved a “transfer theorem” showing that results about arithmetic progressions (and other related structures) can be derived from statements about certain continuous objects, and then he proved the continuous statements by “standard methods” in Ergodic Theory. The transfer theorem uses a result that requires the Axiom of Choice and so, while the proof establishes that finite values of $N(d,k)$ exist, it is impossible, even in principle, to extract any quantitative bound.

Quantitative bounds are important because of the long-standing (and recently solved) question of whether the primes contain arbitrarily long arithmetic progressions. If one could show that a subset of {1,…,N} of size about N/log N contains long arithmetic progressions, then the result for the primes follows purely from the fact that the set of primes is dense enough.

There is a simple reduction showing that, to prove Szemeredi’s theorem over the integers, it is enough to prove it for the additive group $\mathbb{Z}_N$, with $N$ prime. (That is, it is enough to look for arithmetic progressions mod $N$ in $\{1,\ldots,N\}$.) Also, if one looks at Roth’s proof, it can be seen to give a much stronger bound in the additive group $\mathbb{F}_3^n$; in that setting, the proof shows that every subset $A$ of size about $3^n/n$ contains three points in arithmetic progression (that is, three points of the form $a, a+b, a+2b$). The bound is of the form $N/\log N$, where $N$ is the size of the group. The proof, however, uses the fact that $\mathbb{F}_3^n$ is a linear space, that it has subspaces, and so on. Bourgain (1999) was able to “simulate” this proof in $\mathbb{Z}_N$ by using the notion of “Bohr set” of $\mathbb{Z}_N$, and showing that it can play a role analogous to that of “subspace” in $\mathbb{F}_3^n$. Bourgain obtains a bound of about $N/\sqrt{\log N}$.

The great quantitative breakthrough came with the work of Gowers (1998-2001), who proved bounds of the form $N/\log\log N$ (as in Roth’s proof) for progressions of any length. His work introduced the notion of Gowers uniformity that I discussed two days ago.

So there are now three completely different proofs: Szemeredi’s proof using graph-theoretic arguments, Furstenberg’s proof using Ergodic Theory, and Gowers’s analytical proof. Recent work by Tao, however, shows that there are deep similarities between the proofs, although a complete “understanding” of what goes on is probably still to come.

There is also a new (2004-05) fourth proof, that uses a hypergraph regularity lemma, and that is similar in spirit to (but technically very different from, and simpler than) Szemeredi’s proof. It also has deep relations with the other three proofs. (See e.g. this paper.) It is an independent achievement of Nagle, Rödl, Schacht, and Skokan; Gowers; and Tao.

The most famous result on arithmetic progressions is probably the Green-Tao proof that the primes contain arbitrarily long arithmetic progressions.

The structure of the proof is quite amazing, and perhaps it contains ideas that could also be useful in computer science. They begin by defining a notion of pseudorandomness for sets of integers (which requires two conditions: Gowers uniformity and an additional property). Then they show:

1. Assuming Szemeredi’s theorem as a black-box, the theorem is also true for subsets of pseudorandom sets. That is, if S is a large enough and pseudorandom enough set of integers, and A is a subset of S of constant density, then A must contain long arithmetic progressions. (Roughly speaking, a “distinguisher” that looks for arithmetic progressions cannot distinguish a dense subset of a pseudorandom set from a dense subset of all the integers. This is akin to the notion of “pseudo-entropy” in HILL, but the quantification is different.)
2. The set of “pseudoprimes” (integers having only a few, large divisors) is pseudorandom in the integers. (This uses new results of Goldston and Yildirim.)
3. The primes have constant density in the pseudoprimes.

Coming up next: Roth’s proof for k=3, the structure of Gowers’s proof for general k, the Green-Tao improvement for k=4, and the constellation of wonderful open questions around the k=3 and k=4 cases in finite vector spaces. (After that, back to politics, food, movies, and China.)

# Gowers uniformity

One of the three questions most frequently asked to me at STOC was why I did not define Gowers uniformity in my talk on this joint work with Alex Samorodnitsky. Is it because the definition is very complicated? Not at all, it is just that the definition seems very arbitrary unless one has the time to tell some stories about it and to look at a few equivalent versions of the definition and some of its basic properties.

My favorite source is this paper by Green and Tao; the first 6-7 sections are a wonderful review of what is known. Another great source is this survey paper by Green, that presents several open questions that could be of interest to us, especially the question of how large a subset of $\mathbb{F}_3^n$ can be if no three of its points are on a line. (The big question is whether the answer is $(3-o(1))^n$ or not.) There is also a question related to “linearity testing,” and I will probably talk about it in another post. Green has also written a more recent survey paper that is more limited in scope but that has more technical details.

In this post, I will tell a story about Gowers uniformity in the computer science language of linearity testing and low-degree tests for Boolean functions, which may be the gentlest introduction for many readers. I will almost exclusively refer to Boolean functions, but the theory applies more generally to bounded complex-valued functions whose domain is an Abelian group. (Some definitions/results, however, are more simply stated in the special case of Boolean functions.)

Umesh often complains that, during technical talks, the suspense kills him, meaning that the speaker goes on and on with technical definitions and results, without giving any indication of where he or she is going.

To spoil all suspense: the eventual goal is to have a “higher-degree” Fourier analysis that allows us to talk about correlations of a given function with low-degree polynomials. (In contrast with Fourier analysis which, in the Boolean case, talks about correlations of a given function with linear functions.) For every fixed $k$, we will define a “dimension-$k$” norm for Boolean functions; such norms are conjectured to “approximate” the correlation of the function with the closest degree-$(k-1)$ polynomial. The conjecture is a theorem for $k=2$ (by Fourier analysis) and for $k=3$ (by a tour de force), and it is open otherwise. What is known, however, is that functions with small norm have several useful “pseudorandomness” properties, and that functions with noticeably large norm have useful “structure.” (Low-degree-polynomial-like structure.) Most proofs that make use of Gowers uniformity have a two-part structure: if the norm is small, we get what we want from the pseudorandomness; if the norm is large, we get what we want from the structure. In our work on PCP, for example, when we analyse the soundness, we can argue that if the PCP proof is “pseudorandom” then the verifier is about as likely to accept it as it is to accept a random proof (hence, very low probability), but a PCP proof with “structure” can be “decoded” into a valid witness (hence, this cannot happen in the soundness case).

Let’s get started with the definition. If

$f: \{0,1\}^n \rightarrow \{0,1\}$

is a Boolean function, and $y$ is an element of $\{0,1\}^n$, let us define

$f_y(x) := f(x) - f(x+y)$

where all operations are mod 2 (or bit-wise mod 2). We think of $f_y$ as a sort of “derivative” of $f$. In particular, if $f$ is a constant then $f_y$ is identically 0 for every $y$, and if $f$ is a degree-$d$ polynomial then $f_y$ is a degree-$(d-1)$ polynomial.

We also define $f_{y,z}$ as $(f_y)_z$, and so on.

Now, consider the following test of “non-constancy:” pick $x,y$ at random, and check whether $f_y(x)=0$. If $f$ is a constant, then the test accepts with probability 1; it is also possible to argue that the test always accepts with probability at least $1/2$, and that it accepts with probability $1/2+\epsilon$ if and only if $f$ has agreement $1/2 + \epsilon'$ with a constant function.

This may not be too exciting, but consider the next “logical” step, that is: pick $x,y,z$ at random, and check if $f_{y,z}(x)=0$. If $f$ is a degree-1 polynomial (an affine function), then the test accepts with probability 1. Also, it can be shown that the test accepts with probability at least $1/2$. The interesting thing, which is easy to prove using Fourier analysis, is that the test accepts with probability noticeably larger than $1/2$ if and only if $f$ has agreement noticeably larger than $1/2$ with a degree-1 function. Indeed, the test checks whether

$f(x) - f(x+y) - f(x+z) + f(x+y+z) = 0$

and, up to a change of variables, this is the same as checking if

$f(x) + f(y) + f(z) = f(x+y+z)$

a slight variant of the Blum-Luby-Rubinfeld linearity test.
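The rewritten form is easy to try out empirically. A sketch of my own (function and test-case names are mine), encoding points of $\{0,1\}^n$ as $n$-bit integers with XOR as the group addition:

```python
import random

def blr_reject_rate(f, n, trials=20000, seed=0):
    # Estimate Pr[f(x) + f(y) + f(z) != f(x+y+z)] over random x, y, z,
    # with {0,1}-valued f, integer-encoded points, and + meaning XOR.
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        x, y, z = (rng.randrange(2**n) for _ in range(3))
        if (f(x) ^ f(y) ^ f(z)) != f(x ^ y ^ z):
            rejections += 1
    return rejections / trials

parity = lambda x: bin(x).count("1") % 2                   # linear
majority3 = lambda x: 1 if bin(x).count("1") >= 2 else 0   # not linear
```

A linear function such as `parity` is never rejected, while a non-linear function such as `majority3` is rejected a constant fraction of the time.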

So you see where this is going: pick $x, x_1,\ldots,x_k$ at random and check if $f_{x_1,\ldots,x_k}(x) = 0$. If $f$ is a degree-$(k-1)$ polynomial then the test accepts with probability 1.
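The general test can be coded directly from the derivative definition: $f_{x_1,\ldots,x_k}(x)$ is the mod-2 sum of $f$ over the $2^k$ points $x + \sum_{i \in S} x_i$, $S \subseteq \{1,\ldots,k\}$. A brute-force sketch of my own (not from any of the papers mentioned):

```python
import itertools
import random

def derivative_test_accept_rate(f, n, k, trials=5000, seed=0):
    # Accept iff f_{x1,...,xk}(x) = 0: the XOR of f over the 2^k points
    # x + sum_{i in S} x_i vanishes. Points are n-bit integers, + is XOR.
    rng = random.Random(seed)
    accepted = 0
    for _ in range(trials):
        x = rng.randrange(2**n)
        dirs = [rng.randrange(2**n) for _ in range(k)]
        val = 0
        for subset in itertools.product((0, 1), repeat=k):
            y = x
            for pick, d in zip(subset, dirs):
                if pick:
                    y ^= d
            val ^= f(y)
        accepted += (val == 0)
    return accepted / trials

affine = lambda x: (x & 1) ^ ((x >> 1) & 1) ^ 1   # degree 1 in the bits
quad   = lambda x: (x & 1) & ((x >> 1) & 1)       # degree 2 in the bits
```

A degree-$(k-1)$ polynomial passes with probability exactly 1 (all its $k$-th derivatives vanish identically), while e.g. the degree-2 `quad` passes the $k=2$ test with probability strictly between $1/2$ and 1.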

Alon, Kaufman, Krivelevich, Litsyn and Ron proved in Random’03 that if the test accepts with probability $1-\epsilon$ then $f$ has agreement $1-O(2^k \epsilon)$ with a degree-$(k-1)$ polynomial. Samorodnitsky, in a manuscript that predates our PCP work, proves the more challenging “low-end” result that, for $k=3$, if the test accepts with probability $1/2 + \epsilon$ then the function has agreement $1/2 + \epsilon'$ with a degree-2 polynomial.

The low-end result is open for $k=4$! Furthermore, for $k=3$, $\epsilon'$ is exponentially small in $\epsilon$. It is an open question to obtain a polynomial dependency.

So, what about these Gowers uniformity norms? Take the probability that the test with $x_1,\ldots,x_k$ accepts, subtract $1/2$ from the probability, and multiply by 2, so you get a number between 0 and 1. This is the dimension-$k$ Gowers uniformity of $f$; Alex and I use the notation $U^k(f)$. The $2^k$-th root of the number I just described is the dimension-$k$ Gowers uniformity norm of $f$, denoted $\|f\|_{U^k}$.

What’s with this subtracting 1/2 and multiplying by 2 business? The analytical study of Boolean functions is often cleaner if we think of them as functions

$f: \{0,1\}^n \rightarrow \{-1,1\}$

where bit $b$ is replaced by $(-1)^b$ in the output. With this notation, we can define

$f_y(x) := f(x) \cdot f(x+y)$

and now the dimension-k uniformity can be given the following equivalent, but much cleaner, definition:

$U^k(f) := \mathbb{E}_{x,x_1,\ldots,x_k}\, f_{x_1,\ldots,x_k}(x)$

Now the advantage is that the definition can be applied to any real-valued function (which is often useful), not just Boolean functions. The $2^k$-th root of $U^k(f)$ is a norm for the space of real-valued functions of domain $\{0,1\}^n$. The definition extends to functions $f: G \rightarrow \mathbb{R}$, where $G$ is any Abelian group (we just need to re-interpret the $+$ in the definition) and $\mathbb{R}$ is the real numbers. The definition can also be extended to complex-valued functions, with a small change to the definition of $f_y$.
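For tiny $n$ one can evaluate $U^k(f)$ directly from this definition by brute force; a sketch of my own (names and test cases are mine):

```python
import itertools

def gowers_U(f, n, k):
    # U^k(f) = E_{x, x1,...,xk} prod over subsets S of {1..k} of
    # f(x + sum_{i in S} x_i), for f: {0,1}^n -> {-1,+1}.
    # Points are n-bit integers; the group addition + is XOR.
    total = 0
    for tup in itertools.product(range(2**n), repeat=k + 1):
        x, dirs = tup[0], tup[1:]
        prod = 1
        for subset in itertools.product((0, 1), repeat=k):
            y = x
            for pick, d in zip(subset, dirs):
                if pick:
                    y ^= d
            prod *= f(y)
        total += prod
    return total / 2 ** (n * (k + 1))

character = lambda x: (-1) ** (bin(x).count("1") % 2)  # a linear character
bent2 = lambda x: -1 if x == 3 else 1                  # (-1)^{x1 x2}, n = 2
```

A linear character has $U^2 = 1$, while $(-1)^{x_1 x_2}$ on $\{0,1\}^2$ has $U^2 = 1/4$, consistent with the Fourier-analytic fact that $U^2(f)$ equals the sum of the fourth powers of the Fourier coefficients of $f$.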

With this notation, Alex proves that, for Boolean $f$, $U^3(f)$ is noticeably large if and only if $f$ has noticeable correlation with a degree-2 polynomial. Green and Tao, in their 72-page paper I referenced above, prove an analogous result for bounded functions $f: G \rightarrow \mathbb{C}$, where $G$ is an Abelian group of order not divisible by 2 or by 3, and $\mathbb{C}$ is the complex numbers. (If the group contains a large cyclic subgroup, then it is quite messy even to state the theorem. If $G$ is of the form $\mathbb{F}_q^n$ where $n$ is large compared with $q$, then the statement is simply about noticeable norm versus correlation with polynomials.)

And, to summarize again the hundreds of pages of mathematical knowledge we have about these norms, if the norm is small, the function is kind of pseudorandom; if the norm is not small, then the function is kind of “structured,” in a way that sort of resembles a low-degree polynomial. It is open to turn “sort of resembles” into “is correlated with.” (And it cannot be done if the domain of the function is a group with a large cyclic subgroup — in that case, there are more complicated conjectures.)