Here is the idea of Roth’s proof of the k=3 case of Szemerédi’s Theorem. (See yesterday’s post if you have no idea what I am talking about.) We have a subset *A* of Z_{N} of size *dN* and we want to find a length-3 arithmetic progression in *A*. A first observation is that if *d > 2/3* then we are done, because if we pick *a,b* at random there is positive probability that *a,a+b,a+b+b* are all in *A*. (Use the union bound.) The proof will work by “induction” on *d*, showing that the truth of the theorem for a smaller value of *d* can be derived from its truth for a larger value of *d*. Of course *d* is a continuous parameter, so it will not really be induction, but that’s how we should think of it.
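As a sanity check on the union-bound observation, here is a tiny brute-force sketch (the choice N=7 and the helper name are mine): by the union bound, the fraction of pairs *a,b* with some point outside *A* is at most *3(1-d) < 1* when *d > 2/3*, and since only *dN* of the *N^{2}* pairs have *b=0*, a nontrivial progression must exist.

```python
from itertools import combinations

def has_3ap(A, N):
    """Does A (a set of residues mod N) contain a, a+b, a+2b with b != 0?"""
    S = set(A)
    return any((a + b) % N in S and (a + 2 * b) % N in S
               for a in S for b in range(1, N))

# Check the union-bound claim exhaustively for N = 7: every subset of
# density > 2/3 (i.e. size >= 5 out of 7) contains a length-3 progression.
all_dense_sets_have_ap = all(has_3ap(A, 7) for A in combinations(range(7), 5))
```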

The main result is an “algorithm” that, given *A*, a subset of Z_{N} of size *dN*, does one of the following two things:

- It immediately finds a length-3 progression in *A*; or
- it constructs a subset *A’* of Z_{N’} such that (i) if *A’* has a length-3 progression then so does *A*; (ii) *A’* has size at least about *(d+d^{2})N’*; and (iii) *N’* is about sqrt(*N*).

So, given *A*, we run the algorithm, and we either immediately find the progression, or we get *A’*. In the latter case, we run the algorithm on *A’*, and we either get a progression in *A’* (implying a progression in *A*) or we get a new set *A”*, and so on. But how many times do we have to keep doing this? At each step, the density of the set increases, and if it ever goes above 2/3 then we are also done. So, how many times can you do the *d -> d+d^{2}* transformation until you get to 2/3? Certainly no more than *O(1/d^{2})* times, but actually *O(1/d)* is enough. This means that we will always find a length-3 progression in *A*, provided that N is large enough that we can take its square root *O(1/d)* times. So if N is doubly exponential in *1/d* we have our proof.
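To see the *O(1/d)* bound concretely, one can just iterate the map numerically (a quick sketch; the function name is mine). Doubling the density from *d* to *2d* takes about *1/d* steps, since each step gains roughly *d^{2}*, and the successive doublings form a geometric series.

```python
def steps_to_two_thirds(d):
    """Iterate d -> d + d^2 until the density exceeds 2/3; return the count."""
    steps = 0
    while d <= 2 / 3:
        d += d * d
        steps += 1
    return steps

# The count grows roughly proportionally to 1/d, not 1/d^2: halving the
# starting density about doubles the number of steps.
counts = {d0: steps_to_two_thirds(d0) for d0 in (0.04, 0.02, 0.01)}
```

Halving the starting density roughly doubles the step count, consistent with the *O(1/d)* bound rather than *O(1/d^{2})*.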

Gowers’s proof for the general case has the same outline. Now we want to find a length-k progression in *A*. If the density of *A* is more than (k-1)/k, we are done. Gowers describes an algorithm that, given *A*, either finds a length-k progression or constructs a set *A’* in Z_{N’} such that if *A’* has a length-k progression then so does *A*. Furthermore, *A’* has density *d+poly(d)* and *N’* is a constant root of *N*.

How do these arguments work? Let us look at Roth’s proof in the case in which *A* is a subset of *F^{n}_{p}* with prime *p*; for concreteness, take *p=3*. Given *A*, consider its characteristic function (that we denote again *A*) *A: F^{n}_{3} -> {0,1}* and compute its Fourier transform.

Consider now the probability that, when picking *a,b* at random, the three points *a,a+b,a+b+b* are all in *A*. This is the same as

(*)

E_{a,b}A(a)*A(a+b)*A(a+b+b)

which can be expressed really cleanly in terms of the Fourier coefficients of *A*. In particular, the expression (*) is *d^{3}*, which is the number you would get if you were looking at 3 independent points, plus or minus an error term that depends on the largest Fourier coefficient of *A*. If all Fourier coefficients (other than the one at frequency zero, which is just the density *d*) are less than, say, *d^{2}/2*, then expression (*) is at least *d^{3}/2*, and we have plenty of length-3 progressions in *A*; this is case (1) of the proof. In this case, we should think of *A* as a “pseudorandom set against adversaries that look for random length-3 progressions.”
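Here is a small numerical sketch of the pseudorandom case, in the Z_{N} setting rather than F^{n}_{3} (the helper names and parameter choices are mine): for a random set, all nontrivial Fourier coefficients are tiny, and (*) lands close to *d^{3}*.

```python
import cmath
import random

def three_ap_avg(A, N):
    """Compute the expression (*): E_{a,b}[A(a) A(a+b) A(a+2b)] over Z_N."""
    ind = [1 if x in A else 0 for x in range(N)]
    return sum(ind[a] * ind[(a + b) % N] * ind[(a + 2 * b) % N]
               for a in range(N) for b in range(N)) / N ** 2

def max_nontrivial_coeff(A, N):
    """max over r != 0 of |E_x[A(x) exp(-2*pi*i*r*x/N)]| for the indicator of A."""
    return max(abs(sum(cmath.exp(-2j * cmath.pi * r * x / N) for x in A)) / N
               for r in range(1, N))

random.seed(0)
N = 601                           # a modest prime
A = {x for x in range(N) if random.random() < 0.4}
d = len(A) / N
# For a random set the nontrivial Fourier coefficients are on the order of
# 1/sqrt(N), so (*) should be close to d^3: this is the pseudorandom case.
```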

What happens if *A* has a large Fourier coefficient? This means that *A* is correlated with a linear function, and so there is an affine subspace *V* in F^{n}_{3} of dimension *n-1* such that *A* is denser in *V* than in the rest of F^{n}_{3}. Now, map *V* to F^{n-1}_{3} via an invertible affine map, and let *A’* be the image of *A* under this map. If there is a length-3 progression in *A’*, that’s just 3 points on a line; but then it means that also in *A* we had 3 points on a line. The set *A’* has density about *d+d^{2}* in F^{n-1}_{3}, and so we have part (2) of the argument.

As a bonus, we only lost a constant fraction of our group size at each step, so *d* can be as small as about 1/logN, where N=3^{n} is the size of the initial group containing A.
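The subspace-restriction step can be sketched numerically (a toy version with hypothetical helper names; the actual proof locates the dense hyperplane through the large Fourier coefficient, while here we simply search over all affine hyperplanes):

```python
import itertools
import random

def densest_hyperplane(A, n):
    """Find the affine hyperplane <r, x> = c of F_3^n on which A is densest.
    Since the three hyperplanes for a fixed r partition F_3^n, the densest
    one always has density at least that of A itself."""
    best = None
    points = list(itertools.product(range(3), repeat=n))
    for r in points[1:]:                      # skip r = (0, ..., 0)
        for c in range(3):
            plane = [x for x in points
                     if sum(ri * xi for ri, xi in zip(r, x)) % 3 == c]
            dens = sum(x in A for x in plane) / len(plane)
            if best is None or dens > best[0]:
                best = (dens, r, c)
    return best

random.seed(1)
n = 3
# A set biased toward the hyperplane x_1 = 0: such a set is correlated
# with a linear function, i.e. it has a large Fourier coefficient.
A = {x for x in itertools.product(range(3), repeat=n)
     if random.random() < (0.8 if x[0] == 0 else 0.3)}
d = len(A) / 3 ** n
dens, r, c = densest_hyperplane(A, n)
```

Restricting to that hyperplane and identifying it with F^{n-1}_{3} yields the denser set *A’* of the argument.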

Note the “win-win” structure of the argument. If *A* has small Fourier coefficients, then it is “pseudorandom”, and we get what we want from the pseudorandomness. Otherwise, *A* has some “linear structure,” and we can take advantage of it to reduce to a simpler instance of our original problem.

This *“either we get what we want or we reduce to a simpler case”* proof strategy has been used in some constructions of extractors, but it seems such a good idea that it may apply more widely, and one should keep it in mind.

What about lower bounds? Excellent question. There is a very simple construction due to Behrend of a subset of {1,…,N} of size N/exp(sqrt(log N)) that has no length-3 progression, so N must be super-polynomial in 1/d when we prove the theorem over the integers (or over Z_{N} with prime N; the two cases are essentially equivalent). Note that if you try to find a large set with no length-3 progression using the probabilistic method, you will not go very far. This is a *better than random* explicit construction, a rarity in extremal combinatorics. By the way, Behrend’s construction is very simple (here is a half-page description), it is *60 years old*, and nobody has been able to improve it since.
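Behrend’s idea can be sketched in a few lines (a toy version; the helper names and the parameters base=10, ndigits=3 are mine): write numbers with small digits, so that addition produces no carries, and keep only those whose digit vectors lie on a fixed sphere; three distinct points of a sphere are never collinear, so each such group is progression-free.

```python
from collections import defaultdict
from itertools import product

def behrend_buckets(base, ndigits):
    """Bucket the integers whose base-`base` digits all lie in
    {0, ..., base//2 - 1} by the squared length of their digit vector.
    Small digits mean x + z and 2*y are computed without carries, so
    x + z == 2*y would force the digit vectors to satisfy X + Z = 2*Y;
    but three distinct points of a sphere cannot be collinear, hence
    each bucket is free of length-3 progressions."""
    buckets = defaultdict(list)
    for digits in product(range(base // 2), repeat=ndigits):
        value = sum(d * base ** i for i, d in enumerate(digits))
        buckets[sum(d * d for d in digits)].append(value)
    return buckets

def has_3ap_int(S):
    """Is there x < y in S with 2*y - x also in S?"""
    vals = set(S)
    S = sorted(S)
    return any(2 * y - x in vals for i, x in enumerate(S) for y in S[i + 1:])

# The largest bucket is a progression-free subset of {0, ..., 999}.
buckets = behrend_buckets(10, 3)
best = max(buckets.values(), key=len)
```

For large N, choosing the base and the number of digits appropriately gives a progression-free set of size about N/exp(O(sqrt(log N))), matching the bound quoted above.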

What about lower bounds for F^{n}_{3}? There is no known construction of a large subset avoiding length-3 progressions. The best lower bounds are of the form c^{n}, with c<3. Is there a (3-o(1))^{n} lower bound? This is a **major open question**. If there were a (2.999)^{n} *upper bound*, it would be a stunning result, because it would show that the business of translating proofs from finite fields to the integers using Bohr sets cannot be applied as a black box: it works for some proofs and not for others.
