After discussing Szemeredi’s Theorem and the analytical approaches of Roth and Gowers, let’s see some ideas and open problems in the combinatorial approach.

The following result is a good starting point.

**Triangle Removal Lemma.** For every there is a constant such that if is an -vertex graph with at most triangles, then it is possible to make triangle-free by removing at most edges.

This result follows easily from the Szemeredi Regularity Lemma, and it is a prototype of several results in the field of property testing.

Indeed, consider the problem of distinguishing *triangle-free* graphs from graphs that are *not even close* to being triangle-free. For the purpose of this example, we think of a graph as “close to being triangle-free” if it can be made triangle-free by removing at most edges, for some small constant . Then the Lemma tells us that in a not-even-close graph we have at least triangles, and so a sample of vertices is likely to contain a triangle. So here is an algorithm for the problem: pick at random vertices and see if they induce a triangle. We note that:

- The algorithm satisfies our requirement;
- The running time is a
*constant* independent of and dependent only on ;
- this all makes sense only if grows moderately as a function of .

Let us now see that the Triangle Removal Lemma proves Szemeredi’s Theorem for sequences of length 3. As we discussed earlier, it is enough to prove the Theorem in groups of the type , with prime; we will do better and prove the Theorem in any abelian group. So let be an abelian group of size , an let be a subset of of size . Construct a tri-partite graph where are sets of vertices, each of size , and each a copy of the group . The set of edges is defined as follows:

- for and , is an edge if there is an such that
- for and , is an edge if there is a such that
- for and , is an edge if there is a such that

Now we notice that if is a triangle, then there are in such that (after some cancellations) , which means that is an arithmetic progression of length 3 in the group . In fact, we see that the number of triangles in the graph is precisely times the number of triples in arithmetic progression in .

Consider now the triangles corresponding to the “trivial” progressions of the form . (These are the triangles .) We can see that these triangles are *edge-disjoint*, and so in order just to remove such triangles we have to remove from the graph at least edges. So the Triangle Removal Lemma implies that there are at least triangles in the graph, and so at least triples in arithmetic progression in . If , then some of those arithmetic progressions must be non-trivial, and we have the length-3 case of Szemeredi’s theorem.

We see that both for the “property testing” application and for the Szemeredi Theorem application it is important to have good quantitative bounds on . We know that, in Szemeredi’s theorem, must be super-polynomial in , and so, in the Triangle Removal Lemma, must be super-polynomial in .

The known bounds, however, are quite unreasonable: we only know how to bound by a tower of exponentials whose height is . Unfortunately, such unreasonable bounds are unavoidable in any proof of the Triangle Removal Lemma based on Szemeredi’s Regularity Lemma. This is because, as proved by Gowers, such tower-of-exponentials bounds are necessary in the Regularity Lemma.

Can we find a different proof of the Triangle Removal Lemma that has only a singly- or doubly-exponential ? That would be wonderful, both for property testing (because the proof would probably apply to other sub-graph homomorphism problems) and for additive combinatorics. Over the last year, I have found at least three such proofs. Unfortunately they were all fatally flawed.

Or, is there a more-than-exponential lower bound for the Triangle Removal Lemma? This would also be a major result: it would show that several results in property testing for dense graphs have no hope of being practical, and that there is a “separation” between the quantitative version of Szemeredi’s Theorem provable with analytical methods versus what can be proved with the above reduction. Besides, it’s not often that we have super-exponential lower bounds for natural problems, with Gowers’s result being a rare exception.

By the way, what about Szemeredi’s Theorem for longer sequences? For sequences of length one needs a “-clique removal lemma” for -uniform hypergraphs, which in turn can be derived from the proper generalization of the Regularity Lemma to hypergraphs. This turns out to be quite complicated, and it has been accomplished only very recently in independent work by Nagle, Rödl and Schacht; and by Gowers. An alternative proof has been given by Tao. The interested reader can find more in expository paper by Gowers and Tao.

What about Szemeredi’s own proof? It does use the Regularity Lemma, which was conceived and proved specifically for this application, and it involves a reduction to a graph-theoretic problem. I have to admit that I don’t know much else.