Testing minor-closed properties in sparse graphs

Terry Tao and Gil Kalai have written on the mathematical work of Oded Schramm. Tao suggests a survey paper by Lawler as a source to learn more about the stochastic Loewner equation and its applications. Schramm was a plenary speaker at the 2006 ICM in Madrid, and his talk can be seen going to http://www.icm2006.org/video/ and clicking on “11th Session.” Unfortunately the site is very badly designed and, on my computer, it works neither with Firefox nor with Chrome. (It does work with IE.)

Oded has also worked in computer science, and here I would like to recall a lovely recent paper of his on property testing.

In graph property testing, we are interested in a graph property such as, say, 2-colorability, and we would like to develop very efficient algorithms, at the cost of delivering an approximate answer. The approximation takes the following form: if the graph is 2-colorable, then we want the algorithm to accept with high probability, and if the algorithm accepts with noticeable probablity, then it must be the case that the graph is at least “\epsilon-close” to 2-colorable. (A similar definition can be given for any other graph property, such as connectivity, planarity, expansion, 3-colorability, and so on.)

The notion of “closeness” can be defined in various ways, which affect the complexity of the problem.

If we deal with dense graphs, we usually define two n-vertex graphs to be “\epsilon-close” if one can be converted into the other by adding or removing at most \epsilon n^2 edges. In this “dense graph model” of property testing, the notion of closeness is loose enough that essentially every natural graph property has very efficient algorithms, whose running time depends only on \epsilon.

If we deal with sparse graphs, such as graphs whose maximum degree is d (think of d as a constant, while the number n of vertices goes to infinity), then we would say that two graphs are \epsilon-close if one can be converted into the other by adding or removing up to \epsilon d n edges. In this “bounded degree graph model” of property testing, some properties become much harder to test. While 2-colorability, and, in fact, k-colorability for every k, have testers of complexity poly(1/\epsilon) in the dense graph model, 2-colorability as complexity \Omega (\sqrt n) and O(\sqrt n polylog n) and 3-colorability has complexity \Omega(n) in the bounded degree model.

Also, some properties are trivial in the dense graph model and non-trivial in the bounded-degree one. Consider planarity: in the dense graph model, a graph is \epsilon-close to planar if and only if it is (\epsilon + o(1))-close to the empty graph, and so testing planarity is equivalent to testing emptiness, and thus trivial. In the bounded-degree model, it had been an open question to find a sub-linear time tester for planarity.

In STOC 2008, Benjamini, Schramm and Shapira prove that planarity is testable in the bounded-degree model with complexity that depends only on \epsilon, and independent of the size of the graph. The same result applies to any graph property defined in terms of excluded minors. Previously, very few properties were known to be testable with complexity dependent only on \epsilon in this model.

The proof applies more broadly to properties that are “hyper-finite,” meaning that graphs with the property can be broken up into finite-size components after removing a small fraction of edges. The result relies on a previous result by Schramm, studying “limits” of sequences of hyper-finite graphs using the kind of compactness techniques described by Terry Tao in his recent MSRI lectures.

Property testing and Szemeredi’s Theorem

After discussing Szemeredi’s Theorem and the analytical approaches of Roth and Gowers, let’s see some ideas and open problems in the combinatorial approach.

The following result is a good starting point.

Triangle Removal Lemma. For every \delta >0 there is a constant \epsilon(\delta) > 0 such that if G is an n-vertex graph with at most \epsilon(\delta)n^3 triangles, then it is possible to make G triangle-free by removing at most \delta n^2 edges.

This result follows easily from the Szemeredi Regularity Lemma, and it is a prototype of several results in the field of property testing.

Indeed, consider the problem of distinguishing triangle-free graphs from graphs that are not even close to being triangle-free. For the purpose of this example, we think of a graph as “close to being triangle-free” if it can be made triangle-free by removing at most \delta n^2 edges, for some small constant \delta > 0. Then the Lemma tells us that in a not-even-close graph we have at least \epsilon(\delta) n^3 triangles, and so a sample of O(1/\epsilon(\delta)) vertices is likely to contain a triangle. So here is an algorithm for the problem: pick at random O(1/\epsilon(\delta)) vertices and see if they induce a triangle. We note that:

  • The algorithm satisfies our requirement;
  • The running time is a constant independent of n and dependent only on \delta;
  • this all makes sense only if 1/\epsilon(\delta) grows moderately as a function of 1/\delta.

Let us now see that the Triangle Removal Lemma proves Szemeredi’s Theorem for sequences of length 3. As we discussed earlier, it is enough to prove the Theorem in groups of the type {\mathbf Z}_N, with N prime; we will do better and prove the Theorem in any abelian group. So let G be an abelian group of size N, an let A be a subset of G of size \delta N. Construct a tri-partite graph (X,Y,Z,E) where X,Y,Z are sets of vertices, each of size N, and each a copy of the group G. The set of edges is defined as follows:

  1. for x \in X and y \in Y, (x,y) is an edge if there is an a \in A such that x+a=y
  2. for x \in X and z \in Z, (x,z) is an edge if there is a b\in A such that x + b + b = z
  3. for y \in Y and z \in Z, (y,z) is an edge if there is a c \in A such that y+c=z

Now we notice that if x,y,z is a triangle, then there are a,b,c in A such that (after some cancellations) a+c = b+b, which means that a,b,c is an arithmetic progression of length 3 in the group G. In fact, we see that the number of triangles in the graph is precisely N times the number of triples (a,b,c) in arithmetic progression in A.

Consider now the N \cdot |A| = \delta N^2 triangles corresponding to the “trivial” progressions of the form a,a,a. (These are the triangles x,x+a,x+a+a.) We can see that these triangles are edge-disjoint, and so in order just to remove such triangles we have to remove from the graph at least \delta N^2 edges. So the Triangle Removal Lemma implies that there are at least \epsilon(\delta)N^3 triangles in the graph, and so at least \epsilon(\delta)N^2 triples in arithmetic progression in A. If N > \delta/\epsilon(\delta), then some of those arithmetic progressions must be non-trivial, and we have the length-3 case of Szemeredi’s theorem.

We see that both for the “property testing” application and for the Szemeredi Theorem application it is important to have good quantitative bounds on \epsilon(\delta). We know that, in Szemeredi’s theorem, N must be super-polynomial in 1/\delta, and so, in the Triangle Removal Lemma, 1/\epsilon(\delta) must be super-polynomial in 1/\delta.

The known bounds, however, are quite unreasonable: we only know how to bound 1/\epsilon(\delta) by a tower of exponentials whose height is poly(1/\delta). Unfortunately, such unreasonable bounds are unavoidable in any proof of the Triangle Removal Lemma based on Szemeredi’s Regularity Lemma. This is because, as proved by Gowers, such tower-of-exponentials bounds are necessary in the Regularity Lemma.

Can we find a different proof of the Triangle Removal Lemma that has only a singly- or doubly-exponential \epsilon(\delta)? That would be wonderful, both for property testing (because the proof would probably apply to other sub-graph homomorphism problems) and for additive combinatorics. Over the last year, I have found at least three such proofs. Unfortunately they were all fatally flawed.

Or, is there a more-than-exponential lower bound for the Triangle Removal Lemma? This would also be a major result: it would show that several results in property testing for dense graphs have no hope of being practical, and that there is a “separation” between the quantitative version of Szemeredi’s Theorem provable with analytical methods versus what can be proved with the above reduction. Besides, it’s not often that we have super-exponential lower bounds for natural problems, with Gowers’s result being a rare exception.

By the way, what about Szemeredi’s Theorem for longer sequences? For sequences of length k one needs a “k-clique removal lemma” for (k-1)-uniform hypergraphs, which in turn can be derived from the proper generalization of the Regularity Lemma to hypergraphs. This turns out to be quite complicated, and it has been accomplished only very recently in independent work by Nagle, Rödl and Schacht; and by Gowers. An alternative proof has been given by Tao. The interested reader can find more in expository paper by Gowers and Tao.

What about Szemeredi’s own proof? It does use the Regularity Lemma, which was conceived and proved specifically for this application, and it involves a reduction to a graph-theoretic problem. I have to admit that I don’t know much else.