As mentioned before, the MSRI is devoting this semester to connections between additive combinatorics and ergodic theory.

One of the generally exciting things about additive combinatorics is the confluence of techniques from analysis, combinatorics, and ergodic theory. One of the specifically exciting things to me has been the applicability of the analytic and the combinatorial techniques to computer science. It makes sense that we should make some effort to understand the ergodic-theoretic techniques too, just in case.

So what is ergodic theory? Here is what I learned from Wikipedia. The general idea seems to be that one has a system in a certain state, belonging to a possibly huge and complex state space, and the system evolves deterministically in discrete time; one would like to have a general idea of which states the system goes through in the long run, given that the start state lies in a certain set.

So one has a set X (the “states”), which could be a discrete set (though these are not the cases where the theory says very interesting things) like {\mathbb Z}_N or a continuous one, like say, the interval [0,1]. One also has a measure \mu on X, which assigns a “size” \mu(S) to subsets S of X; for example we could have \mu(S) = |S| when X is {\mathbb Z}_N, or \mu(S) could be the Lebesgue measure of S when X is [0,1]. In most interesting cases, including [0,1], it is not possible to assign a measure \mu(S) to every subset of X and still satisfy the minimal axioms that a function must obey to be called a measure (things like: if A and B are disjoint, then \mu(A) + \mu(B) = \mu(A \cup B)); in particular, it is not possible to define a uniform measure on the set of infinite bit-strings (which is the same as [0,1]) so that every subset has a measure, as seen by the solution of this puzzle. In such a case, one has a collection \mathbb B of subsets of X, and \mu is only defined on elements of \mathbb B, which are of course called “measurable.” It is important that \mathbb B be closed under complementation and under countable union, that is, that \mathbb B be a \sigma-algebra.

This notion makes sense even in the finite setting. If X is finite, then one can easily see that if \mathbb B is a \sigma-algebra then there is a partition B_1,\ldots,B_k of X such that \mathbb B is precisely the collection of sets that can be obtained as unions of blocks of the partition. (Usually, however, in the finite setting one defines \mu for every subset, and so \mathbb B contains all subsets and is generated by the discrete partition.)
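To make this concrete, here is a quick Python sketch (the tiny six-element example and the function names are my own) that generates the \sigma-algebra from a partition and checks the closure properties:

```python
from itertools import combinations

def sigma_algebra_from_partition(blocks):
    """All unions of blocks of a partition: the sigma-algebra it generates."""
    sets = set()
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            sets.add(frozenset().union(*combo))
    return sets

# A partition of X = {0,...,5} into three blocks.
blocks = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]
B = sigma_algebra_from_partition(blocks)
X = frozenset(range(6))

# Check the sigma-algebra axioms (finite case, so countable union = finite union).
assert X in B and frozenset() in B
assert all(X - S in B for S in B)                # closed under complementation
assert all(S | S2 in B for S in B for S2 in B)   # closed under union
print(len(B))  # 2^3 = 8 measurable sets
```

With k blocks one gets exactly 2^k measurable sets, one for each subset of blocks.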

Finally, one has an operator T: X \rightarrow X that describes the evolution of the system. For example, if X= {\mathbb Z}_N, we could have T(x) = x+1 \bmod N. It is important for the theory that T be “measure-preserving,” that is, that \mu(S) = \mu(T(S)) for every measurable set S. (In particular, this requires that if S is measurable then T(S) is measurable, where T(S) is the set \{ T(s) : s\in S \}. If we are in the finite setting, and \mathbb B does not contain all sets, then this measurability condition is quite interesting, because it says that if B_1,\ldots,B_k is the partition that defines the \sigma-algebra, then for every i we must have that T(B_i) is itself a union of blocks.)
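Continuing the finite sketch (the particular N and the specific sets below are made-up illustrations), one can check both the measure-preserving property of the shift T(x) = x+1 \bmod N and the “blocks map to blocks” measurability condition:

```python
N = 12
T = lambda x: (x + 1) % N
image = lambda S: frozenset(T(x) for x in S)
mu = len  # counting measure on Z_N

# T is a bijection, so it preserves counting measure:
S = frozenset({0, 3, 7})
assert mu(image(S)) == mu(S)

# With the partition {evens, odds}, shifting by 1 maps each block
# onto the other, so T(B_i) is always a union of blocks: T is measurable.
evens = frozenset(range(0, N, 2))
odds = frozenset(range(1, N, 2))
assert image(evens) == odds and image(odds) == evens

# With the partition {0..5}, {6..11}, the image {1,...,6} of the first
# block is not a union of blocks, so this sigma-algebra does not work.
low, high = frozenset(range(6)), frozenset(range(6, N))
assert image(low) != low and image(low) != high
print("shift preserves counting measure; parity partition is T-invariant")
```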

The whole thing (X,{\mathbb B},\mu,T) is a “measure-preserving dynamical system,” and is the object of study of ergodic theory. Typically, for a starting point x we would like to understand the sequence x, Tx, T^2x,\ldots,T^nx,\ldots. An interesting case is when the sequence is “all over the space,” meaning that if A is a set containing an \alpha fraction of the measure of X, then roughly \alpha n of the first n elements of the sequence land in A. More formally, this is the case when

(1) \ \ \ \displaystyle \lim_{N \rightarrow \infty} \frac 1N \sum_{n=1}^N 1_A(T^{n}(x)) = \frac {\mu(A)}{\mu(X)}

The ergodic theorem states that if (X,{\mathbb B},\mu,T) is ergodic, then (1) happens for all x, except possibly an exceptional set of measure zero. The property of being ergodic is a sort of “expansion” property of T: it is the requirement that if \mu(X)> \mu(S)>0 then \mu(S \Delta T(S)) >0, that is, that T, when applied to a set that is neither null nor everything, always generates a noticeable fraction of new elements. In fact, the conclusion of the ergodic theorem is stronger, because one does not just get (1), but also

(1') \ \ \ \displaystyle \lim_{N \rightarrow \infty} \frac 1N \sum_{n=1}^N f(T^{n}(x)) = \frac {1}{\mu(X)} \cdot \int_X f(x) d\mu(x)

for nice enough functions f().
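As a numerical sanity check of (1′), here is the standard example of an irrational rotation of the unit interval (the particular angle \alpha, starting point, and number of steps are arbitrary choices of mine); the time average of f(x) = x along the orbit should approach \int_0^1 x \, dx = 1/2:

```python
import math

alpha = math.sqrt(2) - 1          # an irrational rotation angle
T = lambda x: (x + alpha) % 1.0   # rotation of [0,1); ergodic for irrational alpha
f = lambda x: x                   # a "nice enough" f, with integral 1/2 over [0,1]

x, total, N = 0.1, 0.0, 200_000
for _ in range(N):
    x = T(x)
    total += f(x)
print(total / N)   # close to 1/2
```

Replacing \alpha by a rational number breaks ergodicity: the orbit is periodic and the time average depends on the starting point.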

So far, this seems very remote from combinatorics, except for the feeling that the ergodic theorem is somewhat analogous to discrete theorems on convergence of random walks on expanders (I don’t know if there is a precise connection).

Going back to Szemeredi’s theorem, say, for sequences of length 3, we want to prove that for every \delta, and for every large enough N, if A \subseteq {\mathbb Z}_N has size at least \delta N, then A contains a length-3 arithmetic progression. Indeed, known proofs give something stronger: that for every \delta there is an \epsilon such that for every subset A \subseteq {\mathbb Z}_N of size at least \delta N,

(2) \ \ \  \displaystyle {\mathbb E}_{x,y \in {\mathbb Z}_N} 1_A(x) \cdot 1_A(x+y) \cdot 1_A(x+2y) \geq \epsilon

This means that there are at least \epsilon N^2 - N length-3 progressions x,x+y,x+2y where y\neq 0 and the progression is contained in A. (In particular, if N is large enough, then A has a length-3 progression.)

If we define T(x) := x+ 1 \pmod N, then we can rewrite (2) as

(3) \ \ \  \displaystyle \frac 1N \sum_{n=1}^N {\mathbb E}_{x\in {\mathbb Z}_N} 1_A(x) \cdot 1_A(T^n(x)) \cdot 1_A(T^{2n}(x)) \geq \epsilon

This begins to look like an expression from ergodic theory.
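In fact, (2) and (3) are literally the same average, since T^n(x) = x+n and n ranges over all of {\mathbb Z}_N; this is easy to check numerically (the random dense set below is just an illustration of mine):

```python
import random
random.seed(0)

N = 101
A = set(random.sample(range(N), 60))        # a dense subset of Z_N
ind = lambda x: 1 if x % N in A else 0      # indicator 1_A, computed mod N

# Form (2): average over all pairs (x, y) in Z_N x Z_N.
lhs = sum(ind(x) * ind(x + y) * ind(x + 2 * y)
          for x in range(N) for y in range(N)) / N**2

# Form (3): with T(x) = x + 1 mod N we have T^n(x) = x + n,
# and n = 1,...,N covers the same residues mod N as y = 0,...,N-1.
rhs = sum(ind(x) * ind(x + n) * ind(x + 2 * n)
          for n in range(1, N + 1) for x in range(N)) / N**2

assert abs(lhs - rhs) < 1e-12
print(lhs)   # a positive density of length-3 progressions (including trivial ones)
```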

Now, here is a result by Furstenberg which looks similar to (3), but without epsilons and deltas:

If (X,{\mathbb B},\mu,T) is a measure-preserving system, A \subseteq X, and \mu(A) > 0, then

(4) \ \ \  \displaystyle \limsup_{N\rightarrow \infty} \frac 1N \sum_{n=1}^N \int_{x\in X} 1_A(x) \cdot 1_A(T^n(x)) \cdot 1_A(T^{2n}(x)) d\mu(x) > 0

It takes some work to show that Furstenberg’s result implies Szemeredi’s theorem. Instead of trying to explain how this works, I refer you to Terry Tao’s excellent exposition in parts (3) and (4) of this post.

Broadly, there are two points to this approach: one is to use “compactness” arguments to reduce a finitary statement with epsilons and deltas to a cleaner infinitary statement; the other is to abstract away most of the structure of the given problem, and to come up with a more general and abstract statement that can be easier to prove by “inductive” arguments. (Here “induction” is in quotes because it is often an infinitary kind of induction that relies on Zorn’s Lemma.)
