The Inverse Conjecture for the Gowers Norms

This semester, the MSRI is having a special semester devoted to additive combinatorics and ergodic theory.

Additive combinatorics is the branch of extremal combinatorics where the objects of interest are sets of integers and, more generally, subsets of abelian groups (rather than graphs or set systems) and where the properties of interest are formulated in terms of linear equations (rather than in terms of cuts, partitions, intersections, and so on). Lately, it has been the scene of a fascinating convergence of methods from “classical” combinatorics, from analysis, and from ergodic theory. I have often written about it here because the combinatorial and analytical techniques of additive combinatorics have been useful in a number of different computer science applications (related to probabilistically checkable proof constructions, property testing, and pseudorandomness), and computer scientists are becoming more and more interested in the subject, contributing their own perspective to it.

In all this, the exact role of ergodic theory (and the possible applications of ergodic-theoretic techniques in complexity theory) has been a bit of a mystery to me, and perhaps to other computer scientists too. Very nicely, the MSRI special program started this week with a series of tutorials to introduce the connections between ergodic theory and additive combinatorics.

All talks are (or will soon be) online, and the talks by Terry Tao are especially recommended, because he explains, using very simple examples, how one goes about converting concrete, finitary and quantitative statements into equivalent abstract infinitary statements, which are in turn amenable to ergodic-theoretic techniques.

Today, Tamar Ziegler discussed the very recent proof, by Vitaly Bergelson, Terry Tao, and herself, of the inverse conjecture for Gowers norms in finite fields, a proof that uses ergodic-theoretic techniques.

But, you may object, didn’t we know that the inverse conjecture for Gowers norms is false? Well, the counterexample of Lovett, Meshulam, and Samorodnitsky (independently discovered by Green and Tao) refutes the following conjecture: “Suppose f: {\mathbb F}_p^n \rightarrow {\mathbb C} is a bounded function such that || f||_{U^k} \geq \delta; then there is an \epsilon = \epsilon(\delta,p,k) independent of n and an n-variate degree-(k-1) polynomial q over {\mathbb F}_p such that f(x) and e^{-2\pi i q(x)/p} have correlation at least \epsilon.”

This is refuted in the p=2,k=4 case by taking f(x_1,\ldots,x_n) = (-1)^{S_4(x_1,\ldots,x_n)}, where S_4 is the elementary symmetric polynomial of degree 4. While f has o(1) (indeed, exponentially small) correlation with all functions of the form (-1)^{q(x_1,\ldots,x_n)}, where q is a degree-3 polynomial, the norm || f||_{U^4} is a constant.

The statement proved by Bergelson, Tao, and Ziegler is “Suppose f: {\mathbb F}_p^n \rightarrow {\mathbb C} is a bounded function such that || f||_{U^k} \geq \delta; then there is an \epsilon = \epsilon(\delta,p,k) independent of n and a bounded n-variate degree-(k-1) ‘polynomial function’ Q: {\mathbb F}_p^n \rightarrow {\mathbb C} such that f(x) and Q(x) have correlation at least \epsilon.”

What is, then, a ‘polynomial function’ of degree d? It is a function bounded by 1 in absolute value, and such that for every d+1 directions h_1,\ldots, h_{d+1}, if one takes d+1 Gowers derivatives in such directions one always gets the constant-1 function. In other words, Q is a ‘polynomial function’ of degree k-1 if |Q(x)| \leq 1 for every x, and one has || Q||_{U^k}=1. Interestingly, these functions are a proper superclass of the functions of the form e^{2\pi i q(x)/p} with q being a polynomial over {\mathbb F}_p.

In the concrete p=2 case, one may construct such a function by letting q be a polynomial in the ring, say, {\mathbb Z}_8, and then having Q(x) = \omega^{q(x)}, where \omega is a primitive eighth root of unity. Indeed, this is the type of degree-3 polynomial function that is correlated with (-1)^{S_4(x)}.
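To make these definitions concrete, here is a brute-force computation of Gowers uniformity norms on {\mathbb F}_2^n (a toy sketch of my own, not anything from the talks), using the recursion || f||_{U^k}^{2^k} = {\mathbb E}_h || \Delta_h f||_{U^{k-1}}^{2^{k-1}}, where \Delta_h f(x) = f(x+h)\overline{f(x)} is the multiplicative derivative. It checks, for small n, that the classical quadratic phase (-1)^{x_1 x_2} has U^3 norm 1, and that \omega^{x_1+\cdots+x_n} (with q(x) = x_1+\cdots+x_n, a polynomial in {\mathbb Z}_8) should be a degree-3 polynomial function, that is, should have U^4 norm 1.

```python
import cmath

def unorm_pow(f, k):
    """Return ||f||_{U^k}^{2^k} for f: F_2^n -> C, given as a table of
    length 2^n indexed by bitmask.  Uses the recursion
    ||f||_{U^k}^{2^k} = E_h ||D_h f||_{U^{k-1}}^{2^{k-1}},
    where D_h f(x) = f(x+h) * conj(f(x)) and x+h is bitwise XOR."""
    N = len(f)
    if k == 1:
        return abs(sum(f) / N) ** 2
    total = 0.0
    for h in range(N):
        deriv = [f[x ^ h] * f[x].conjugate() for x in range(N)]
        total += unorm_pow(deriv, k - 1)
    return total / N

def gowers_norm(f, k):
    return unorm_pow(f, k) ** (1.0 / 2 ** k)

n = 3
N = 1 << n

# Classical quadratic phase (-1)^{x_1 x_2}: degree 2, so its U^3 norm
# is 1, while its U^2 norm is strictly below 1.
quad = [complex((-1) ** ((x & 1) * ((x >> 1) & 1))) for x in range(N)]

# The non-classical example: omega^{x_1 + ... + x_n}, with omega a
# primitive eighth root of unity.  A degree-3 polynomial function, so
# its U^4 norm is 1, even though it is not a classical cubic phase.
omega = cmath.exp(2j * cmath.pi / 8)
noncl = [omega ** bin(x).count("1") for x in range(N)]

print(gowers_norm(quad, 2), gowers_norm(quad, 3))    # < 1, then ~ 1
print(gowers_norm(noncl, 3), gowers_norm(noncl, 4))  # < 1, then ~ 1
```

The recursion makes the monotonicity || f||_{U^k} \leq || f||_{U^{k+1}} and the characterization of polynomial functions (all higher derivatives constant 1) easy to see in code.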

[Apologies for not defining all the technical terms and the context; the reader can find some background in this post and following the links there.]

What is, then, ergodic theory, and what does it have to do with finitary combinatorial problems? I am certainly the wrong person to ask, but I shall try to explain the little that I have understood in the next post(s).

Bounded Independence and DNFs

In 1990, Linial and Nisan conjectured that every AC0 function is “fooled” by polylog-wise independence. This means that if f: \{ 0,1 \}^n \rightarrow \{ 0,1\} is computed by a circuit of size S and depth d, and X is a distribution over \{ 0,1 \}^n such that any k of the bits are mutually independent and uniformly distributed (such a distribution is called k-wise independent), then

(1) \ \ \ {\mathbb E} f (X) \approx {\mathbb E} f(U_n)

where U_n is the uniform distribution over \{0,1\}^n, provided that k is around (\log S)^{O(d)}. (Maybe even around (\log S)^{d-1}.)

Nearly 20 years later, not much is known about this conjecture, except for a breakthrough result by Louay Bazzi from last year. Bazzi proves that (1) holds if f is computed by a size-S depth-2 circuit (that is, a CNF or a DNF) and X is a O( (\log S)^2)-wise independent distribution. This settles the depth-2 case of the Linial-Nisan conjecture.

A result of Linial, Mansour and Nisan plays an important role in the proof. The result is that if f is computed by a DNF of size S, then there is a function g: \{0,1\}^n \rightarrow {\mathbb R} that can be written as a degree-d polynomial with d= O((\log S)^2) and that is “close” to f, in the sense that

(2) \ \ \ {\mathbb E}  [ |f-g|^2]

is small.
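As a toy illustration of this kind of approximation (a brute-force sketch of my own, with a made-up example DNF, and nothing like the actual Linial-Mansour-Nisan argument), one can compute the Fourier expansion of a small DNF and measure the L2 error incurred by discarding all coefficients of degree above d:

```python
n = 3
N = 1 << n

def popcount(x):
    return bin(x).count("1")

# A tiny (made-up) DNF on three variables: f(x) = (x1 AND x2) OR x3,
# as a 0/1 truth table indexed by bitmask.
f = [1 if ((x & 1) and (x >> 1) & 1) or (x >> 2) & 1 else 0
     for x in range(N)]

# Fourier coefficients fhat(S) = E_x f(x) * (-1)^{<x,S>}, S a bitmask.
fhat = [sum(f[x] * (-1) ** popcount(x & S) for x in range(N)) / N
        for S in range(N)]

def l2_error(d):
    """E|f-g|^2, where g keeps only the Fourier coefficients fhat(S)
    with |S| <= d; by Parseval this equals the Fourier weight of f
    above degree d."""
    g = [sum(fhat[S] * (-1) ** popcount(x & S)
             for S in range(N) if popcount(S) <= d)
         for x in range(N)]
    return sum((f[x] - g[x]) ** 2 for x in range(N)) / N

for d in range(n + 1):
    print(d, l2_error(d))  # non-increasing, and 0 at d = n
```

The content of the Linial-Mansour-Nisan theorem is quantitative: for a size-S DNF, this error is already small at degree O((\log S)^2), not just at degree n.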

Now, if g is a degree-d polynomial, and X is a d-wise independent distribution, we have

{\mathbb E} g(X) = {\mathbb E} g(U_n)

and we also have {\mathbb E} g(U_n) \approx  {\mathbb E} f(U_n) from (2). If we also had {\mathbb E} g(X) \approx {\mathbb E} f(X), then we would be done.
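The fact that a d-wise independent distribution exactly fools degree-d polynomials is easy to check by brute force on a toy example (my own sketch; the distribution below is the standard XOR construction of a pairwise independent distribution from a short seed):

```python
from fractions import Fraction
from itertools import product

# Pairwise-independent distribution on three bits from a 2-bit seed:
# X = (s1, s2, s1 XOR s2).  Any two of the three bits are uniform and
# independent, but all three together are not (their XOR is always 0).
pairwise = [(s1, s2, s1 ^ s2) for s1, s2 in product((0, 1), repeat=2)]
uniform = list(product((0, 1), repeat=3))

def expect(g, pts):
    """Exact expectation of g under the uniform distribution on pts."""
    return sum((Fraction(g(x)) for x in pts), Fraction(0)) / len(pts)

deg2 = lambda x: x[0] * x[1] + x[1] * x[2] - 3 * x[0]  # degree-2 polynomial
deg3 = lambda x: x[0] * x[1] * x[2]                    # degree-3 monomial

# A 2-wise independent distribution matches the uniform expectation of
# every degree-2 polynomial exactly...
print(expect(deg2, pairwise) == expect(deg2, uniform))  # True
# ...but can be completely off at degree 3: here x1*x2*(x1 XOR x2) is
# identically zero on the support of X.
print(expect(deg3, pairwise), expect(deg3, uniform))    # 0 vs 1/8
```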

Unfortunately, we have no way to relate {\mathbb E} g(X) and {\mathbb E} f(X), because the fact that f and g are close does not imply that they have similar expectations under the distribution X, which could be concentrated on the few inputs on which f and g are very different.

Although the Linial-Mansour-Nisan result does not immediately give us what we want, it does suggest a plan: approximate the DNF by a low-degree polynomial, and exploit the fact that d-wise independent distributions perfectly “fool” degree-d polynomials.

Bazzi observes that the following stronger polynomial approximation property is sufficient to prove that a distribution fools a given function.

Suppose that f:\{0,1\}^n \rightarrow \{0,1\} is a function, and that we are able to find two degree-d polynomials g_L,g_U such that for every x \in \{0,1\}^n we have

(3) \ \ \ g_L(x) \leq f(x) \leq g_U(x)

and

(4) \ \ \ {\mathbb E} [ g_U - g_L ] \leq \epsilon

where the expectation in (4) is taken under the uniform distribution. Then it is easy to see that if X is d-wise independent, then

{\mathbb E} g_L(X) \leq {\mathbb E} f(X) \leq {\mathbb E} g_U(X)

and

{\mathbb E} g_L(U_n) \leq {\mathbb E} f(U_n) \leq {\mathbb E} g_U(U_n)

so that

(5) \ \ \ | {\mathbb E} f(U_n)  - {\mathbb E} f(X) | \leq \epsilon
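Here is the whole argument run on a deliberately trivial example (a toy instance of my own, nothing like Bazzi’s actual polynomials): f is the OR of two bits, sandwiched between the degree-1 polynomials g_L(x) = (x_1+x_2)/2 and g_U(x) = x_1+x_2, with \epsilon = 1/2, so that any 1-wise independent distribution must satisfy (5):

```python
from fractions import Fraction
from itertools import product

# f = OR of two bits, sandwiched by degree-1 polynomials.
f   = lambda x: x[0] | x[1]
g_L = lambda x: Fraction(x[0] + x[1], 2)
g_U = lambda x: Fraction(x[0] + x[1])

def expect(g, pts):
    """Exact expectation of g under the uniform distribution on pts."""
    return sum((Fraction(g(x)) for x in pts), Fraction(0)) / len(pts)

uniform = list(product((0, 1), repeat=2))

# (3): g_L <= f <= g_U pointwise; (4): eps = E[g_U - g_L] = 1/2.
assert all(g_L(x) <= f(x) <= g_U(x) for x in uniform)
eps = expect(g_U, uniform) - expect(g_L, uniform)

# X: two equal uniform bits -- a 1-wise independent distribution,
# which therefore fools the degree-1 polynomials g_L, g_U exactly.
X = [(0, 0), (1, 1)]
assert expect(g_L, X) == expect(g_L, uniform)
assert expect(g_U, X) == expect(g_U, uniform)

# Conclusion (5): |E f(U_n) - E f(X)| <= eps.
gap = abs(expect(f, uniform) - expect(f, X))
print(gap, "<=", eps)  # 1/4 <= 1/2
```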

It is also possible to use linear programming duality to argue that if f is fooled by d-wise independent distributions, then degree-d polynomials g_L,g_U as above must exist.

Bazzi then shows that, given a size-S DNF f, it is possible to find degree-d polynomials g_L,g_U as above with d = O((\log S)^2).

Finding such polynomials, in turn, is reduced rather cleverly to the task of finding a single polynomial g such that {\mathbb E} |f-g|^2 is small, as in Linial-Mansour-Nisan, and such that, in addition, g(x)=0 whenever f(x)=0.
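The lower half of this reduction is simple enough to check directly (a sketch of the squaring trick as I understand it, on a made-up approximator g): if g vanishes wherever f does, then g_L = 1-(1-g)^2 = 2g - g^2 lies below f pointwise, and {\mathbb E} [f - g_L] = {\mathbb E} [f \cdot (1-g)^2] \leq {\mathbb E} [(f-g)^2], so a one-sided L2 approximator of degree d yields a lower sandwiching polynomial of degree 2d:

```python
from fractions import Fraction
from itertools import product

# f(x) = 0  =>  g(x) = 0  =>  g_L(x) = 0 <= f(x), and
# f(x) = 1  =>  g_L(x) = 1 - (1 - g(x))^2 <= 1 = f(x).
uniform = list(product((0, 1), repeat=2))

f = lambda x: x[0] | x[1]                     # OR of two bits
g = lambda x: Fraction(2, 3) * (x[0] + x[1])  # made-up one-sided approximator: g = 0 when f = 0

g_L = lambda x: 2 * g(x) - g(x) ** 2          # the squaring trick

def expect(h, pts):
    """Exact expectation of h under the uniform distribution on pts."""
    return sum((Fraction(h(x)) for x in pts), Fraction(0)) / len(pts)

# One-sided sandwich from below, with error controlled by the L2 error.
assert all(g_L(x) <= f(x) for x in uniform)
assert (expect(f, uniform) - expect(g_L, uniform)
        <= expect(lambda x: (f(x) - g(x)) ** 2, uniform))
```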

The construction of this “one sided” low-degree approximation of f is the bulk of Bazzi’s paper. A couple of days ago, Razborov posted an exceedingly neat construction, which he explains in a page and a half! (Theorem 7 in the link above.)

It is rather appropriate that this has happened during the Banff Workshop on Analytic Tools in Computational Complexity, and tomorrow Scott Aaronson will give an impromptu talk on Razborov’s proof, hot off the presses.