The "Complexity Theory" Proof of a Theorem of Green-Tao-Ziegler

We want to prove that a dense subset of a pseudorandom set is indistinguishable from a truly dense set.

Here is an example of what this implies: take a pseudorandom generator of output length n, choose in an arbitrary way a 1% fraction of the possible seeds of the generator, and run the generator on a random seed from this restricted set; then the output of the generator is indistinguishable from being a random element of a set of size \frac 1 {100} \cdot 2^n.

(Technically, the theorem states the existence of a distribution of min-entropy n - \log_2 100, but one can also get the above statement by standard “rounding” techniques.)

As a slightly more general example, if you have a generator G mapping a length-t seed into an output of length n, and Z is a distribution of seeds of min-entropy at least t-d, then G(Z) is indistinguishable from a distribution of min-entropy n-d. (This, however, works only if d = O(\log n).)

It’s time to give a formal statement. Recall that we say that a distribution D is \delta-dense in a distribution R if

\forall x. Pr[R=x] \geq \delta \cdot Pr [D=x]

(Of course I should say “random variable” instead of “distribution,” or write things differently, but we are between friends here.)

We want to say that if F is a class of tests, R is pseudorandom according to a moderately larger class F', and D is \delta-dense in R, then there is a distribution M that is indistinguishable from D according to F and that is \delta-dense in the uniform distribution.

The Green-Tao-Ziegler proof of this result becomes slightly easier in our setting of interest (where F contains boolean functions) and gives the following statement:

Theorem (Green-Tao-Ziegler, Boolean Case)
Let \Sigma be a finite set, F be a class of functions f:\Sigma \to \{0,1\}, R be a distribution over \Sigma, D be a \delta-dense distribution in R, \epsilon>0 be given.

Suppose that for every M that is \delta-dense in U_\Sigma there is an f\in F such that
| Pr[f(D)=1] - Pr[f(M)] = 1| >\epsilon

Then there is a function h:\Sigma \rightarrow \{0,1\} of the form h(x) = g(f_1(x),\ldots,f_k(x)) where k = poly(1/\epsilon,1/\delta) and f_i \in F such that
| Pr [h(R)=1] - Pr [ h(U_\Sigma) =1] | > poly(\epsilon,\delta)

Readers should take a moment to convince themselves that the above statement is indeed saying that if R is pseudorandom then D has a model M, by equivalently saying that if no model M exists then R is not pseudorandom.

The problem with the above statement is that g can be arbitrary and, in particular, it can have circuit complexity exponential in k, and hence in 1/\epsilon.

In our proof, instead, g is a linear threshold function, realizable by a O(k) size circuit. Another improvement is that k=poly(1/\epsilon,\log 1/\delta).

Here is the proof by Omer Reingold, Madhur Tulsiani, Salil Vadhan, and me. Assume F is closed under complement (otherwise work with the closure of F), then the assumption of the theorem can be restated without absolute values

for every M that is \delta-dense in U_\Sigma there is an f\in F such that
Pr[f(D)=1] - Pr[f(M) = 1] >\epsilon

We begin by finding a “universal distinguisher.”

There is a function \bar f:\Sigma \rightarrow [0,1] which is a convex combination of functions from F and such that that for every M that is \delta-dense in U_\Sigma,
E[\bar f(D)] - E[\bar f(M)]  >\epsilon

This can be proved via the min-max theorem for two-players games, or, equivalently, via linearity of linear programming, or, like an analyst would say, via the Hahn-Banach theorem.

Let now S be the set of \delta |\Sigma| elements of \Sigma where \bar f is largest. We must have
(1) E[\bar f(D)] - E[\bar f(U_S)] >\epsilon
which implies that there must be a threshold t such that
(2) Pr[\bar f(D)\geq t] - Pr[\bar f(U_S) \geq t]  >\epsilon
So we have found a boolean distinguisher between D and U_S. Next,
we claim that the same distinguisher works between R and U_\Sigma.

By the density assumption, we have
Pr[\bar f(R)\geq t] \geq \delta \cdot Pr[\bar f(D) \geq t]

and since S contains exactly a \delta fraction of \Sigma, and since the condition \bar f(x) \geq t always fails outside of S (why?), we then have
Pr[\bar f(U_\Sigma)\geq t] = \delta \cdot Pr[\bar f(U_S) \geq t]
and so
(3) Pr[\bar f(R)\geq t] - Pr[\bar f(U_\Sigma) \geq t]  >\delta \epsilon

Now, it’s not clear what the complexity of \bar f is: it could be a convex combination involving all the functions in F. However, by Chernoff bounds, there must be functions f_1,\ldots,f_k with k=poly(1/\epsilon,\log 1/\delta) such that \bar f(x) is well approximated by \sum_i f_i(x) / k for all $x$ but for an exceptional set having density less that, say, \delta\epsilon/10, according to both R and U_\Sigma.

Now R and U_\Sigma are distinguished by the predicate \sum_{i=1}^k f_i(x) \geq tk, which is just a linear threshold function applied to a small set of functions from F, as promised.

Actually I have skipped an important step: outside of the exceptional set, \sum_i f_i(x)/k is going to be close to \bar f(x) but not identical, and this could lead to problems. For example, in (3) \bar f(R) might typically be larger than t only by a tiny amount, and \sum_i f_i(x)/k might consistently underestimate \bar f in R. If so, Pr [ \sum_{i=1}^k f_i(R) \geq tk ] could be a completely different quantity from Pr [\bar f(R)\geq t].

To remedy this problem, we note that, from (1), we can also derive the more “robust” distinguishing statement
(2′) Pr[\bar f(D)\geq t+\epsilon/2] - Pr[\bar f(U_S) \geq t]  >\epsilon/2
from which we get
(3′) Pr[\bar f(R)\geq t+\epsilon/2] - Pr[\bar f(U_\Sigma) \geq t]  >\delta \epsilon/2

And now we can be confident that even replacing \bar f with an approximation we still get a distinguisher.

The statement needed in number-theoretic applications is stronger in a couple of ways. One is that we would like F to contain bounded functions f:\Sigma \rightarrow [0,1] rather than boolean-valued functions. Looking back at our proof, this makes no difference. The other is that we would like h(x) to be a function of the form h(x) = \Pi_{i=1}^k f_i(x) rather than a general composition of functions f_i. This we can achieve by approximating a threshold function by a polynomial of degree poly(1/\epsilon,1/\delta) using the Weierstrass theorem, and then choose the most distinguishing monomial. This gives a proof of the following statement, which is equivalent to Theorem 7.1 in the Tao-Ziegler paper.

Theorem (Green-Tao-Ziegler, General Case)
Let \Sigma be a finite set, F be a class of functions f:\Sigma \to [0,1], R be a distribution over \Sigma, D be a \delta-dense distribution in R, \epsilon>0 be given.

Suppose that for every M that is \delta-dense in U_\Sigma there is an f\in F such that
| Pr[f(D)=1] - Pr[f(M)] = 1| >\epsilon

Then there is a function h:\Sigma \rightarrow \{0,1\} of the form h(x) = \Pi_{i=1}^k f_i(x) where k = poly(1/\epsilon,1/\delta) and f_i \in F such that
| Pr [f(R)=1] - Pr [ f(U_\Sigma) =1] | > exp(-poly(1/\epsilon,1/\delta))

In this case, we too lose an exponential factor. Our proof, however, has some interest even in the number-theoretic setting because it is somewhat simpler than and genuinely different from the original one.

2 thoughts on “The "Complexity Theory" Proof of a Theorem of Green-Tao-Ziegler

  1. Thank you Luca. Wonderful post. In the number theoretic setting do you mean that your method may have some interests i.e., qualitatively rather than quantitatively?

  2. It’s too early to say if any direct application is going to come out in the number-theoretic setting. Conceivably, if it were possible to directly establish pseudorandomness of the almost-primes against bounded threshold functions of “dual functions” then there could even be quantitative improvements.

    But, broadly speaking, interaction between arithmetic combinatorialists and complexity theorists is sure to be very fruitful to both. Ultimately, any argument that exploits consequences of pseudorandomness involves a reduction, and we have 25 years of experience, and a number of tools, in designing such reductions. Conversely, they have developed their own tools and intuition, that we should take the time to understand and internalize.

    This week’s seminars at the IAS were a nice model of how things may evolve. I spoke on Monday, in the complexity seminar, about these results, and Terry Tao spoke on Tuesday, in the arithmetic combinatorics seminar, on and a notion akin to “local decodability” for graph properties.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s