Dense Subsets of Pseudorandom Sets: The Paper(s)

Green, Tao and Ziegler, in their works on patterns in the primes, prove a general result of the following form: if X is a set, R is a (possibly very sparse) “pseudorandom” subset of X, and D is a dense subset of R, then D may be “modeled” by a large set M which has the same density in X as the density of D in R.

They use this result with X being the integers in a large interval \{ 1,\ldots, N\}, R being the “almost-primes” in X (integers with no small factor), and D being the primes in X. Since the almost-primes can be proved to be “pseudorandom” in a fairly strong sense, and since the density of the primes in the almost-primes is at least an absolute constant, it follows that the primes are “indistinguishable” from a large set M containing a constant fraction of all integers. Since such large sets are known to contain arbitrarily long arithmetic progressions, as proved by Szemeredi, Green and Tao are able to prove that the primes too must contain arbitrarily long arithmetic progressions. Such large sets are also known to contain arbitrarily long “polynomial progressions,” as proved by Bergelson and Leibman, and this allows Tao and Ziegler to argue that the primes too must contain arbitrarily long polynomial progressions.

(The above account is not completely accurate, but it is not lying too much.)

As announced last October here, and here, Omer Reingold, Madhur Tulsiani, Salil Vadhan and I found a new proof of this “dense model” theorem, which uses the min-max theorem of game theory (or, depending on the language that you prefer to use, the duality of linear programming or the Hahn-Banach theorem) and was inspired by Nisan’s proof of the Impagliazzo hard-core set theorem. In complexity-theoretic applications of the theorem, our reduction has polynomial complexity, while the previous work incurred an exponential loss.

As discussed here and here, we also show how to use the Green-Tao-Ziegler techniques of “iterative partitioning” to give a different proof of Impagliazzo’s theorem.

After long procrastination, we recently wrote up a paper about these results.

In the Fall, we received some feedback from additive combinatorialists that, while our proof of the Green-Tao-Ziegler result was technically simpler than the original one, the language we used was hard to follow. (That’s easy to believe, because it took us a while to understand the language in which the original proof was written.) We then wrote an expository note presenting the proof in the analyst’s language. When we were about to release the paper and the note, we were contacted by Tim Gowers, who, last Summer, had independently discovered a proof of the Green-Tao-Ziegler results via the Hahn-Banach theorem, essentially with the same argument. (He also found other applications of the technique in additive combinatorics. He does not consider the issue of polynomial complexity, which does not arise in his applications.)

Gowers announced his results in April at a talk at the Fields Institute in Toronto. (Audio and slides are available online.)

Gowers’ paper already contains the proof presented in the “analyst’s language,” making our expository note not so useful any more; we have posted it anyway because, by explaining how one translates from one notation to the other, it can be a short starting point for the computer scientist who is interested in trying to read Gowers’ paper, or for the combinatorialist who is interested in trying to read our paper.

The "Complexity Theory" Proof of a Theorem of Green-Tao-Ziegler

We want to prove that a dense subset of a pseudorandom set is indistinguishable from a truly dense set.

Here is an example of what this implies: take a pseudorandom generator of output length n, choose in an arbitrary way a 1% fraction of the possible seeds of the generator, and run the generator on a random seed from this restricted set; then the output of the generator is indistinguishable from being a random element of a set of size \frac 1 {100} \cdot 2^n.

(Technically, the theorem states the existence of a distribution of min-entropy n - \log_2 100, but one can also get the above statement by standard “rounding” techniques.)

As a slightly more general example, if you have a generator G mapping a length-t seed into an output of length n, and Z is a distribution of seeds of min-entropy at least t-d, then G(Z) is indistinguishable from a distribution of min-entropy n-d. (This, however, works only if d = O(\log n).)

It’s time to give a formal statement. Recall that we say that a distribution D is \delta-dense in a distribution R if

\forall x. Pr[R=x] \geq \delta \cdot Pr [D=x]

(Of course I should say “random variable” instead of “distribution,” or write things differently, but we are between friends here.)
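For concreteness, here is a minimal sketch of the definition for explicit, finite distributions; the function name and the toy example are ad hoc, nothing from the papers.

```python
from fractions import Fraction

def is_delta_dense(D, R, delta):
    """Check the definition: D is delta-dense in R if Pr[R=x] >= delta * Pr[D=x] for every x."""
    support = set(D) | set(R)
    return all(R.get(x, Fraction(0)) >= delta * D.get(x, Fraction(0)) for x in support)

# Toy usage: Sigma = {0,...,7}, and U_S, U_T uniform over S = {0,1} and T = Sigma.
# U_S is (|S|/|T|)-dense in U_T, matching the set-theoretic picture.
U_T = {x: Fraction(1, 8) for x in range(8)}
U_S = {x: Fraction(1, 2) for x in (0, 1)}
print(is_delta_dense(U_S, U_T, Fraction(1, 4)))  # True:  |S| >= (1/4) |T|
print(is_delta_dense(U_S, U_T, Fraction(1, 2)))  # False: |S| <  (1/2) |T|
```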

We want to say that if F is a class of tests, R is pseudorandom according to a moderately larger class F', and D is \delta-dense in R, then there is a distribution M that is indistinguishable from D according to F and that is \delta-dense in the uniform distribution.

The Green-Tao-Ziegler proof of this result becomes slightly easier in our setting of interest (where F contains boolean functions) and gives the following statement:

Theorem (Green-Tao-Ziegler, Boolean Case)
Let \Sigma be a finite set, F be a class of functions f:\Sigma \to \{0,1\}, R be a distribution over \Sigma, D be a \delta-dense distribution in R, \epsilon>0 be given.

Suppose that for every M that is \delta-dense in U_\Sigma there is an f\in F such that
| Pr[f(D)=1] - Pr[f(M)=1] | >\epsilon

Then there is a function h:\Sigma \rightarrow \{0,1\} of the form h(x) = g(f_1(x),\ldots,f_k(x)) where k = poly(1/\epsilon,1/\delta) and f_i \in F such that
| Pr [h(R)=1] - Pr [ h(U_\Sigma) =1] | > poly(\epsilon,\delta)

Readers should take a moment to convince themselves that the above statement is indeed saying that if R is pseudorandom then D has a model M, by equivalently saying that if no model M exists then R is not pseudorandom.

The problem with the above statement is that g can be arbitrary and, in particular, it can have circuit complexity exponential in k, and hence in 1/\epsilon.

In our proof, instead, g is a linear threshold function, realizable by an O(k)-size circuit. Another improvement is that k=poly(1/\epsilon,\log 1/\delta).

Here is the proof by Omer Reingold, Madhur Tulsiani, Salil Vadhan, and me. Assume that F is closed under complement (otherwise, work with the closure of F); then the assumption of the theorem can be restated without absolute values:

for every M that is \delta-dense in U_\Sigma there is an f\in F such that
Pr[f(D)=1] - Pr[f(M) = 1] >\epsilon

We begin by finding a “universal distinguisher.”

Claim
There is a function \bar f:\Sigma \rightarrow [0,1] which is a convex combination of functions from F and such that for every M that is \delta-dense in U_\Sigma,
E[\bar f(D)] - E[\bar f(M)]  >\epsilon

This can be proved via the min-max theorem for two-player games or, equivalently, via the duality of linear programming or, as an analyst would say, via the Hahn-Banach theorem.
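For readers who like to see the min-max step computed, here is a toy numerical sketch (the universe, the tests and the distribution D below are all made up for illustration, and scipy is used to solve the linear program). It assumes that \delta|\Sigma| is an integer, so that the \delta-dense distributions are exactly the convex combinations of uniform distributions on sets of size \delta|\Sigma|, and the claim becomes a statement about a finite zero-sum game.

```python
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

Sigma = list(range(8))
delta = 0.25
m = int(delta * len(Sigma))   # = 2; the "flat" delta-dense distributions are uniform on m-element sets

# A small class F of boolean tests, and a distribution D playing the role of the dense subset.
F = [lambda x: int(x < 4), lambda x: int(x % 2 == 0), lambda x: int(x in (1, 2, 3))]
D = {1: 0.5, 2: 0.25, 3: 0.25}

def expect(f, dist):
    return sum(p * f(x) for x, p in dist.items())

# Payoff matrix of the game: rows are tests f, columns are flat dense distributions U_S.
flats = list(combinations(Sigma, m))
A = np.array([[expect(f, D) - sum(f(x) for x in S) / m for S in flats] for f in F])

# LP: maximize v subject to sum_f w_f * A[f, S] >= v for every S, w >= 0, sum_f w_f = 1.
nF, nS = A.shape
c = np.zeros(nF + 1)
c[-1] = -1.0                                  # linprog minimizes, so minimize -v
A_ub = np.hstack([-A.T, np.ones((nS, 1))])    # v - sum_f w_f * A[f, S] <= 0 for each S
b_ub = np.zeros(nS)
A_eq = np.array([[1.0] * nF + [0.0]])         # the weights w form a convex combination
b_eq = np.array([1.0])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * nF + [(None, None)])
w, value = res.x[:nF], res.x[-1]
print("weights of f-bar:", w)
print("advantage guaranteed against every delta-dense M:", value)
```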

Now let S be the set of the \delta |\Sigma| elements of \Sigma on which \bar f is largest. Since U_S is \delta-dense in U_\Sigma, we must have
(1) E[\bar f(D)] - E[\bar f(U_S)] >\epsilon
which implies (by averaging over thresholds, since E[X] = \int_0^1 Pr[X\geq t] dt for a [0,1]-valued X) that there must be a threshold t such that
(2) Pr[\bar f(D)\geq t] - Pr[\bar f(U_S) \geq t]  >\epsilon
So we have found a boolean distinguisher between D and U_S. Next, we claim that the same distinguisher works between R and U_\Sigma.

By the density assumption, we have
Pr[\bar f(R)\geq t] \geq \delta \cdot Pr[\bar f(D) \geq t]

and since S contains exactly a \delta fraction of \Sigma, and since the condition \bar f(x) \geq t always fails outside of S (why?), we then have
Pr[\bar f(U_\Sigma)\geq t] = \delta \cdot Pr[\bar f(U_S) \geq t]
and so
(3) Pr[\bar f(R)\geq t] - Pr[\bar f(U_\Sigma) \geq t]  >\delta \epsilon

Now, it’s not clear what the complexity of \bar f is: it could be a convex combination involving all the functions in F. However, by Chernoff bounds, there must be functions f_1,\ldots,f_k with k=poly(1/\epsilon,\log 1/\delta) such that \bar f(x) is well approximated by \sum_i f_i(x) / k for all x but for an exceptional set having density less than, say, \delta\epsilon/10, according to both R and U_\Sigma.

Now R and U_\Sigma are distinguished by the predicate \sum_{i=1}^k f_i(x) \geq tk, which is just a linear threshold function applied to a small set of functions from F, as promised.
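Continuing the toy sketch from above, and deferring for a moment the subtlety discussed in the next paragraph, the sampling step and the resulting linear threshold distinguisher could look as follows (k and t are left as parameters, to be set as in the argument; the names are ad hoc).

```python
import random

def sample_functions(F, w, k, rng=random):
    """Draw f_1,...,f_k i.i.d., where each f_i equals F[j] with probability w[j]."""
    return rng.choices(F, weights=list(w), k=k)

def threshold_distinguisher(fs, t):
    """The predicate sum_i f_i(x) >= t*k: a linear threshold applied to k functions from F."""
    k = len(fs)
    return lambda x: int(sum(f(x) for f in fs) >= t * k)

# By Chernoff bounds, for k = poly(1/eps, log 1/delta) the average (1/k) * sum_i f_i(x)
# is within, say, eps/4 of f-bar(x) for all x outside an exceptional set of density
# at most delta*eps/10 under both R and the uniform distribution on Sigma.
```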

Actually I have skipped an important step: outside of the exceptional set, \sum_i f_i(x)/k is going to be close to \bar f(x) but not identical, and this could lead to problems. For example, in (3) \bar f(R) might typically be larger than t only by a tiny amount, and \sum_i f_i(x)/k might consistently underestimate \bar f in R. If so, Pr [ \sum_{i=1}^k f_i(R) \geq tk ] could be a completely different quantity from Pr [\bar f(R)\geq t].

To remedy this problem, we note that, from (1), we can also derive the more “robust” distinguishing statement
(2′) Pr[\bar f(D)\geq t+\epsilon/2] - Pr[\bar f(U_S) \geq t]  >\epsilon/2
from which we get
(3′) Pr[\bar f(R)\geq t+\epsilon/2] - Pr[\bar f(U_\Sigma) \geq t]  >\delta \epsilon/2
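(For completeness, here is one way to see that (2′) follows from (1). If for every threshold t we had
Pr[\bar f(D)\geq t+\epsilon/2] - Pr[\bar f(U_S) \geq t]  \leq \epsilon/2
then, integrating over t\in [0,1] and using the fact that E[X] = \int_0^1 Pr[X\geq t] dt for a [0,1]-valued random variable X, the first terms would integrate to at least E[\bar f(D)] - \epsilon/2 and the second terms to E[\bar f(U_S)], giving E[\bar f(D)] - E[\bar f(U_S)] \leq \epsilon and contradicting (1). Then (3′) follows from (2′) exactly as (3) followed from (2).)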

And now we can be confident that even replacing \bar f with an approximation we still get a distinguisher.

The statement needed in number-theoretic applications is stronger in a couple of ways. One is that we would like F to contain bounded functions f:\Sigma \rightarrow [0,1] rather than boolean-valued functions. Looking back at our proof, this makes no difference. The other is that we would like h(x) to be a function of the form h(x) = \Pi_{i=1}^k f_i(x) rather than a general composition of functions f_i. This we can achieve by approximating a threshold function by a polynomial of degree poly(1/\epsilon,1/\delta) using the Weierstrass theorem, and then choosing the most distinguishing monomial. This gives a proof of the following statement, which is equivalent to Theorem 7.1 in the Tao-Ziegler paper.

Theorem (Green-Tao-Ziegler, General Case)
Let \Sigma be a finite set, F be a class of functions f:\Sigma \to [0,1], R be a distribution over \Sigma, D be a \delta-dense distribution in R, \epsilon>0 be given.

Suppose that for every M that is \delta-dense in U_\Sigma there is an f\in F such that
| E[f(D)] - E[f(M)] | >\epsilon

Then there is a function h:\Sigma \rightarrow [0,1] of the form h(x) = \Pi_{i=1}^k f_i(x) where k = poly(1/\epsilon,1/\delta) and f_i \in F such that
| E[h(R)] - E[h(U_\Sigma)] | > exp(-poly(1/\epsilon,1/\delta))

In this case, we too lose an exponential factor. Our proof, however, has some interest even in the number-theoretic setting because it is somewhat simpler than and genuinely different from the original one.
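To give a rough sense of where the exponential loss comes from in this last step (this is only a sketch, and the bound on the coefficients is an assumption about the polynomial approximation rather than something verified here): write the approximating polynomial as
P(z_1,\ldots,z_k) = \sum_{A} c_A \prod_{i\in A} z_i
If P(f_1(x),\ldots,f_k(x)) distinguishes R from U_\Sigma with advantage poly(\epsilon,\delta), and if the approximation can be chosen with total coefficient weight \sum_A |c_A| \leq exp(poly(1/\epsilon,1/\delta)), then, since the advantage of P is at most \sum_A |c_A| times the largest advantage of any single monomial \prod_{i\in A} f_i(x), some monomial must distinguish with advantage at least poly(\epsilon,\delta)/exp(poly(1/\epsilon,1/\delta)), which accounts for the exp(-poly(1/\epsilon,1/\delta)) bound in the statement.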

Dense Subsets of Pseudorandom Sets

The Green-Tao theorem states that the primes contain arbitrarily long arithmetic progressions; its proof can be, somewhat inaccurately, broken up into the following two steps:

Thm1: Every constant-density subset of a pseudorandom set of integers contains arbitrarily long arithmetic progressions.

Thm2: The primes have constant density inside a pseudorandom set.

Of those, the main contribution of the paper is the first theorem, a “relative” version of Szemeredi’s theorem. In turn, its proof can be (even more inaccurately) broken up as

Thm 1.1: For every constant density subset D of a pseudorandom set there is a “model” set M that has constant density among the integers and is indistinguishable from D.

Thm 1.2 (Szemeredi) Every constant density subset of the integers contains arbitrarily long arithmetic progressions, and many of them.

Thm 1.3 A set with many long arithmetic progressions cannot be indistinguishable from a set with none.

Following this scheme is, of course, easier said than done. One wants to work with a definition of pseudorandomness that is weak enough that Thm2 is provable, but strong enough that the notion of indistinguishability implied by Thm 1.1 is in turn strong enough that Thm 1.3 holds. From now on I will focus on Thm 1.1, which is a key step in the proof, though not the hardest.

Recently, Tao and Ziegler proved that the primes contain arbitrarily long “polynomial progressions” (progressions where the increments are given by polynomials rather than linear functions, as in the case of arithmetic progressions). Their paper contains a very clean formulation of (1.1), which I will now (accurately, this time) describe. (It is Theorem 7.1 in the paper. The language I use below is very different but equivalent.)

We fix a finite universe \Sigma; this could be \{ 0,1\}^n in complexity-theoretic applications or {\mathbb Z}/N{\mathbb Z} in number-theoretic applications. Instead of working with subsets of \Sigma, it will be more convenient to refer to probability distributions over \Sigma; if S is a set, then U_S is the uniform distribution over S. We also fix a family F of “easy” functions f: \Sigma \rightarrow [0,1]. In complexity-theoretic applications, this could be the set of boolean functions computed by circuits of bounded size. We think of two distributions X,Y as being \epsilon-indistinguishable according to F if for every function f\in F we have

| E [f(X)] - E[f(Y)] | \leq \epsilon

and we think of a distribution as pseudorandom if it is indistinguishable from the uniform distribution U_\Sigma. (This is all standard in cryptography and complexity theory.)
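For explicit finite distributions and a small explicit family of tests, this definition is easy to spell out in code; the following is just a sketch with ad hoc names, not anything from the papers.

```python
def advantage(f, X, Y):
    """|E[f(X)] - E[f(Y)]| for distributions given as {element: probability} dictionaries."""
    return abs(sum(p * f(x) for x, p in X.items()) - sum(p * f(y) for y, p in Y.items()))

def indistinguishable(F, X, Y, eps):
    """X and Y are eps-indistinguishable according to F if no test in F has advantage above eps."""
    return all(advantage(f, X, Y) <= eps for f in F)
```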

Now let’s define the natural analog of “dense subset” for distributions. We say that a distribution A is \delta-dense in B if for every x\in \Sigma we have

Pr [ B=x] \geq \delta Pr [A=x]

Note that if B=U_T and A=U_S for some sets S,T, then A is \delta-dense in B if and only if S\subseteq T and |S| \geq \delta |T|.

So we want to prove the following:

Theorem (Green, Tao, Ziegler)
Fix a family F of tests and an \epsilon>0; then there is a “slightly larger” family F' and an \epsilon'>0 such that if R is an \epsilon'-pseudorandom distribution according to F' and D is \delta-dense in R, then there is a distribution M that is \delta-dense in U_\Sigma and that is \epsilon-indistinguishable from D according to F.

[The reader may want to go back to Thm 1.1 and check that this is a meaningful formalization of it, up to working with arbitrary distributions rather than sets. This is in fact the “inaccuracy” that I referred to above.]

In a complexity-theoretic setting, we would like to say that if F is defined as all functions computable by circuits of size at most s, then \epsilon' should be poly (\epsilon,\delta) and F' should contain only functions computable by circuits of size s\cdot poly(1/\epsilon,1/\delta). Unfortunately, if one follows the proof and makes some simplifications assuming F contains only boolean functions, one sees that F' contains functions of the form g(x) = h(f_1(x),\ldots,f_k(x)), where f_i \in F, k = poly(1/\epsilon,1/\delta), and h could be arbitrary and, in general, have circuit complexity exponential in 1/\epsilon and 1/\delta. Alternatively, one may approximate h() as a low-degree polynomial and take the “most distinguishing monomial.” This will give a version of the Theorem (which leads to the actual statement of Thm 7.1 in the Tao-Ziegler paper) where F' contains only functions of the form \Pi_{i=1}^k f_i(x), but then \epsilon' will be exponentially small in 1/\epsilon and 1/\delta. This means that one cannot apply the theorem to “cryptographically strong” notions of pseudorandomness and indistinguishability, and in general to any setting where 1/\epsilon and 1/\delta are super-logarithmic (not to mention super-linear).

This seems like an unavoidable consequence of the “finitary ergodic theoretic” technique of iterative partitioning and energy increment used in the proof, which always yields at least a singly exponential complexity.

Omer Reingold, Madhur Tulsiani, Salil Vadhan and I have recently come up with a different proof where both \epsilon' and the complexity of F' are polynomial. This gives, for example, a new characterization of the notion of pseudoentropy. Our proof is quite in the spirit of Nisan’s proof of Impagliazzo’s hard-core set theorem, and it is relatively simple. We can also deduce a version of the theorem where, as in Green-Tao-Ziegler, F' contains only bounded products of functions in F. In doing so, however, we too incur an exponential loss, but the proof is somewhat simpler and demonstrates the applicability of complexity-theoretic techniques in arithmetic combinatorics.

Since we can use (ideas from) a proof of the hard core set theorem to prove the Green-Tao-Ziegler result, one may wonder whether one can use the “finitary ergodic theory” techniques of iterative partitioning and energy increment to prove the hard-core set theorem. Indeed, we do this too. In our proof, the reduction loses a factor that is exponential in certain parameters (while other proofs are polynomial), but one also gets a more “constructive” result.

If readers can stomach it, a forthcoming post will describe the complexity-theory-style proof of the Green-Tao-Ziegler result as well as the ergodic-theory-style proof of the Impagliazzo hard core set theorem.