We want to prove that a dense subset of a pseudorandom set is indistinguishable from a truly dense set.

Here is an example of what this implies: take a pseudorandom generator of output length $n$, choose in an arbitrary way a 1% fraction of the possible seeds of the generator, and run the generator on a random seed from this restricted set; then the output of the generator is indistinguishable from a random element of a set of size $\frac{1}{100} \cdot 2^n$.

(Technically, the theorem states the existence of a distribution of min-entropy $n - \log_2 100$, but one can also get the above statement by standard “rounding” techniques.)

As a slightly more general example, if you have a generator $G$ mapping a length-$t$ seed into an output of length $n$, and $Z$ is a distribution of seeds of min-entropy at least $t - d$, then $G(Z)$ is indistinguishable from a distribution of min-entropy $n - d$. (This, however, works only if $d = O(\log n)$.)

It’s time to give a formal statement. Recall that we say that a distribution $D$ is $\delta$-dense in a distribution $R$ if

$$\forall x. \ \Pr[R = x] \ge \delta \cdot \Pr[D = x]$$

(Of course I should say “random variable” instead of “distribution,” or write things differently, but we are between friends here.)
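
To make the definition concrete, here is a minimal sketch that checks density for distributions given explicitly as tables. It assumes $\delta$-density of $D$ in $R$ means $\Pr[R = x] \ge \delta \cdot \Pr[D = x]$ for every $x$; the helper names `is_dense` and `uniform` and the toy universe are made up for this illustration.

```python
from fractions import Fraction

def is_dense(d, r, delta):
    """Return True if distribution d is delta-dense in distribution r.

    Both distributions are dicts mapping outcomes to probabilities; a missing
    key means probability zero. The condition checked is
    r[x] >= delta * d[x] for every outcome x.
    """
    return all(r.get(x, 0) >= delta * p for x, p in d.items())

def uniform(support):
    """Uniform distribution over a finite collection of outcomes."""
    support = list(support)
    return {x: Fraction(1, len(support)) for x in support}

# Toy universe of 8 elements; s is an arbitrary subset holding a 1/4 fraction.
sigma = range(8)
s = [2, 5]
delta = Fraction(1, 4)

u_sigma = uniform(sigma)
u_s = uniform(s)

print(is_dense(u_s, u_sigma, delta))           # uniform on s vs. uniform on sigma, delta = 1/4
print(is_dense(u_s, u_sigma, Fraction(1, 2)))  # same pair with the stricter delta = 1/2
```

The uniform distribution on a set holding a $\delta$ fraction of the universe is the basic example of a distribution that is $\delta$-dense in the uniform distribution, and it stops being dense as soon as $\delta$ is raised.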

We want to say that if $F$ is a class of tests, $R$ is pseudorandom according to a moderately larger class $F'$, and $D$ is $\delta$-dense in $R$, then there is a distribution $M$ that is indistinguishable from $D$ according to $F$ and that is $\delta$-dense in the uniform distribution.

The Green-Tao-Ziegler proof of this result becomes slightly easier in our setting of interest (where $F$ contains boolean functions) and gives the following statement:

Theorem (Green-Tao-Ziegler, Boolean Case)

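Let $\Sigma$ be a finite set, $F$ be a class of functions $f : \Sigma \to \{0,1\}$, $R$ be a distribution over $\Sigma$, $D$ be a $\delta$-dense distribution in $R$, and $\epsilon > 0$ be given. Suppose that for every $M$ that is $\delta$-dense in $U_\Sigma$ (the uniform distribution over $\Sigma$) there is an $f \in F$ such that

$$\left| \Pr[f(D) = 1] - \Pr[f(M) = 1] \right| > \epsilon$$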

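Then there is a function $h : \Sigma \to \{0,1\}$ of the form $h(x) = g(f_1(x), \ldots, f_k(x))$ where $k = \mathrm{poly}(1/\epsilon, 1/\delta)$ and $f_i \in F$ such that

$$\left| \Pr[h(R) = 1] - \Pr[h(U_\Sigma) = 1] \right| > \mathrm{poly}(\epsilon, \delta)$$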

Readers should take a moment to convince themselves that the above statement is indeed saying that if $R$ is pseudorandom then $D$ has a model $M$, by equivalently saying that if no model $M$ exists then $R$ is not pseudorandom.

The problem with the above statement is that $g$ can be arbitrary and, in particular, it can have circuit complexity exponential in $k$, and hence in $1/\epsilon$.

In our proof, instead, $g$ is a linear threshold function, realizable by a size-$O(k)$ circuit. Another improvement is that $k = \mathrm{poly}(1/\epsilon, \log 1/\delta)$.

Here is the proof by Omer Reingold, Madhur Tulsiani, Salil Vadhan, and me. Assume $F$ is closed under complement (otherwise work with the closure of $F$); then the assumption of the theorem can be restated without absolute values:

for every $M$ that is $\delta$-dense in $U_\Sigma$ there is an $f \in F$ such that

$$\Pr[f(D) = 1] - \Pr[f(M) = 1] > \epsilon$$

We begin by finding a “universal distinguisher.”

Claim

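There is a function $\bar f : \Sigma \to [0,1]$ which is a convex combination of functions from $F$ and such that for every $M$ that is $\delta$-dense in $U_\Sigma$,

$$\mathbb{E}[\bar f(D)] - \mathbb{E}[\bar f(M)] > \epsilon$$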

This can be proved via the min-max theorem for two-player games, or, equivalently, via duality of linear programming, or, as an analyst would say, via the Hahn-Banach theorem.
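
For readers who want the min-max step spelled out, here is a sketch. It uses $F$ for the class of tests, $D$ for the dense distribution, $U_\Sigma$ for the uniform distribution over $\Sigma$, and $\mathcal{M}$ (a name introduced only for this sketch) for the set of distributions that are $\delta$-dense in $U_\Sigma$.

```latex
% Zero-sum game: the distinguisher picks f in F, the adversary picks M in \mathcal{M}.
% Payoff to the distinguisher: E[f(D)] - E[f(M)], which is linear in M.
%
% \mathcal{M} is convex and compact, so a mixed strategy of the adversary is
% again an element of \mathcal{M}; F is finite because \Sigma is finite.
\[
  \min_{M \in \mathcal{M}} \; \max_{f \in F} \;
    \bigl( \mathbb{E}[f(D)] - \mathbb{E}[f(M)] \bigr) \;>\; \epsilon
  \qquad \text{(the restated assumption)}
\]
% The min-max theorem lets us swap the order, at the price of letting the
% distinguisher randomize, i.e. play a convex combination
% \bar f = \sum_{f \in F} \lambda_f f with \lambda_f >= 0 and \sum_f \lambda_f = 1:
\[
  \max_{\bar f \in \mathrm{conv}(F)} \; \min_{M \in \mathcal{M}} \;
    \bigl( \mathbb{E}[\bar f(D)] - \mathbb{E}[\bar f(M)] \bigr) \;>\; \epsilon .
\]
```

The maximizing $\bar f$ is the universal distinguisher of the Claim: a single function that beats every candidate model $M$ at once.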

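Let now $S$ be the set of $\delta |\Sigma|$ elements of $\Sigma$ where $\bar f$ is largest. We must have

$$\mathbb{E}[\bar f(D)] - \mathbb{E}[\bar f(U_S)] > \epsilon \qquad (1)$$

which implies that there must be a threshold $t$ such that

$$\Pr[\bar f(D) \ge t] - \Pr[\bar f(U_S) \ge t] > \epsilon \qquad (2)$$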
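
The passage from (1) to (2) is an averaging argument over thresholds. Here is a short derivation, assuming (1) is the gap in expectation of $\bar f$ between $D$ and $U_S$ (the uniform distribution on $S$) and (2) is the corresponding gap for the events $\bar f \ge t$.

```latex
% For any random variable X taking values in [0,1]:
\[
  \mathbb{E}[X] = \int_0^1 \Pr[X \ge t] \, dt .
\]
% Apply this to X = \bar f(D) and to X = \bar f(U_S), then subtract and use (1):
\[
  \int_0^1 \bigl( \Pr[\bar f(D) \ge t] - \Pr[\bar f(U_S) \ge t] \bigr) \, dt
  = \mathbb{E}[\bar f(D)] - \mathbb{E}[\bar f(U_S)] > \epsilon .
\]
% An integrand over an interval of length 1 whose integral exceeds \epsilon
% must exceed \epsilon at some point t, which is statement (2).
```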

So we have found a boolean distinguisher between $D$ and $U_S$. Next, we claim that the same distinguisher works between $R$ and $U_\Sigma$.

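By the density assumption, we have

$$\Pr[\bar f(R) \ge t] \ge \delta \cdot \Pr[\bar f(D) \ge t]$$

and since $S$ contains exactly a $\delta$ fraction of $\Sigma$, and since the condition $\bar f(x) \ge t$ always fails outside of $S$ (why?), we then have

$$\Pr[\bar f(U_\Sigma) \ge t] = \delta \cdot \Pr[\bar f(U_S) \ge t]$$

and so

$$\Pr[\bar f(R) \ge t] - \Pr[\bar f(U_\Sigma) \ge t] > \delta \epsilon \qquad (3)$$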
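
The chain from the top-$\delta$ set $S$ to the distinguisher between $R$ and $U_\Sigma$ can be checked numerically on a toy instance. The sketch below is only an illustration: the universe, the function `f_bar`, and the distributions `d` and `r` are hypothetical choices made so that `d` is $\delta$-dense in `r`, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def prob_at_least(dist, f_bar, t):
    """Pr[f_bar(X) >= t] for X drawn from dist (dict: outcome -> probability)."""
    return sum(p for x, p in dist.items() if f_bar[x] >= t)

def uniform(support):
    """Uniform distribution over a finite collection of outcomes."""
    support = list(support)
    return {x: Fraction(1, len(support)) for x in support}

def best_threshold(d, u_s, f_bar):
    """Threshold t (among the values f_bar takes) maximizing the gap in (2)."""
    candidates = sorted(set(f_bar.values()))
    return max(candidates,
               key=lambda t: prob_at_least(d, f_bar, t) - prob_at_least(u_s, f_bar, t))

# Hypothetical toy instance: 20 elements, delta = 1/4.
sigma = list(range(20))
delta = Fraction(1, 4)
f_bar = {x: Fraction(1) if x == 0 else Fraction(0) for x in sigma}

# d is a point mass on element 0; r puts mass exactly delta there,
# so r[x] >= delta * d[x] for every x, i.e. d is delta-dense in r.
d = {0: Fraction(1)}
r = {x: (delta if x == 0 else (1 - delta) / 19) for x in sigma}

# s: the delta * |sigma| elements where f_bar is largest (ties broken by index).
s = sorted(sigma, key=lambda x: f_bar[x], reverse=True)[: int(delta * len(sigma))]
u_s, u_sigma = uniform(s), uniform(sigma)

t = best_threshold(d, u_s, f_bar)
gap_2 = prob_at_least(d, f_bar, t) - prob_at_least(u_s, f_bar, t)
gap_3 = prob_at_least(r, f_bar, t) - prob_at_least(u_sigma, f_bar, t)
print(t, gap_2, gap_3)
```

In this instance the gap between `r` and the uniform distribution comes out to exactly $\delta$ times the gap between `d` and `u_s`, which is the worst case the argument allows. The step that makes this work is that no element outside `s` passes the threshold: if one did, every element of `s` would pass too, and the gap in (2) could not be positive.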

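Now, it’s not clear what the complexity of $\bar f$ is: it could be a convex combination involving *all* the functions in $F$. However, by Chernoff bounds, there must be functions $f_1, \ldots, f_k \in F$ with $k = \mathrm{poly}(1/\epsilon, \log 1/\delta)$ such that $\bar f(x)$ is well approximated by $\frac{1}{k} \sum_{i=1}^k f_i(x)$ for all $x$ but for an exceptional set having density less than, say, $\delta\epsilon/10$, according to both $R$ and $U_\Sigma$.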

Now $R$ and $U_\Sigma$ are distinguished by the predicate $\sum_{i=1}^k f_i(x) \ge tk$, which is just a linear threshold function applied to a small set of functions from $F$, as promised.
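
The sampling step can be sketched in a few lines: draw $k$ functions independently according to the weights of the convex combination, and use their average in place of the full combination. Everything concrete here is an assumption made for illustration: the class of four "bit $i$ is set" tests, the weights, the value of $k$, and the threshold $t = 0.5$.

```python
import random

def sample_functions(functions, weights, k, rng):
    """Draw k functions i.i.d. from the convex combination given by weights."""
    return rng.choices(functions, weights=weights, k=k)

def average(sampled, x):
    """(1/k) * sum_i f_i(x): the empirical stand-in for the convex combination."""
    return sum(f(x) for f in sampled) / len(sampled)

def threshold_predicate(sampled, t):
    """Boolean test 'sum_i f_i(x) >= t * k', a linear threshold of the f_i."""
    k = len(sampled)
    return lambda x: sum(f(x) for f in sampled) >= t * k

# Hypothetical class of tests: the four "bit i of x is set" tests on 4-bit inputs.
functions = [lambda x, i=i: (x >> i) & 1 for i in range(4)]
weights = [0.4, 0.3, 0.2, 0.1]  # assumed convex-combination weights

def f_bar(x):
    """The exact convex combination that the sample is meant to approximate."""
    return sum(w * f(x) for w, f in zip(weights, functions))

rng = random.Random(0)  # fixed seed so the run is repeatable
sampled = sample_functions(functions, weights, k=2000, rng=rng)
worst_error = max(abs(average(sampled, x) - f_bar(x)) for x in range(16))
h = threshold_predicate(sampled, t=0.5)
print(worst_error, [x for x in range(16) if h(x)])
```

On a universe this small the sampled average is close to `f_bar` at every point. In the argument proper the universe is huge, which is why one settles for closeness outside a small exceptional set rather than everywhere.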

Actually I have skipped an important step: outside of the exceptional set, $\frac{1}{k} \sum_i f_i(x)$ is going to be *close* to $\bar f(x)$ but not identical, and this could lead to problems. For example, in (3) $\bar f(R)$ might typically be larger than $t$ only by a tiny amount, and $\frac{1}{k} \sum_i f_i(x)$ might consistently underestimate $\bar f$ in $R$. If so, $\Pr[\sum_{i=1}^k f_i(R) \ge tk]$ could be a completely different quantity from $\Pr[\bar f(R) \ge t]$.

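To remedy this problem, we note that, from (1), we can also derive the more “robust” distinguishing statement

$$\Pr[\bar f(D) \ge t + \epsilon/2] - \Pr[\bar f(U_S) \ge t] > \epsilon/2 \qquad (2')$$

from which we get

$$\Pr[\bar f(R) \ge t + \epsilon/2] - \Pr[\bar f(U_\Sigma) \ge t] > \delta\epsilon/2 \qquad (3')$$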
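
Here is one way to see where the robust statement comes from. The sketch assumes (2′) has the form $\Pr[\bar f(D) \ge t + \epsilon/2] - \Pr[\bar f(U_S) \ge t] > \epsilon/2$ and that (1) is the gap $\mathbb{E}[\bar f(D)] - \mathbb{E}[\bar f(U_S)] > \epsilon$.

```latex
% Write the expectation as an integral over thresholds and split off the first
% \epsilon/2 of the range (each integrand is a probability, hence at most 1):
\[
  \mathbb{E}[\bar f(D)]
  = \int_0^1 \Pr[\bar f(D) \ge s] \, ds
  \le \frac{\epsilon}{2} + \int_0^1 \Pr[\bar f(D) \ge t + \epsilon/2] \, dt .
\]
% Combine with (1) and with E[\bar f(U_S)] = \int_0^1 \Pr[\bar f(U_S) \ge t] dt:
\[
  \int_0^1 \bigl( \Pr[\bar f(D) \ge t + \epsilon/2] - \Pr[\bar f(U_S) \ge t] \bigr) \, dt
  > \epsilon - \frac{\epsilon}{2} = \frac{\epsilon}{2} ,
\]
% so some threshold t satisfies (2').
```

The passage from (2′) to (3′) then repeats the earlier argument: density gives a factor $\delta$ on the first term, and the second term scales by exactly $\delta$ because no element outside $S$ can satisfy $\bar f(x) \ge t$.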

And now we can be confident that even replacing $\bar f$ with an approximation we still get a distinguisher.

The statement needed in number-theoretic applications is stronger in a couple of ways. One is that we would like $F$ to contain bounded functions $f : \Sigma \to [0,1]$ rather than boolean-valued functions. Looking back at our proof, this makes no difference. The other is that we would like $h$ to be a function of the form $h(x) = \prod_{i=1}^k f_i(x)$ rather than a general composition of functions $f_i$. This we can achieve by approximating a threshold function by a polynomial of degree $\mathrm{poly}(1/\epsilon, 1/\delta)$ using the Weierstrass theorem, and then choosing the most distinguishing monomial. This gives a proof of the following statement, which is equivalent to Theorem 7.1 in the Tao-Ziegler paper.
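
The "most distinguishing monomial" step can be sketched as follows. This is one way to fill in the details, not necessarily the way the paper does it; the symbols $g$, $\gamma$, $\varphi$, $p$, $c_j$ and $d$ are introduced only for this sketch.

```latex
% Notation for this sketch only:
%   g(x) = (1/k) \sum_{i=1}^k f_i(x)   the sampled average
%   \gamma   a lower bound on the robust gap,
%            Pr[g(R) >= t + \epsilon/2] - Pr[g(U_\Sigma) >= t] > \gamma
%   \varphi  continuous ramp: 0 on [0,t], 1 on [t+\epsilon/2, 1], linear in between
%
% Step 1: the ramp inherits the gap, since
%         1[y >= t+\epsilon/2] <= \varphi(y) <= 1[y >= t].
\[
  \mathbb{E}[\varphi(g(R))] - \mathbb{E}[\varphi(g(U_\Sigma))] > \gamma .
\]
% Step 2 (Weierstrass): pick a polynomial p(y) = \sum_{j=0}^{d} c_j y^j with
%         |p(y) - \varphi(y)| <= \gamma/4 on [0,1]; then
\[
  \mathbb{E}[p(g(R))] - \mathbb{E}[p(g(U_\Sigma))] > \frac{\gamma}{2} .
\]
% Step 3: expand p(g(x)) into monomials in the f_i.
\[
  p(g(x)) = \sum_{j=0}^{d} \frac{c_j}{k^j}
            \sum_{i_1, \ldots, i_j} f_{i_1}(x) \cdots f_{i_j}(x) .
\]
% The absolute values of the monomial coefficients sum to \sum_j |c_j|, and the
% constant term cancels in the difference, so by averaging some monomial
% h = f_{i_1} \cdots f_{i_j} satisfies
\[
  \bigl| \mathbb{E}[h(R)] - \mathbb{E}[h(U_\Sigma)] \bigr|
  \ge \frac{\gamma}{2 \sum_{j=1}^{d} |c_j|} .
\]
```

The coefficients $c_j$ of a polynomial approximating a steep ramp can be very large compared with its degree, which is presumably where the distinguishing advantage degrades so badly in this version of the statement.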

Theorem (Green-Tao-Ziegler, General Case)

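Let $\Sigma$ be a finite set, $F$ be a class of functions $f : \Sigma \to [0,1]$, $R$ be a distribution over $\Sigma$, $D$ be a $\delta$-dense distribution in $R$, and $\epsilon > 0$ be given. Suppose that for every $M$ that is $\delta$-dense in $U_\Sigma$ there is an $f \in F$ such that

$$\left| \mathbb{E}[f(D)] - \mathbb{E}[f(M)] \right| > \epsilon$$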

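Then there is a function $h : \Sigma \to [0,1]$ of the form $h(x) = \prod_{i=1}^k f_i(x)$ where $k = \mathrm{poly}(1/\epsilon, 1/\delta)$ and $f_i \in F$ such that

$$\left| \mathbb{E}[h(R)] - \mathbb{E}[h(U_\Sigma)] \right| > \exp(-\mathrm{poly}(1/\epsilon, 1/\delta))$$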

In this case, we too lose an exponential factor. Our proof, however, has some interest even in the number-theoretic setting because it is somewhat simpler than and genuinely different from the original one.

Thank you Luca. Wonderful post. In the number-theoretic setting, do you mean that your method may be of interest qualitatively rather than quantitatively?

It’s too early to say if any direct application is going to come out in the number-theoretic setting. Conceivably, if it were possible to directly establish pseudorandomness of the almost-primes against bounded threshold functions of “dual functions” then there could even be quantitative improvements.

But, broadly speaking, interaction between arithmetic combinatorialists and complexity theorists is sure to be very fruitful to both. Ultimately, any argument that exploits consequences of pseudorandomness involves a reduction, and we have 25 years of experience, and a number of tools, in designing such reductions. Conversely, they have developed their own tools and intuition, that we should take the time to understand and internalize.

This week’s seminars at the IAS were a nice model of how things may evolve. I spoke on Monday, in the complexity seminar, about these results, and Terry Tao spoke on Tuesday, in the arithmetic combinatorics seminar, on and a notion akin to “local decodability” for graph properties.