We want to prove that a dense subset of a pseudorandom set is indistinguishable from a truly dense set.

Here is an example of what this implies: take a pseudorandom generator of output length n, choose in an arbitrary way a 1% fraction of the possible seeds of the generator, and run the generator on a random seed from this restricted set; then the output of the generator is indistinguishable from being a random element of a set of size \frac 1 {100} \cdot 2^n.

(Technically, the theorem states the existence of a distribution of min-entropy n - \log_2 100, but one can also get the above statement by standard “rounding” techniques.)

As a slightly more general example, if you have a generator G mapping a length-t seed into an output of length n, and Z is a distribution of seeds of min-entropy at least t-d, then G(Z) is indistinguishable from a distribution of min-entropy n-d. (This, however, works only if d = O(\log n).)

It’s time to give a formal statement. Recall that we say that a distribution D is \delta-dense in a distribution R if

\forall x. Pr[R=x] \geq \delta \cdot Pr [D=x]

(Of course I should say “random variable” instead of “distribution,” or write things differently, but we are between friends here.)
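To see how the earlier examples fit this definition: if Z is a distribution over \{0,1\}^t of min-entropy at least t-d, then for every z

Pr[Z = z] \leq 2^{-(t-d)} = 2^d \cdot Pr[U_{\{0,1\}^t} = z]

so Z is 2^{-d}-dense in the uniform distribution over seeds; likewise, a distribution over \{0,1\}^n has min-entropy at least n-d precisely when it is 2^{-d}-dense in U_{\{0,1\}^n}.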

We want to say that if F is a class of tests, R is pseudorandom according to a moderately larger class F', and D is \delta-dense in R, then there is a distribution M that is indistinguishable from D according to F and that is \delta-dense in the uniform distribution.

The Green-Tao-Ziegler proof of this result becomes slightly easier in our setting of interest (where F contains boolean functions) and gives the following statement:

Theorem (Green-Tao-Ziegler, Boolean Case)
Let \Sigma be a finite set, F be a class of functions f:\Sigma \to \{0,1\}, R be a distribution over \Sigma, D be a \delta-dense distribution in R, \epsilon>0 be given.

Suppose that for every M that is \delta-dense in U_\Sigma there is an f\in F such that
| Pr[f(D)=1] - Pr[f(M)=1] | >\epsilon

Then there is a function h:\Sigma \rightarrow \{0,1\} of the form h(x) = g(f_1(x),\ldots,f_k(x)) where k = poly(1/\epsilon,1/\delta) and f_i \in F such that
| Pr [h(R)=1] - Pr [ h(U_\Sigma) =1] | > poly(\epsilon,\delta)

Readers should take a moment to convince themselves that the above statement is indeed saying that if R is pseudorandom then D has a model M, by equivalently saying that if no model M exists then R is not pseudorandom.

The problem with the above statement is that g can be arbitrary and, in particular, it can have circuit complexity exponential in k, and hence in 1/\epsilon.

In our proof, instead, g is a linear threshold function, realizable by an O(k)-size circuit. Another improvement is that k=poly(1/\epsilon,\log 1/\delta).

Here is the proof by Omer Reingold, Madhur Tulsiani, Salil Vadhan, and me. Assume that F is closed under complement (otherwise, work with the closure of F); then the assumption of the theorem can be restated without absolute values:

for every M that is \delta-dense in U_\Sigma there is an f\in F such that
Pr[f(D)=1] - Pr[f(M) = 1] >\epsilon

We begin by finding a “universal distinguisher.”

Claim
There is a function \bar f:\Sigma \rightarrow [0,1] which is a convex combination of functions from F and such that, for every M that is \delta-dense in U_\Sigma,
E[\bar f(D)] - E[\bar f(M)]  >\epsilon

This can be proved via the min-max theorem for two-player games, or, equivalently, via duality of linear programming, or, as an analyst would say, via the Hahn-Banach theorem.
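In a couple more words, here is the min-max route: consider the zero-sum game in which one player picks f \in F, the other player picks a distribution M that is \delta-dense in U_\Sigma, and the payoff is E[f(D)] - E[f(M)]. Mixed strategies for the first player are convex combinations \bar f, the set of \delta-dense distributions is already convex, and the payoff is linear in each player's strategy, so

\max_{\bar f} \min_M ( E[\bar f(D)] - E[\bar f(M)] ) = \min_M \max_{f \in F} ( E[f(D)] - E[f(M)] ) > \epsilon

where the last inequality is just the assumption restated; the optimal \bar f is our universal distinguisher.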

Now let S be the set of the \delta |\Sigma| elements of \Sigma on which \bar f is largest. Since U_S is \delta-dense in U_\Sigma, the claim gives
(1) E[\bar f(D)] - E[\bar f(U_S)] >\epsilon
which implies that there must be a threshold t such that
(2) Pr[\bar f(D)\geq t] - Pr[\bar f(U_S) \geq t]  >\epsilon
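(To see why such a t exists, write the expectation gap in (1) as an integral over thresholds:

E[\bar f(D)] - E[\bar f(U_S)] = \int_0^1 ( Pr[\bar f(D) \geq t] - Pr[\bar f(U_S) \geq t] ) dt > \epsilon

and note that if the integrand were at most \epsilon for every t \in [0,1], the integral could not exceed \epsilon.)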
So we have found a boolean distinguisher between D and U_S. Next,
we claim that the same distinguisher works between R and U_\Sigma.

By the density assumption, we have
Pr[\bar f(R)\geq t] \geq \delta \cdot Pr[\bar f(D) \geq t]

and since S contains exactly a \delta fraction of \Sigma, and since the condition \bar f(x) \geq t always fails outside of S (why?), we then have
Pr[\bar f(U_\Sigma)\geq t] = \delta \cdot Pr[\bar f(U_S) \geq t]
and so
(3) Pr[\bar f(R)\geq t] - Pr[\bar f(U_\Sigma) \geq t]  >\delta \epsilon

Now, it’s not clear what the complexity of \bar f is: it could be a convex combination involving all the functions in F. However, by Chernoff bounds, there must be functions f_1,\ldots,f_k with k=poly(1/\epsilon,\log 1/\delta) such that \bar f(x) is well approximated by \sum_i f_i(x) / k for all x except for an exceptional set having density less than, say, \delta\epsilon/10, according to both R and U_\Sigma.
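To spell out this sampling argument (with, say, \epsilon/4 as the approximation accuracy): write \bar f = \sum_j \alpha_j f_j, and pick f_1,\ldots,f_k independently, each equal to f_j with probability \alpha_j. For every fixed x, the values f_i(x) are independent with expectation \bar f(x), so by Chernoff-Hoeffding bounds

Pr [ | \frac 1 k \sum_i f_i(x) - \bar f(x) | > \epsilon/4 ] \leq 2 e^{-k\epsilon^2/8}

which is at most \delta\epsilon/20 once k = O(\epsilon^{-2} \log (1/\delta\epsilon)). Averaging over x sampled from R and from U_\Sigma, and applying Markov's inequality, there is a fixed choice of f_1,\ldots,f_k for which the exceptional set, where the approximation fails, has density at most \delta\epsilon/10 under both distributions.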

Now R and U_\Sigma are distinguished by the predicate \sum_{i=1}^k f_i(x) \geq tk, which is just a linear threshold function applied to a small set of functions from F, as promised.

Actually I have skipped an important step: outside of the exceptional set, \sum_i f_i(x)/k is going to be close to \bar f(x) but not identical, and this could lead to problems. For example, in (3) \bar f(R) might typically be larger than t only by a tiny amount, and \sum_i f_i(x)/k might consistently underestimate \bar f in R. If so, Pr [ \sum_{i=1}^k f_i(R) \geq tk ] could be a completely different quantity from Pr [\bar f(R)\geq t].

To remedy this problem, we note that, from (1), we can also derive the more “robust” distinguishing statement
(2′) Pr[\bar f(D)\geq t+\epsilon/2] - Pr[\bar f(U_S) \geq t]  >\epsilon/2
from which we get
(3′) Pr[\bar f(R)\geq t+\epsilon/2] - Pr[\bar f(U_\Sigma) \geq t]  >\delta \epsilon/2
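(To verify (2'): if we had Pr[\bar f(D) \geq t + \epsilon/2] - Pr[\bar f(U_S) \geq t] \leq \epsilon/2 for every t, then integrating over t \in [0,1] would give

E[\bar f(D)] - \epsilon/2 \leq \int_0^1 Pr[\bar f(D) \geq t + \epsilon/2] dt \leq E[\bar f(U_S)] + \epsilon/2

contradicting (1). And (3') follows from (2') exactly as (3) followed from (2).)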

And now we can be confident that even replacing \bar f with an approximation we still get a distinguisher.
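Concretely, say that x is good if | \frac 1 k \sum_i f_i(x) - \bar f(x) | \leq \epsilon/4. For good x, \bar f(x) \geq t + \epsilon/2 implies \frac 1 k \sum_i f_i(x) \geq t + \epsilon/4, and \frac 1 k \sum_i f_i(x) \geq t + \epsilon/4 implies \bar f(x) \geq t. Combining this with (3') and the \delta\epsilon/10 bound on the density of bad elements,

Pr [ \sum_i f_i(R) \geq (t+\epsilon/4) k ] - Pr [ \sum_i f_i(U_\Sigma) \geq (t+\epsilon/4) k ] > \delta\epsilon/2 - 2\delta\epsilon/10 = 3\delta\epsilon/10

which is the promised poly(\epsilon,\delta) distinguishing gap.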

The statement needed in number-theoretic applications is stronger in a couple of ways. One is that we would like F to contain bounded functions f:\Sigma \rightarrow [0,1] rather than boolean-valued functions. Looking back at our proof, this makes no difference. The other is that we would like h(x) to be a function of the form h(x) = \Pi_{i=1}^k f_i(x) rather than a general composition of functions f_i. This we can achieve by approximating a threshold function by a polynomial of degree poly(1/\epsilon,1/\delta) using the Weierstrass theorem, and then choosing the most distinguishing monomial. This gives a proof of the following statement, which is equivalent to Theorem 7.1 in the Tao-Ziegler paper.

Theorem (Green-Tao-Ziegler, General Case)
Let \Sigma be a finite set, F be a class of functions f:\Sigma \to [0,1], R be a distribution over \Sigma, D be a \delta-dense distribution in R, \epsilon>0 be given.

Suppose that for every M that is \delta-dense in U_\Sigma there is an f\in F such that
| E[f(D)] - E[f(M)] | >\epsilon

Then there is a function h:\Sigma \rightarrow [0,1] of the form h(x) = \Pi_{i=1}^k f_i(x) where k = poly(1/\epsilon,1/\delta) and f_i \in F such that
| E[h(R)] - E[h(U_\Sigma)] | > exp(-poly(1/\epsilon,1/\delta))

In this case, too, we lose an exponential factor. Our proof, however, has some interest even in the number-theoretic setting because it is somewhat simpler than, and genuinely different from, the original one.
