In the last post we introduced the following problem: we are given a length-increasing function $G: \{0,1\}^n \rightarrow \{0,1\}^m$, $m > n$, the hardest case being a function whose output is one bit longer than the input ($m = n+1$), and we want to construct a distinguisher $T$ such that the *advantage* or *distinguishing probability* of $T$

$$ \left| \Pr_{x \sim \{0,1\}^n} [ T(G(x)) = 1 ] - \Pr_{u \sim \{0,1\}^m} [ T(u) = 1 ] \right| $$

is as large as possible relative to the circuit complexity of $T$.

I will show how to achieve advantage $\epsilon$ with a circuit of size $O(\epsilon^2 \cdot 2^n \cdot n)$. Getting rid of the suboptimal factor of $n$ is a bit more complicated. These results are in this paper.

Let us review some well known facts about distinguishability of distributions. If $X$ and $Y$ are two distributions over $\{0,1\}^m$, then the advantage of the best distinguisher between $X$ and $Y$ is called the *statistical distance* (or *total variation distance*) between $X$ and $Y$, and is denoted $|| X - Y ||_{SD}$, hence

$$ || X - Y ||_{SD} := \max_{T : \{0,1\}^m \rightarrow \{0,1\}} \left| \Pr [ T(X) = 1 ] - \Pr [ T(Y) = 1 ] \right| $$

and it is easy to see that the statistical distance between two distributions is half their $\ell_1$ distance, if we think of them as vectors in ${\mathbb R}^{2^m}$, that is

$$ || X - Y ||_{SD} = \frac 12 \sum_{z \in \{0,1\}^m} \left| \Pr [X = z] - \Pr [Y = z] \right| = \frac 12 \cdot || X - Y ||_1 $$
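As a quick sanity check (my own illustration, not from the post), the identity between best-distinguisher advantage and half the $\ell_1$ distance is easy to verify numerically: the optimal test accepts exactly the strings where $X$ is more likely than $Y$.

```python
# Sanity check: for distributions over {0,1}^m, the best-distinguisher
# advantage equals half the l1 distance. The optimal test T accepts
# exactly the strings z with Pr[X=z] > Pr[Y=z].
from itertools import product

m = 3
strings = ["".join(bits) for bits in product("01", repeat=m)]

# Two arbitrary distributions X and Y over {0,1}^3 (illustrative values).
X = {z: 1 / 8 for z in strings}
Y = {z: 0.0 for z in strings}
Y["000"] = 0.5
Y["111"] = 0.5

# Advantage of the optimal distinguisher: sum of positive gaps Pr[X=z]-Pr[Y=z].
advantage = sum(max(X[z] - Y[z], 0.0) for z in strings)

# Half the l1 distance between X and Y viewed as vectors.
half_l1 = 0.5 * sum(abs(X[z] - Y[z]) for z in strings)

assert abs(advantage - half_l1) < 1e-12
```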
If $G : \{0,1\}^n \rightarrow \{0,1\}^m$ is a length-increasing function, then the distribution $G(x)$ of outputs of the function (given a random input seed $x \sim \{0,1\}^n$) has statistical distance at least $\frac 12$ from the uniform distribution, as seen by using the distinguisher $T$ defined as

$$ T(z) := 1 \Leftrightarrow \exists x \in \{0,1\}^n . \ G(x) = z $$

because $T$ outputs 1 with probability one given an output of $G$, but with probability at most $2^n / 2^m \leq \frac 12$ given a uniformly sampled element of $\{0,1\}^m$.
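Here is a small brute-force illustration of this argument (my own code; the function $G$ below is just a random table, not anything from the post): the image of $G$ contains at most $2^n$ of the $2^{n+1}$ possible outputs, so the membership test has advantage at least $\frac 12$.

```python
# For any G: {0,1}^n -> {0,1}^{n+1}, the test "is z in the image of G?"
# distinguishes G's output from uniform with advantage at least 1/2,
# because the image contains at most 2^n of the 2^{n+1} strings.
from itertools import product
import random

n = 3
inputs = list(product([0, 1], repeat=n))
outputs = list(product([0, 1], repeat=n + 1))

random.seed(0)
G = {x: random.choice(outputs) for x in inputs}   # an arbitrary function G
image = set(G.values())

# Pr[T(G(x))=1] = 1, while Pr[T(U)=1] = |image| / 2^{n+1} <= 1/2.
advantage = 1.0 - len(image) / len(outputs)
assert advantage >= 0.5
```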

We are asking the question of whether there necessarily has to be a distinguisher $T$ computable by a relatively small circuit which has non-trivially good advantage in distinguishing $G(x)$ from the uniform distribution.

It has been well known since the 1980s (see the introduction of this survey paper by Oded Goldreich) that for every distribution $X$ over $\{0,1\}^m$

$$ || X - U ||_{SD} \leq 2^{m/2} \cdot \max_L \left| \Pr [ L(X) = 1 ] - \Pr [ L(U) = 1 ] \right| \ \ \ \ \ (1)$$

where the maximum is taken over all the ${\mathbb F}_2$-linear functions $L : \{0,1\}^m \rightarrow \{0,1\}$, and where $U$ is the uniform distribution over $\{0,1\}^m$. This fact, which has an easy proof using Fourier analysis, is useful in showing that certain distributions over $m$-bit strings are nearly uniform, because (up to some loss which is tolerable if $m$ is small) it reduces this task to the task of showing that the xors of all subsets of the bits of the string are nearly unbiased.
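As a numerical illustration (my own code, not from the survey), one can sample a random distribution over $\{0,1\}^m$ for small $m$ and check that its distance from uniform is at most $2^{m/2}$ times the largest bias of a linear test:

```python
# Numerical check of the XOR lemma: the statistical distance of X from
# uniform is at most 2^{m/2} times the largest bias of a linear test.
from itertools import product
import random

m = 4
strings = list(product([0, 1], repeat=m))

random.seed(1)
w = [random.random() for _ in strings]
total = sum(w)
P = [wi / total for wi in w]           # a random distribution X over {0,1}^m

# statistical distance from uniform = half the l1 distance
sd = 0.5 * sum(abs(p - 2 ** -m) for p in P)

def bias(a):
    # advantage of the linear test z -> <a, z> mod 2 (uniform gives 1/2)
    pr1 = sum(p for p, z in zip(P, strings)
              if sum(ai * zi for ai, zi in zip(a, z)) % 2 == 1)
    return abs(pr1 - 0.5)

max_bias = max(bias(a) for a in strings if any(a))
assert sd <= 2 ** (m / 2) * max_bias + 1e-12
```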

Here we are interested in the implication that if $G : \{0,1\}^n \rightarrow \{0,1\}^m$ is a length-increasing function then there exists a linear function $L$ such that

$$ \left| \Pr [ L(G(x)) = 1 ] - \Pr [ L(U) = 1 ] \right| \geq \frac 12 \cdot 2^{-m/2} \ \ \ \ \ (2)$$

This means that one can achieve distinguishing probability $\epsilon = \frac 12 \cdot 2^{-m/2}$ with a circuit of size $O(m)$, which has the promised size $O(\epsilon^2 \cdot 2^n \cdot n)$.

How do we “scale up” this result? For example, how do we achieve distinguishing probability $2^{-m/4}$ with a circuit of size $O(2^{m/2} \cdot m)$?

By analogy with our simplified proof of a result of Andreev, Clementi and Rolim that I described last week, we might try the following approach:

- partition $\{0,1\}^m$ into $2^k$ sets, each of size $2^{m-k}$;
- find the linear function that is the best distinguisher within each set;
- create a distinguisher $T$ that, on input $z$, determines the set that $z$ belongs to, and then applies to $z$ the appropriate linear function.

The problem with this approach is how to define “best distinguisher within a set.” Our initial idea was to consider, for each set $V$ in the partition, the distribution of $G(x)$ conditioned on $G(x) \in V$ versus the uniform distribution over $V$, and then find the best linear distinguisher for each of these pairs of distributions. Unfortunately this approach runs into various difficulties if the distribution $G(x)$ does not hit the various sets in the partition with approximately the same probability. Our solution was to define the partition using pairwise independent hashing, which would have the effect that $G(x)$ would hit the various sets with approximately the same probability. (This is the solution that I presented when I gave talks about this result in Princeton and New York.)

Fortunately, there is a much easier and more attractive way to proceed. Just partition $\{0,1\}^m$ into $2^k$ sets, each of size $2^{m-k}$, based on the value of the first $k$ bits. Then, for each set $V$ in the partition, find the affine function $A_V : \{0,1\}^m \rightarrow \{0,1\}$ that maximizes

$$ \Pr [ A_V(G(x)) = 1 \wedge G(x) \in V ] - \Pr [ A_V(U) = 1 \wedge U \in V ] \ \ \ \ \ (3)$$
and then define the distinguisher

$$ T(z) := \sum_V 1_V(z) \cdot A_V(z) \ \ \ \ \ (4)$$

(where $1_V$ is the characteristic function of $V$) that, given $z$, determines which set $V$ of the partition $z$ belongs to and then returns $A_V(z)$.

The distinguisher $T$ can clearly be evaluated by a circuit of size $O(2^k \cdot m)$, so it remains to understand its distinguishing probability. The main claim is that for every set $V$ there is an affine function $A_V$ such that

$$ \Pr [ A_V(G(x)) = 1 \wedge G(x) \in V ] - \Pr [ A_V(U) = 1 \wedge U \in V ] $$

$$ \geq \frac 12 \cdot 2^{-(m-k)/2} \cdot \sum_{z \in V} \left| \Pr [G(x) = z] - 2^{-m} \right| + \frac 12 \cdot \left( \Pr [G(x) \in V] - \Pr [U \in V] \right) \ \ \ \ \ (5)$$

and if we sum these inequalities over all the sets we get the distinguishing probability of $T$ on the left-hand side and $\frac 12 \cdot 2^{-(m-k)/2}$ times the $\ell_1$ distance of $G(x)$ from the uniform distribution on the right-hand side, because the terms $\frac 12 \cdot ( \Pr [G(x) \in V] - \Pr [U \in V] )$ sum to zero over the partition. Then the distinguishing probability of $T$ is at least $\frac 12 \cdot 2^{-(m-k)/2}$, because the $\ell_1$ distance is at least 1, and the sets were chosen to have size $2^{m-k} = \Theta(1/\epsilon^2)$.
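The whole construction is small enough to test by brute force. The following sketch (my own code; $G$ is an arbitrary random function, and the parameters $n=3$, $m=4$, $k=2$ are chosen just for illustration) builds the block distinguisher by picking the best affine function inside each block and checks that the total advantage is at least $\frac 12 \cdot 2^{-(m-k)/2}$:

```python
# Build the block distinguisher for a random G: {0,1}^3 -> {0,1}^4, with
# blocks defined by the first k = 2 output bits, choosing the best affine
# function inside each block, and check the claimed total advantage.
from itertools import product
import random

n, m, k = 3, 4, 2
outputs = list(product([0, 1], repeat=m))

random.seed(2)
G = {x: random.choice(outputs) for x in product([0, 1], repeat=n)}
P = {z: 0.0 for z in outputs}
for x in G:
    P[G[x]] += 2 ** -n                   # output distribution of G

def affine(a, b, z):                     # affine test <a, z> + b mod 2
    return (sum(ai * zi for ai, zi in zip(a, z)) + b) % 2

advantage = 0.0
for v in product([0, 1], repeat=k):      # block = strings starting with v
    block = [z for z in outputs if z[:k] == v]
    best = max(
        sum(P[z] - 2 ** -m for z in block if affine(a, b, z) == 1)
        for a in product([0, 1], repeat=m) for b in [0, 1]
    )
    advantage += best                    # per-block contribution, summed

# The claim: total advantage at least (1/2) * 2^{-(m-k)/2}.
assert advantage >= 0.5 * 2 ** (-(m - k) / 2) - 1e-12
```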

How does one prove (5)? More importantly, how does one begin to guess that an equation like (5) is true and that it provides the right “amortization” between the advantage within a block and the relative probabilities of hitting the block? And how is this any simpler than what we were doing before?

The answer is that everything becomes much simpler (and one does not have to reason about anything as messy as (5)) if we represent bits with $\{-1,1\}$ instead of $\{0,1\}$ and we observe that our original goal is to find an efficient distinguisher $T : \{0,1\}^m \rightarrow \{-1,1\}$ such that

$$ \mathop{\mathbb E} T(X) - \mathop{\mathbb E} T(U) \geq 2 \epsilon $$

where $X := G(x)$ is the output distribution of $G$ and $U$ is the uniform distribution over $\{0,1\}^m$, because

$$ \mathop{\mathbb E} T(X) = \Pr [ T(X) = 1 ] - \Pr [ T(X) = -1 ] = 2 \Pr [ T(X) = 1 ] - 1 $$

and same for $\mathop{\mathbb E} T(U)$, so

$$ \mathop{\mathbb E} T(X) - \mathop{\mathbb E} T(U) = 2 \cdot \left( \Pr [ T(X) = 1 ] - \Pr [ T(U) = 1 ] \right) $$
Our goal can also be written as proving

$$ \sum_{z \in \{0,1\}^m} T(z) \cdot \left( \Pr [X = z] - 2^{-m} \right) \geq 2 \epsilon $$

and if we write $P(z) := \Pr [X = z]$ and recall what we said about statistical distance and $\ell_1$ distance, this is equivalent to proving

$$ \sum_{z \in \{0,1\}^m} T(z) \cdot \left( P(z) - 2^{-m} \right) \geq 2^{-(m-k)/2} \cdot \sum_{z \in \{0,1\}^m} \left| P(z) - 2^{-m} \right| $$
Recalling that we are working with a partition of $\{0,1\}^m$ into $2^k$ sets $V$, each of size $2^{m-k}$, one would optimistically hope to do so by proving that for every set $V$

$$ \sum_{z \in V} T(z) \cdot \left( P(z) - 2^{-m} \right) \geq 2^{-(m-k)/2} \cdot \sum_{z \in V} \left| P(z) - 2^{-m} \right| \ \ \ \ \ (6)$$

Our proposed distinguisher computes an affine function within each set, so we would need to show that for every $V$ there is an affine function $A_V : \{0,1\}^m \rightarrow \{-1,1\}$ such that

$$ \sum_{z \in V} A_V(z) \cdot \left( P(z) - 2^{-m} \right) \geq 2^{-(m-k)/2} \cdot \sum_{z \in V} \left| P(z) - 2^{-m} \right| \ \ \ \ \ (7)$$
Equation (7) is now in a form that is easy to prove via standard Fourier analytic arguments. Here we give a different proof which uses only the fact that affine functions form a pairwise independent family of functions and are closed under complementation.

**Lemma 1** *Let $g : V \rightarrow {\mathbb R}$ be any function, and $H$ be a pairwise independent family of functions $h : V \rightarrow \{-1,1\}$. Then there is a function $h \in H$ such that*

$$ \left| \sum_{z \in V} h(z) \cdot g(z) \right| \geq \frac{1}{\sqrt{|V|}} \cdot \sum_{z \in V} |g(z)| $$

*Proof:* Using the pairwise independence of $H$ we compute

$$ \mathop{\mathbb E}_{h \sim H} \left( \sum_{z \in V} h(z) \cdot g(z) \right)^2 = \sum_{z \in V} g^2(z) $$

so there must exist an $h \in H$ such that

$$ \left( \sum_{z \in V} h(z) \cdot g(z) \right)^2 \geq \sum_{z \in V} g^2(z) $$

and we have

$$ \sqrt{ \sum_{z \in V} g^2(z) } \geq \frac{1}{\sqrt{|V|}} \cdot \sum_{z \in V} |g(z)| $$

from Cauchy-Schwarz.
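The computation in the proof is easy to check numerically. In the following sketch (my own code), the family $H$ is the set of all $\pm 1$ affine functions on $t = 3$ bits, which is pairwise independent and closed under complementation, and $g$ is an arbitrary real-valued function:

```python
# Checking Lemma 1 for the family of +/-1 affine functions
# h(z) = (-1)^{<a,z> + b} on t bits: the second moment of sum_z h(z) g(z)
# equals sum_z g(z)^2, so some h attains the claimed lower bound.
from itertools import product
import random

t = 3
V = list(product([0, 1], repeat=t))
random.seed(3)
g = {z: random.uniform(-1, 1) for z in V}   # an arbitrary real function on V

def h(a, b, z):                              # +/-1 affine function
    return (-1) ** ((sum(ai * zi for ai, zi in zip(a, z)) + b) % 2)

sums = [sum(h(a, b, z) * g[z] for z in V)
        for a in product([0, 1], repeat=t) for b in [0, 1]]

# Pairwise independence: E_h [ (sum_z h(z) g(z))^2 ] = sum_z g(z)^2.
second_moment = sum(s * s for s in sums) / len(sums)
assert abs(second_moment - sum(gz * gz for gz in g.values())) < 1e-9

# Closure under complementation makes the maximum equal to the maximum
# absolute value, which is at least sqrt(sum g^2) >= (1/sqrt(|V|)) sum |g|.
best = max(sums)
assert best >= sum(abs(gz) for gz in g.values()) / len(V) ** 0.5 - 1e-12
```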