Suppose we have a length-increasing function {G: \{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n}, which we think of as a pseudorandom generator mapping a shorter seed into a longer output.

Then the distribution of {G(z)} for a random seed {z} is not uniform (in particular, it is concentrated on at most {2^{n-1}} of the {2^n} elements of {\{ 0,1 \}^n}). We say that a statistical test {D: \{ 0,1 \}^n \rightarrow \{ 0,1 \}} has advantage {\epsilon} in distinguishing the output of {G} from the uniform distribution if

\displaystyle   \left| \mathop{\mathbb P}_{z \in \{ 0,1 \}^{n-1}} [D(G(z)) =1 ] - \mathop{\mathbb P}_{x \in \{ 0,1 \}^{n}} [D(x) =1 ] \right| \geq \epsilon \ \ \ \ \ (1)

If the left-hand side of (1) is at most {\epsilon} for every {D} computable by a circuit of size {S}, then we say that {G} is {\epsilon}-pseudorandom against circuits of size {S}, or that it is an {(S,\epsilon)}-secure pseudorandom generator.

How secure can a pseudorandom generator possibly be? This question (if we make no assumption on the efficiency of {G}) is related to the question in the previous post on approximating a boolean function via small circuits. Both questions, in fact, are special cases of the question of how much an arbitrary real-valued function must correlate with functions computed by small circuits, which is answered in a new paper with Anindya De and Madhur Tulsiani.

Suppose that we are given an arbitrary {G: \{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n} and we want to construct a circuit {D} of size {S} such that (1) holds for the largest possible value of {\epsilon}.

A simple strategy is to hard-wire {S} distinct images of {G} into the circuit. If {D} receives as input one of the strings from this set, it outputs 1; otherwise it outputs 0. Then {D} outputs 1 with probability at least {S/2^{n-1}} given an output of the generator (because each possible output of the generator is produced with probability at least {1/2^{n-1}}), while {D} accepts with probability {S/2^n} given a random string from {\{ 0,1 \}^n}. This simple construction shows that size {O(n S)} is sufficient to achieve {\epsilon = S/2^{n}} in (1), so that {(S,o(nS/2^n))}-secure generators cannot exist.
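For intuition, the hard-wiring strategy can be sketched in a few lines of Python (a toy illustration with an assumed example generator, using a lookup table in place of an actual circuit):

```python
import random

def hardwired_distinguisher(G, n, S, seed=0):
    """Build a test D that accepts exactly S hard-wired images of G.

    G maps (n-1)-bit strings (encoded as ints) to n-bit strings (as ints).
    The returned function plays the role of the circuit with the S
    images hard-wired into it.
    """
    rng = random.Random(seed)
    seeds = rng.sample(range(2 ** (n - 1)), S)   # S distinct seeds
    table = {G(z) for z in seeds}                # their hard-wired images
    return lambda x: int(x in table)

# Assumed toy generator on n = 10 output bits: append the parity bit.
# It is injective, so the S hard-wired images are distinct.
n, S = 10, 32
G = lambda z: (z << 1) | bin(z).count("1") % 2

D = hardwired_distinguisher(G, n, S)

# Advantage: P_z[D(G(z)) = 1] - P_x[D(x) = 1], computed exactly by
# brute force over the (tiny) domains.
p_gen = sum(D(G(z)) for z in range(2 ** (n - 1))) / 2 ** (n - 1)
p_uni = sum(D(x) for x in range(2 ** n)) / 2 ** n
print(p_gen - p_uni)   # equals S / 2**n for this injective toy generator
```

Since this {G} is injective, the advantage comes out to exactly {S/2^{n-1} - S/2^n = S/2^n}, matching the calculation above.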

What kind of lower bounds can we prove? If we fix a test {D}, we call {p:= \mathop{\mathbb P}_{x\in \{ 0,1 \}^n} [ D(x)=1]}, and we pick a function {G: \{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n} uniformly at random among all functions {\{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n}, then by Chernoff bounds we have

\displaystyle  \mathop{\mathbb P}_{G: \{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n} \left[ \left| \mathop{\mathbb P}_{z\in \{ 0,1 \}^{n-1} } [ D(G(z)) = 1] - p \right| \geq \epsilon \right] \leq 2 e^{-2\epsilon^2 2^{n-1}}

Since there are {2^{O(S\log S)}} circuits of size {S}, a union bound tells us that there are {(S,O(\sqrt{S\log S/2^n }))}-secure pseudorandom generators {G:\{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n}.
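The concentration step can be seen in action with a small Python experiment (a toy illustration with assumed parameters: one fixed random test {D}, and many random {G}'s, each drawn by sampling {2^{n-1}} independent uniform outputs):

```python
import math
import random

def deviation_frequency(n=12, trials=500, eps=0.05, seed=1):
    """Estimate how often |P_z[D(G(z)) = 1] - p| >= eps when D is a
    fixed test and G: {0,1}^(n-1) -> {0,1}^n is uniformly random.

    D is a fixed random subset of {0,1}^n (strings encoded as ints);
    each trial draws a fresh G by sampling 2^(n-1) independent outputs.
    Returns the empirical frequency and the Chernoff bound 2 e^{-2 eps^2 2^(n-1)}.
    """
    rng = random.Random(seed)
    N, M = 2 ** n, 2 ** (n - 1)
    D = {x for x in range(N) if rng.random() < 0.5}
    p = len(D) / N
    deviations = 0
    for _ in range(trials):
        hits = sum(rng.randrange(N) in D for _ in range(M))
        if abs(hits / M - p) >= eps:
            deviations += 1
    return deviations / trials, 2 * math.exp(-2 * eps ** 2 * M)

empirical, chernoff = deviation_frequency()
print(empirical, chernoff)   # the empirical frequency stays below the bound
```

With these parameters the bound is already below {10^{-3}}, and the union bound over all {2^{O(S \log S)}} size-{S} circuits proceeds as in the text.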

It turns out that this bound is tight, that is, for every {G}, we can always construct a circuit of size {S} whose advantage is about {\sqrt{S/2^n}}.

The following more general result (whose proof I will discuss in later posts) subsumes both the question of constructing distinguishers for pseudorandom generators and the question of approximating boolean functions.

Theorem 1 Let {g: \{ 0,1 \}^n \rightarrow {\mathbb R}} be any real-valued function. Then for every {S} there is a function {c: \{ 0,1 \}^n \rightarrow \{0,1\}} computable by a circuit of size at most {S} such that

\displaystyle  \sum_x (-1)^{c(x)} \cdot g(x) \geq \Omega\left( \sqrt{\frac {S}{2^n}}\right) \cdot\sum_x \left| g(x) \right|

To see why it subsumes both questions, let us start from the boolean function approximation problem. If {f: \{ 0,1 \}^n \rightarrow \{ 0,1 \}} is a boolean function, we can define {g(x) := (-1)^{f(x)}}, and the theorem gives us a function {c} computable by a circuit of size {O(\epsilon^2 2^n)} such that

\displaystyle  \sum_x (-1)^{c(x)} \cdot (-1)^{f(x)} \geq \epsilon \cdot \sum_x \left| (-1)^{f(x)} \right| = \epsilon 2^n

Dividing both sides by {2^n} we have

\displaystyle  \mathop{\mathbb E}_{x\sim \{ 0,1 \}^n} (-1)^{c(x)} \cdot (-1)^{f(x)} \geq \epsilon

and the left-hand side is {\mathop{\mathbb P} [ c(x)=f(x)] - \mathop{\mathbb P} [ c(x)\neq f(x)] = 2\mathop{\mathbb P}[c(x)=f(x)]-1}, so

\displaystyle  \mathop{\mathbb P} [ c(x) = f(x) ] \geq \frac 12 + \frac \epsilon 2

Regarding the problem of distinguishing the output of a generator from the uniform distribution, set

\displaystyle  g(x) := \mathop{\mathbb P}_{z\in \{ 0,1 \}^{n-1}} [ G(z) = x] - \frac 1 {2^n}

The theorem then gives a function {c}, computable by a circuit of size {O(\epsilon^2 2^n)}, such that

\displaystyle  \sum_x (-1)^{c(x)} \cdot g(x) \geq \epsilon \cdot \sum_x \left| g(x) \right|

and note that {\sum_x |g(x)|} is the {\ell_1} distance of the distribution of outputs of the generator from the uniform distribution, which is at least 1 because the outputs are supported on at most half of {\{ 0,1 \}^n}, while the left-hand side is, in absolute value, twice the distinguishing advantage of {c}. So, after replacing {c} by its complement if necessary,

\displaystyle  \mathop{\mathbb P}_{z\in \{ 0,1 \}^{n-1}} [ c(G(z)) =1] - \mathop{\mathbb P}_{x\in \{ 0,1 \}^n} [c(x)=1] \geq \frac \epsilon 2
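As a sanity check on the relation between the advantage and the {\ell_1} distance, the following Python snippet computes {g}, takes the (size-unbounded) optimal test {c} that accepts exactly the over-represented strings, and evaluates its advantage for an assumed toy generator:

```python
from fractions import Fraction

def optimal_test_advantage(G, n):
    """For a toy generator G on n output bits, compute
    g(x) = P_z[G(z) = x] - 2^-n exactly, take the size-unbounded optimal
    test c(x) = 1 iff g(x) > 0, and return its advantage together with
    the l1 distance sum_x |g(x)|."""
    N, M = 2 ** n, 2 ** (n - 1)
    counts = [0] * N
    for z in range(M):
        counts[G(z)] += 1
    g = [Fraction(counts[x], M) - Fraction(1, N) for x in range(N)]
    l1 = sum(abs(v) for v in g)
    adv = sum(v for v in g if v > 0)   # P_z[c(G(z)) = 1] - P_x[c(x) = 1]
    return adv, l1

n = 8
G = lambda z: z << 1                   # assumed toy generator: append a 0 bit
adv, l1 = optimal_test_advantage(G, n)
print(adv, l1)   # the advantage is exactly half the l1 distance
```

Here the outputs of {G} are supported on exactly half of {\{ 0,1 \}^n}, so the {\ell_1} distance is 1 and the optimal test has advantage {1/2}; a size-{S} circuit, by the theorem, still recovers an {\Omega(\sqrt{S/2^n})} fraction of this.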
