# Efficiently Correlating with a Real-valued Function and Breaking PRGs

Suppose we have a length-increasing function ${G: \{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n}$, which we think of as a pseudorandom generator mapping a shorter seed into a longer output.

Then the distribution of ${G(z)}$ for a random seed ${z}$ is not uniform (in particular, it is concentrated on at most ${2^{n-1}}$ of the ${2^n}$ elements of ${\{ 0,1 \}^n}$). We say that a statistical test ${D: \{ 0,1 \}^n \rightarrow \{ 0,1 \}}$ has advantage ${\epsilon}$ in distinguishing the output of ${G}$ from the uniform distribution if $\displaystyle \left| \mathop{\mathbb P}_{z \in \{ 0,1 \}^{n-1}} [D(G(z)) =1 ] - \mathop{\mathbb P}_{x \in \{ 0,1 \}^{n}} [D(x) =1 ] \right| \geq \epsilon \ \ \ \ \ (1)$

If the left-hand side of (1) is at most ${\epsilon}$ for every ${D}$ computable by a circuit of size ${S}$, then we say that ${G}$ is ${\epsilon}$-pseudorandom against circuits of size ${S}$, or that it is an ${(S,\epsilon)}$-secure pseudorandom generator.

How secure can a pseudorandom generator possibly be? This question (if we make no assumption on the efficiency of ${G}$) is related to the question in the previous post on approximating a boolean function via small circuits. Both questions, in fact, are special cases of the question of how much an arbitrary real-valued function must correlate with functions computed by small circuits, which is answered in a new paper with Anindya De and Madhur Tulsiani.

Suppose that we are given an arbitrary ${G: \{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n}$ and we want to construct a circuit ${D}$ of size ${S}$ such that (1) holds for the largest possible value of ${\epsilon}$.

A simple strategy is to hard-wire ${S}$ images of ${G}$ into the circuit. If ${D}$ receives in input one of the strings from such set, it outputs 1, otherwise it output 0. We see that ${D}$ outputs 1 with probability at least ${S/2^{n-1}}$ given an output of the generator (because each possible output of the generator is produced with probability at least ${1/2^{n-1}}$), but ${D}$ accepts with probability ${S/2^n}$ given a random string from ${\{ 0,1 \}^n}$. This simple construction shows that size ${O(n S)}$ is sufficient to achieve ${\epsilon = S/2^{n}}$ in (1), so that ${(S,o(nS/2^n))}$-secure generators cannot exist.

What kind of lower bounds can we prove? If we fix a test ${D}$, we call ${p:= \mathop{\mathbb P}_{x\in \{ 0,1 \}^n} [ D(x)=1]}$, and we pick a function ${G: \{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n}$ uniformly at random among all functions ${\{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n}$, then by Chernoff bounds we have $\displaystyle \mathop{\mathbb P}_{G: \{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n} \left[ \left| \mathop{\mathbb P}_{z\in \{ 0,1 \}^{n-1} } [ D(G(z)) = 1] - p \right| \geq \epsilon \right] \leq e^{-2\epsilon^2 2^{n-1}}$

Since there are ${2^{O(S\log S)}}$ circuits of size ${S}$, a union bound tells us that there are ${(S,O(\sqrt{S\log S/2^n }))}$-secure pseudorandom generator ${G:\{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n}$.

It turns out that this bound is tight, that is, for every ${G}$, we can always construct a circuit of size ${S}$ whose advantage is about ${\sqrt{S/2^n}}$.

The following more general result (whose proof I will discuss in later posts) subsumes both the question of constructing distinguishers for pseudorandom generators and the question of approximating boolean functions.

Theorem 1 Let ${g: \{ 0,1 \}^n \rightarrow {\mathbb R}}$ be any real-valued function. Then for every ${S}$ there is a a function ${c: \{ 0,1 \}^n \rightarrow \{0,1\}}$ computable by a circuit of size at most ${S}$ such that $\displaystyle \sum_x (-1)^{c(x)} \cdot g(x) \geq \Omega\left( \sqrt{\frac {S}{2^n}}\right) \cdot\sum_x \left| g(x) \right|$

To see why it subsumes both questions, let us start from the boolean function approximation problem. If ${f: \{ 0,1 \}^n \rightarrow \{ 0,1 \}}$ is a boolean function, we can define ${g(x) := (-1)^{f(x)}}$, and the theorem gives us a function ${c}$ computable by a circuit of size ${O(\epsilon^2 2^n)}$ such that $\displaystyle \sum_x (-1)^{c(x)} \cdot (-1)^{f(x)} \geq \epsilon \cdot \sum_x \left| (-1)^{f(x)} \right| = \epsilon 2^n$

Dividing both sides by ${2^n}$ we have $\displaystyle \mathop{\mathbb E}_{x\sim \{ 0,1 \}^n} (-1)^{c(x)} \cdot (-1)^{f(x)} \geq \epsilon$

and the left-hand side is ${\mathop{\mathbb P} [ c(x)=f(x)] - \mathop{\mathbb P} [ c(x)\neq f(x)] = 2\mathop{\mathbb P}[c(x)=f(x)]-1}$, so $\displaystyle \mathop{\mathbb P} [ c(x) = f(x) ] \geq \frac 12 + \frac \epsilon 2$

Regarding the problem of distinguishing the output of a generator from the uniform distribution, set $\displaystyle g(x) := \mathop{\mathbb P}_{z\in \{ 0,1 \}^{n-1}} [ G(z) = x] - \frac 1 {2^n}$

get a function ${c(x)}$, computable by a circuit of size ${O(\epsilon^2 2^n)}$ such that $\displaystyle \sum_x (-1)^{c(x)} \cdot g(x) \geq \epsilon \cdot \sum_x \left| g(x) \right|$

and note that ${\sum_x |g(x)|}$ is the ${\ell_1}$ distance of the distribution of outputs of the generator from the uniform distribution, that is, at least 1, while the left-hand side is twice the distinguishing probability of ${c}$, so $\displaystyle \mathop{\mathbb P}_{z\in \{ 0,1 \}^{n-1}} [ c(G(z)) =1] - \mathop{\mathbb P}_{x\in \{ 0,1 \}^n} [c(x)=1] \geq \frac \epsilon 2$

1. Thanks