# CS276 Lecture 12: Goldreich-Levin

Scribed by Jonah Sherman

Summary

Today we prove the Goldreich-Levin theorem.

1. Goldreich-Levin Theorem

We use the notation

$\displaystyle \langle x,r \rangle := \sum_i x_ir_i \bmod 2 \ \ \ \ \ (1)$

Theorem 1 (Goldreich and Levin) Let ${f: \{ 0,1 \}^n \rightarrow \{ 0,1 \}^n}$ be a permutation computable in time ${r}$. Suppose that ${A}$ is an algorithm of complexity ${t}$ such that

$\displaystyle \mathop{\mathbb P}_{x,r} [ A(f(x),r) = \langle x,r \rangle ] \geq \frac 12 + \epsilon \ \ \ \ \ (2)$

Then there is an algorithm ${A'}$ of complexity at most ${O((t+r) \epsilon^{-2}n^{O(1)})}$ such that

$\displaystyle \mathop{\mathbb P}_{x} [ A'(f(x)) = x ] \geq \frac \epsilon 4$

Last time we proved the following partial result.

Lemma 2 (Goldreich-Levin Algorithm — Weak Version) Suppose we have access to a function ${H: \{ 0,1 \}^n \rightarrow \{ 0,1 \}}$ such that, for some unknown ${x}$, we have

$\displaystyle \mathop{\mathbb P}_{r \in \{ 0,1 \}^n} [ H(r) = \langle x,r \rangle ] \geq \frac 78 \ \ \ \ \ (3)$

where ${x\in \{ 0,1 \}^n}$ is an unknown string.

Then there is an algorithm GLW that runs in time ${O(n^2\log n)}$ and makes ${O(n\log n)}$ oracle queries into ${H}$ and, with probability at least ${1- \frac{1}{n}}$, outputs ${x}$.
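As a concrete illustration (not part of the original lecture), here is a minimal Python sketch of the standard bit-by-bit implementation of GLW; the names `glw` and `dot` are ours, bit-strings are packed into integers, and we use a fixed vote count for simplicity where the lemma takes ${O(\log n)}$ votes per bit. For each coordinate ${i}$, the vote ${H(r \oplus e_i) \oplus H(r)}$ equals ${x_i}$ unless ${H}$ errs on ${r}$ or on ${r \oplus e_i}$, which by a union bound happens with probability at most ${1/8 + 1/8 = 1/4}$.

```python
import random

def dot(x, r):
    # inner product mod 2 of two bit-vectors packed into ints
    return bin(x & r).count("1") % 2

def glw(H, n, votes_per_bit=200):
    # Recover x one bit at a time.  For random r, the vote
    # H(r ^ e_i) ^ H(r) equals x_i unless H errs on r or on r ^ e_i,
    # so each vote is wrong with probability at most 1/4; a majority
    # over many votes then gives the right bit with high probability.
    x = 0
    for i in range(n):
        e_i = 1 << i
        ones = 0
        for _ in range(votes_per_bit):
            r = random.getrandbits(n)
            ones += H(r ^ e_i) ^ H(r)
        if 2 * ones > votes_per_bit:
            x |= e_i
    return x
```

With a noiseless oracle `H = lambda r: dot(x0, r)` the votes are deterministic and `glw` returns `x0` exactly.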

This gave us a proof of a variant of the Goldreich-Levin Theorem in which the right-hand-side in (2) was ${\frac {15}{16}}$. We could tweak the proof of Lemma 2 so that the right-hand-side of (3) is ${\frac 34 + \epsilon}$, leading to a proof of a variant of the Goldreich-Levin Theorem in which the right-hand-side in (2) is also ${\frac 34 + \epsilon}$.

We need, however, the full Goldreich-Levin Theorem in order to construct a pseudorandom generator, and so it seems that we have to prove a strengthening of Lemma 2 in which the right-hand-side in (3) is ${\frac 12 + \epsilon}$.

Unfortunately such a stronger version of Lemma 2 is just false: for any two different ${x,x'\in \{ 0,1 \}^n}$ we can construct an ${H}$ such that

$\displaystyle \mathop{\mathbb P}_{r\sim \{ 0,1 \}^n} [ H(r) = \langle x,r \rangle ] = \frac 34$

and

$\displaystyle \mathop{\mathbb P} _{r\sim \{ 0,1 \}^n} [ H(r) = \langle x',r \rangle ] = \frac 34$

so no algorithm can be guaranteed to find ${x}$ given an arbitrary function ${H}$ such that ${\mathop{\mathbb P} [ H(r) = \langle x,r \rangle ] = \frac 34}$, because ${x}$ need not be uniquely defined by ${H}$.
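This ambiguity can be checked concretely. The following small Python experiment (our own construction, not from the lecture) builds such an ${H}$ for ${n=4}$: on the half of the ${r}$s where ${\langle x,r\rangle = \langle x',r\rangle}$, let ${H}$ agree with both; on the other half, where the two predictions differ, let ${H}$ side with each of ${x}$ and ${x'}$ on half of the inputs.

```python
def dot(x, r):
    # inner product mod 2 of bit-vectors packed into ints
    return bin(x & r).count("1") % 2

n, x, xp = 4, 0b0011, 0b0101
d = x ^ xp                       # <d, r> = 0 exactly when <x,r> = <x',r>
H = {}
toggle = 0
for r in range(2 ** n):
    if dot(d, r) == 0:           # the two predictions agree: match both
        H[r] = dot(x, r)
    else:                        # they disagree: alternate which one H matches
        H[r] = dot(x, r) if toggle else dot(xp, r)
        toggle ^= 1

agree_x  = sum(H[r] == dot(x, r)  for r in range(2 ** n)) / 2 ** n
agree_xp = sum(H[r] == dot(xp, r) for r in range(2 ** n)) / 2 ** n
assert agree_x == agree_xp == 0.75
```

Both agreement rates come out to exactly ${3/4}$, so no algorithm looking only at ${H}$ can tell ${x}$ from ${x'}$.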

We can, however, prove the following:

Lemma 3 (Goldreich-Levin Algorithm) Suppose we have access to a function ${H: \{ 0,1 \}^n \rightarrow \{ 0,1 \}}$ such that, for some unknown ${x}$, we have

$\displaystyle \mathop{\mathbb P}_{r \in \{ 0,1 \}^n} [ H(r) = \langle x,r \rangle ] \geq \frac 12 + \epsilon \ \ \ \ \ (4)$

where ${x\in \{ 0,1 \}^n}$ is an unknown string, and ${\epsilon>0}$ is given.

Then there is an algorithm ${GL}$ that runs in time ${O(n^2 \epsilon^{-4}\log n)}$, makes ${O(n\epsilon^{-4} \log n)}$ oracle queries into ${H}$, and outputs a set ${L \subseteq \{ 0,1 \}^n}$ such that ${|L| =O(\epsilon^{-2})}$ and with probability at least ${1/2}$, ${x \in L}$.

The Goldreich-Levin algorithm ${GL}$ has other interpretations (as an algorithm that learns the Fourier coefficients of ${H}$, and as an algorithm that decodes the Hadamard code in sub-linear time) and various applications outside cryptography.

The Goldreich-Levin Theorem is an easy consequence of Lemma 3. Let ${A'}$ take input ${y}$ and then run the algorithm of Lemma 3 with ${H(r) = A(y, r)}$, yielding a list ${L}$. ${A'}$ then checks whether ${f(x) = y}$ for some ${x \in L}$, and outputs such an ${x}$ if one is found.

From the assumption that

$\displaystyle \mathop{\mathbb P}_{x,r} [ A(f(x),r)= \langle x, r \rangle] \geq \frac 12 + \epsilon$

it follows by Markov’s inequality (See Lemma 9 in the last lecture) that

$\displaystyle \mathop{\mathbb P}_x \left[ \mathop{\mathbb P}_r [A(f(x),r)=\langle x, r \rangle] \geq \frac 12 + \frac \epsilon 2 \right] \geq \frac \epsilon 2$

Let us call an ${x}$ such that ${\mathop{\mathbb P}_r [A(f(x),r)=\langle x, r \rangle] \geq \frac 12 + \frac \epsilon 2}$ a good ${x}$. If we pick ${x}$ at random and give ${f(x)}$ to the above algorithm, there is a probability at least ${\epsilon/2}$ that ${x}$ is good and, if so, there is a probability at least ${1/2}$ that ${x}$ is in the list. Therefore, there is a probability at least ${\epsilon/4}$ that the algorithm inverts ${f()}$, where the probability is over the choices of ${x}$ and over the internal randomness of the algorithm.
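The reduction just described is mechanical enough to write down. Here is a hedged Python sketch (all names are ours; `gl` stands for any list-decoding procedure satisfying Lemma 3):

```python
def make_inverter(f, A, gl, n):
    # A' from the argument above: feed the list-decoder the simulated
    # oracle H(r) = A(y, r), then keep any candidate x with f(x) = y.
    def A_prime(y):
        candidates = gl(lambda r: A(y, r), n)
        for x in candidates:
            if f(x) == y:
                return x       # f is a permutation, so x is the preimage
        return None            # inversion failed on this y
    return A_prime
```

If ${y = f(x)}$ for a good ${x}$, then with probability at least ${1/2}$ the list contains ${x}$, and the check ${f(x)=y}$ certifies the answer, giving overall inversion probability at least ${\epsilon/4}$.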

2. The Goldreich-Levin Algorithm

In this section we prove Lemma 3.

We are given an oracle ${H()}$ such that ${H(r)=\langle x, r\rangle}$ for a ${1/2+\epsilon}$ fraction of the ${r}$s. Our goal will be to use ${H()}$ to simulate an oracle that has agreement ${7/8}$ with ${\langle x, r \rangle}$, so that we can use the algorithm of Lemma 2 from the previous section to find ${x}$. We perform this “reduction” by “guessing” the value of ${\langle x, r\rangle}$ at a few points.

We first choose ${k}$ random points ${r_1 \ldots r_k \in \{ 0,1 \}^n}$ where ${k = O(1 / \epsilon^2).}$ For the moment, let us suppose that we have “magically” obtained the values ${\langle x, r_1 \rangle, \ldots, \langle x, r_k \rangle}$. Then define ${H'(r)}$ as the majority value of:

$\displaystyle H(r + r_j) - \langle x, r_j \rangle \ \ j = 1, 2, \ldots, k \ \ \ \ \ (5)$

For each ${j}$, the above expression equals ${\langle x, r \rangle}$ with probability at least ${\frac{1}{2} + \epsilon}$ (over the choices of ${r_j}$) and by choosing ${k=O(1/\epsilon^2)}$ we can ensure that

$\displaystyle \mathop{\mathbb P}_{r,r_1,\ldots,r_k}\left [H'(r) = \langle x, r \rangle \right ] \ge \frac{31}{32}. \ \ \ \ \ (6)$

from which it follows that

$\displaystyle \mathop{\mathbb P}_{r_1,\ldots,r_k}\left [ \mathop{\mathbb P}_r \left [H'(r) = \langle x, r \rangle \right ] \ge \frac 78 \right ] \ge \frac{3}{4}. \ \ \ \ \ (7)$
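This amplification step can be sketched as follows (a hedged sketch, names ours; the pairs ${(r_j, \langle x, r_j\rangle)}$ are assumed given, as in the “magical” discussion above). Over GF(2), subtraction is XOR, so ${H(r + r_j) - \langle x, r_j \rangle}$ becomes `H(r ^ rj) ^ bj`:

```python
def make_H_prime(H, anchors):
    # anchors: list of pairs (r_j, b_j) with b_j = <x, r_j> assumed known.
    # H'(r) = majority over j of H(r ^ r_j) ^ b_j; each vote is correct
    # with probability >= 1/2 + eps over the choice of r_j.
    def H_prime(r):
        votes = sum(H(r ^ rj) ^ bj for rj, bj in anchors)
        return int(2 * votes > len(anchors))
    return H_prime
```

With ${k = O(1/\epsilon^2)}$ anchors, the majority pushes the agreement of `H_prime` with ${\langle x, \cdot \rangle}$ up to ${7/8}$ with probability at least ${3/4}$.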

Consider the following algorithm.

• Algorithm GL-First-Attempt
• pick ${r_1, \ldots, r_k \in \{ 0,1 \}^n}$ where ${k = O(1/\epsilon^2)}$
• for all ${b_1, \ldots, b_k \in \{ 0,1 \}}$
• define ${H'_{b_1 \ldots b_k}(r)}$ as majority of: ${ H(r + r_j) - b_j}$
• apply Algorithm GLW to ${H'_{b_1 \ldots b_k}}$
• return list

The idea behind this algorithm is that we do not in fact know the values ${\langle x, r_j \rangle}$, but we can “guess” them by considering all choices for the bits ${b_j}$. If ${H(r)}$ agrees with ${\langle x, r \rangle}$ for at least a ${1/2+\epsilon}$ fraction of the ${r}$s, then there is a probability at least ${3/4}$ that in one of the iterations we invoke algorithm GLW with a simulated oracle that has agreement ${7/8}$ with ${\langle x, r \rangle}$. Therefore, the final list contains ${x}$ with probability at least ${3/4 - 1/n > 1/2}$.

The obvious problem with this algorithm is that its running time is exponential in ${k = O(1/\epsilon^2)}$ and the resulting list may also be exponentially larger than the ${O(1/\epsilon^2)}$ bound promised by the Lemma.

To overcome these problems, consider the following similar algorithm.

• Algorithm GL
• pick ${r_1, \ldots, r_t \in \{ 0,1 \}^n}$ where ${t = \log O(1/\epsilon^2)}$
• define ${r_S := \sum_{j\in S} r_j}$ for each non-empty ${S\subseteq \{1,\ldots,t\}}$
• for all ${b_1, \ldots, b_t \in \{ 0,1 \}}$
• define ${b_S := \sum_{j\in S} b_j}$ for each non-empty ${S\subseteq \{1,\ldots,t\}}$
• define ${H'_{b_1 \ldots b_t}(r)}$ as majority of: ${ H(r + r_S) - b_S}$ over non-empty ${S}$
• apply Algorithm GLW to ${H'_{b_1 \ldots b_t}}$
• return list
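Putting the pieces together, here is a hedged end-to-end Python sketch of Algorithm GL (all names are ours; `glw_stub` stands for the algorithm of Lemma 2, implemented in its simplest bit-by-bit form, and XOR plays the role of addition and subtraction mod 2):

```python
import itertools
import random

def dot(x, r):
    # inner product mod 2 of bit-vectors packed into ints
    return bin(x & r).count("1") % 2

def glw_stub(H, n, votes=101):
    # weak algorithm of Lemma 2: recover bit i as the majority of
    # H(r ^ e_i) ^ H(r) over random r
    x = 0
    for i in range(n):
        ones = 0
        for _ in range(votes):
            r = random.getrandbits(n)
            ones += H(r ^ (1 << i)) ^ H(r)
        if 2 * ones > votes:
            x |= 1 << i
    return x

def xor_over(S, vals):
    # XOR of vals[j] over j in S (sum mod 2 for bits, vector sum for seeds)
    acc = 0
    for j in S:
        acc ^= vals[j]
    return acc

def gl(H, n, t):
    # t random seeds; the 2^t - 1 sums r_S are pairwise independent
    seeds = [random.getrandbits(n) for _ in range(t)]
    subsets = [S for k in range(1, t + 1)
               for S in itertools.combinations(range(t), k)]
    r_S = {S: xor_over(S, seeds) for S in subsets}
    L = []
    for bits in itertools.product((0, 1), repeat=t):
        b_S = {S: xor_over(S, bits) for S in subsets}   # parities of the guesses
        def H_prime(r, b_S=b_S):
            votes = sum(H(r ^ r_S[S]) ^ b_S[S] for S in subsets)
            return int(2 * votes > len(subsets))
        L.append(glw_stub(H_prime, n))
    return L
```

For the one choice of `bits` that matches ${(\langle x, r_1\rangle, \ldots, \langle x, r_t\rangle)}$, every vote in `H_prime` is correct with probability at least ${1/2+\epsilon}$, and the majority amplifies this to the ${7/8}$ agreement that `glw_stub` needs.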

Let us now see why this algorithm works. First we define, for any nonempty ${S \subseteq \{1, \ldots, t\},}$ ${r_S = \sum_{j \in S} r_j.}$ Then, since ${r_1, \ldots, r_t \in \{ 0,1 \}^n}$ are random, it follows that for any ${S \neq T,}$ ${r_S}$ and ${r_T}$ are independent and uniformly distributed. Now consider an ${x}$ such that ${\langle x, r \rangle}$ and ${H(r)}$ agree on a ${\frac{1}{2} + \epsilon}$ fraction of the values of ${r}$. Then for the choice of ${\{b_j\}}$ where ${b_j = \langle x, r_j \rangle}$ for all ${j,}$ we have that

$\displaystyle b_S = \langle x, r_S \rangle$

for every non-empty ${S}$. In such a case, for every ${S}$ and every ${r}$, there is a probability at least ${\frac{1}{2} +\epsilon}$, over the choices of the ${r_j}$, that

$\displaystyle H(r+ r_S) - b_S = \langle x, r \rangle\ ,$

and these events are pair-wise independent. Note the following simple lemma.

Lemma 4 Let ${R_1, \ldots, R_k}$ be a set of pairwise independent ${0-1}$ random variables, each of which is ${1}$ with probability at least ${\frac{1}{2} + \epsilon.}$ Then ${\mathop{\mathbb P}[\sum_i R_i \ge k/2] \ge 1 - \frac 1 {4\epsilon^2 k}}$.

Proof: Let ${R=R_1+\cdots+R_k}$. The variance of a 0/1 random variable is at most ${1/4}$, and, because of pairwise independence, ${{\bf Var}[R] = {\bf Var} [ R_1+\ldots+R_k] = \sum_i {\bf Var}[R_i] \leq k/4}$.

We then have

$\displaystyle \mathop{\mathbb P}[ R \leq k/2] \leq \mathop{\mathbb P}[ |R - \mathop{\mathbb E}[R]| \geq \epsilon k] \leq \frac{{\bf Var}[R]}{\epsilon^2 k^2} \leq \frac 1 {4\epsilon^2 k}$

$\Box$
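The pairwise independence of the ${r_S}$, on which this application of Lemma 4 rests, can be verified exhaustively for tiny parameters. The check below (a sketch, with ${n=2}$-bit seeds and ${t=2}$) confirms that the pair ${(r_{\{1\}}, r_{\{1,2\}})}$ is uniform over all ${16}$ possible value pairs:

```python
from collections import Counter
from itertools import product

# n = 2 bits per seed, t = 2 seeds; take S = {1} and T = {1, 2},
# so (r_S, r_T) = (r_1, r_1 ^ r_2)
counts = Counter()
for r1, r2 in product(range(4), repeat=2):
    counts[(r1, r1 ^ r2)] += 1

# each of the 4 * 4 = 16 value pairs occurs exactly once: jointly uniform
assert len(counts) == 16 and set(counts.values()) == {1}
```

The same calculation works for any pair ${S \neq T}$: the symmetric difference ${S \triangle T}$ is non-empty, so ${r_T}$ remains uniform even conditioned on ${r_S}$.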

Lemma 4 allows us to upper-bound the probability that the majority operation used to compute ${H'}$ gives the wrong answer. Combining this with our earlier observation that the ${\{r_S\}}$ are pairwise independent, we see that choosing ${t = \log (128/\epsilon^2)}$ suffices to ensure that ${H'_{b_1 \ldots b_t}(r)}$ and ${\langle x, r\rangle}$ have agreement at least ${7/8}$ with probability at least ${3/4}$. Thus we can use Algorithm GLW to obtain ${x}$ with high probability. Choosing ${t}$ as above ensures that the list generated has length at most ${2^t = 128/\epsilon^2}$; the running time is then ${O( n^2 \epsilon^{-4} \log n)}$ with ${O(n\epsilon^{-4}\log n)}$ oracle accesses, due to the ${O(1 / \epsilon^2)}$ iterations of Algorithm GLW, each of which makes ${O(n\log n)}$ oracle accesses, and to the fact that one evaluation of ${H'()}$ requires ${O(1/\epsilon^2)}$ evaluations of ${H()}$.