Scribed by Jonah Sherman
Summary
Today we prove the Goldreich-Levin theorem.
1. Goldreich-Levin Theorem
We use the notation

$$\langle x,r \rangle := \sum_i x_i r_i \bmod 2 \qquad (1)$$

Theorem 1 (Goldreich and Levin) Let $f: \{0,1\}^n \rightarrow \{0,1\}^n$ be a permutation computable in time $T$. Suppose that $A$ is an algorithm of complexity $t$ such that

$$\Pr_{x,r} [ A(f(x),r) = \langle x,r \rangle ] \geq \frac 12 + \epsilon \qquad (2)$$

Then there is an algorithm $A'$ of complexity at most $O(\epsilon^{-4} \cdot n\log n \cdot (t + n + T))$ such that

$$\Pr_x [ A'(f(x)) = x ] \geq \frac {3\epsilon}{16} \qquad (3)$$
Last time we proved the following partial result.
Lemma 2 (Goldreich-Levin Algorithm — Weak Version) Suppose we have access to a function $H: \{0,1\}^n \rightarrow \{0,1\}$ such that, for some unknown $x \in \{0,1\}^n$, we have

$$\Pr_{r \sim \{0,1\}^n} [ H(r) = \langle x,r \rangle ] \geq \frac 78 \qquad (4)$$

where $x$ is an unknown string.

Then there is an algorithm GLW that runs in time $O(n^2 \log n)$ and makes $O(n \log n)$ oracle queries into $H$ and, with probability at least $3/4$, outputs $x$.
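As a concrete illustration, here is a minimal Python sketch of the bit-by-bit voting strategy behind GLW (the function names and the number of votes are my own choices, not part of the lecture): each vote $H(r) \oplus H(r \oplus e_i)$ equals $x_i$ whenever both queries land where $H$ agrees with $\langle x,\cdot \rangle$, which happens with probability at least $3/4$ under (4).

```python
import random

def dot(x, r):
    # inner product <x, r> mod 2, with n-bit strings packed into ints
    return bin(x & r).count("1") & 1

def glw(H, n, votes=64):
    # Weak Goldreich-Levin sketch: if H agrees with <x, .> on >= 7/8 of
    # inputs, then H(r) ^ H(r ^ e_i) = x_i with probability >= 3/4 over
    # a random r, so a majority over O(log n) votes recovers bit i.
    x = 0
    for i in range(n):
        e_i = 1 << i
        ones = sum(H(r) ^ H(r ^ e_i)
                   for r in (random.randrange(2 ** n) for _ in range(votes)))
        if 2 * ones > votes:  # majority vote for x_i = 1
            x |= e_i
    return x
```

With a noiseless oracle (agreement 1) every vote is exactly $x_i$, so the recovery is deterministic; the voting only matters once $H$ is allowed to err on up to a $1/8$ fraction of inputs.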
This gave us a proof of a variant of the Goldreich-Levin Theorem in which the right-hand-side in (2) was $\frac 78 + \epsilon$. We could tweak the proof of Lemma 2 so that the right-hand-side of (4) is $\frac 34 + \epsilon$, leading to proving a variant of the Goldreich-Levin Theorem in which the right-hand-side in (2) is also $\frac 34 + \epsilon$.
We need, however, the full Goldreich-Levin Theorem in order to construct a pseudorandom generator, and so it seems that we have to prove a strengthening of Lemma 2 in which the right-hand-side in (4) is $\frac 12 + \epsilon$.
Unfortunately such a stronger version of Lemma 2 is just false: for any two different $x \neq x'$ we can construct an $H$ such that $\Pr_r [ H(r) = \langle x,r\rangle ] \geq \frac 34$ and $\Pr_r [ H(r) = \langle x',r\rangle ] \geq \frac 34$, so no algorithm can be guaranteed to find $x$ given an arbitrary function $H$ such that $\Pr_r [ H(r) = \langle x,r\rangle ] \geq \frac 12 + \epsilon$, because $x$ need not be uniquely defined by $H$.
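This non-uniqueness is easy to check concretely. The sketch below uses the standard construction (answer with one linear function on half the inputs and with the other elsewhere); the two hidden strings are arbitrary choices of mine, and the check is exhaustive:

```python
def dot(x, r):
    # inner product <x, r> mod 2, with n-bit strings packed into ints
    return bin(x & r).count("1") & 1

n = 8
x, x2 = 0b10110101, 0b01101100   # two different hidden strings

# H answers with <x, r> when the low bit of r is 0, with <x2, r> otherwise
H = lambda r: dot(x, r) if r % 2 == 0 else dot(x2, r)

def agreement(H, y, n):
    # fraction of inputs on which H agrees with the linear function <y, .>
    return sum(H(r) == dot(y, r) for r in range(2 ** n)) / 2 ** n

# a single H has agreement 3/4 with BOTH linear functions,
# so neither hidden string is determined by H
assert agreement(H, x, n) >= 0.75
assert agreement(H, x2, n) >= 0.75
```

The point is that on the half of the inputs where $H$ follows the "other" string, the two linear functions still agree half the time, which pushes both agreements up to $3/4$.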
We can, however, prove the following:
Lemma 3 (Goldreich-Levin Algorithm) Suppose we have access to a function $H: \{0,1\}^n \rightarrow \{0,1\}$ such that, for some unknown $x \in \{0,1\}^n$, we have

$$\Pr_{r \sim \{0,1\}^n} [ H(r) = \langle x,r \rangle ] \geq \frac 12 + \epsilon$$

where $x$ is an unknown string, and $\epsilon > 0$ is given.

Then there is an algorithm GL that runs in time $O(n^2 \epsilon^{-4} \log n)$, makes $O(n \epsilon^{-4} \log n)$ oracle queries into $H$, and outputs a set $L \subseteq \{0,1\}^n$ such that $|L| = O(\epsilon^{-2})$ and, with probability at least $3/8$, $x \in L$.
The Goldreich-Levin algorithm has other interpretations (an algorithm that learns the large Fourier coefficients of a Boolean function, an algorithm that decodes the Hadamard code in sub-linear time) and various applications outside cryptography.
The Goldreich-Levin Theorem is an easy consequence of Lemma 3. Let $A'$ take input $y$ and then run the algorithm of Lemma 3 with $H(r) := A(y,r)$ and accuracy parameter $\epsilon/2$, yielding a list $L$. $A'$ then checks if $f(z) = y$ for any $z \in L$, and outputs it if one is found.
From the assumption that

$$\Pr_{x,r} [ A(f(x),r) = \langle x,r \rangle ] \geq \frac 12 + \epsilon$$

it follows by Markov's inequality (see Lemma 9 in the last lecture) that

$$\Pr_x \left[ \Pr_r [ A(f(x),r) = \langle x,r \rangle ] \geq \frac 12 + \frac \epsilon 2 \right] \geq \frac \epsilon 2$$

Let us call an $x$ such that $\Pr_r [ A(f(x),r) = \langle x,r \rangle ] \geq \frac 12 + \frac \epsilon 2$ a good $x$. If we pick $x$ at random and give $f(x)$ to the above algorithm, there is a probability at least $\epsilon/2$ that $x$ is good and, if so, there is a probability at least $3/8$ that $x$ is in the list. Therefore, there is a probability at least $\frac{3\epsilon}{16}$ that the algorithm inverts $f$, where the probability is over the choices of $x$ and over the internal randomness of the algorithm.
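The reduction just described can be sketched in Python as follows. Here `gl_list` is a brute-force stand-in for the algorithm GL of Lemma 3 (usable only for toy parameters, since it enumerates all candidates), and all names are my own:

```python
def dot(x, r):
    # inner product <x, r> mod 2, with n-bit strings packed into ints
    return bin(x & r).count("1") & 1

def gl_list(H, n, eps):
    # Brute-force stand-in for Lemma 3: list every z whose linear
    # function <z, .> agrees with H on >= 1/2 + eps of all inputs.
    # (The real GL algorithm produces such a list with O(eps^-2)
    # candidates without enumerating {0,1}^n.)
    N = 2 ** n
    return [z for z in range(N)
            if sum(H(r) == dot(z, r) for r in range(N)) >= (0.5 + eps) * N]

def invert(A, f, y, n, eps):
    # The inverter A': run the list-decoder on the oracle H(r) = A(y, r)
    # with accuracy eps/2, then keep a candidate z only if f(z) = y.
    H = lambda r: A(y, r)
    for z in gl_list(H, n, eps / 2):
        if f(z) == y:
            return z
    return None
```

The final check `f(z) == y` is what turns a list of candidates into a single inverse: since $f$ is a permutation, at most one list element can pass it.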
2. The Goldreich-Levin Algorithm
In this section we prove Lemma 3.
We are given an oracle $H$ such that $H(r) = \langle x,r\rangle$ for at least a $\frac 12 + \epsilon$ fraction of the $r$'s. Our goal will be to use $H$ to simulate an oracle that has agreement $\frac 78$ with $\langle x,\cdot\rangle$, so that we can use the algorithm of Lemma 2 from the previous section to find $x$. We perform this “reduction” by “guessing” the value of $\langle x,r\rangle$ at a few points.
We first choose random points $r_1,\ldots,r_t \in \{0,1\}^n$, where $t = O(\epsilon^{-2})$. For the moment, let us suppose that we have “magically” obtained the values $\langle x,r_1\rangle,\ldots,\langle x,r_t\rangle$. Then define $H'(r)$ as the majority value of:

$$\langle x,r_j \rangle \oplus H(r \oplus r_j) \qquad j = 1,\ldots,t$$

For each $j$, the above expression equals $\langle x,r\rangle$ with probability at least $\frac 12 + \epsilon$ (over the choices of $r_j$) and by choosing $t = O(\epsilon^{-2})$ we can ensure that

$$\Pr_{r_1,\ldots,r_t} [ H'(r) \neq \langle x,r\rangle ] \leq \frac 1{16}$$

from which it follows that

$$\Pr_{r_1,\ldots,r_t} \left[ \Pr_r [ H'(r) = \langle x,r\rangle ] \geq \frac 78 \right] \geq \frac 12$$
Consider the following algorithm.
- Algorithm GL-First-Attempt
  - pick $r_1,\ldots,r_t \in \{0,1\}^n$, where $t = O(\epsilon^{-2})$
  - for all $g_1,\ldots,g_t \in \{0,1\}$:
    - define $H'(r)$ as majority of: $g_j \oplus H(r \oplus r_j)$
    - apply Algorithm GLW to $H'$
    - add result to list
  - return list
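The simulated oracle at the heart of this algorithm can be sketched as follows (`make_H_prime` is my own name for it): when the guesses $g_j$ happen to equal $\langle x,r_j\rangle$, each vote $g_j \oplus H(r \oplus r_j)$ equals $\langle x,r\rangle$ exactly when $H$ is correct at the uniformly distributed point $r \oplus r_j$.

```python
import random

def dot(x, r):
    # inner product <x, r> mod 2, with n-bit strings packed into ints
    return bin(x & r).count("1") & 1

def majority(bits):
    bits = list(bits)
    return int(2 * sum(bits) > len(bits))

def make_H_prime(H, rs, gs):
    # Simulated oracle of GL-First-Attempt: each guess g_j plays the
    # role of <x, r_j>, and g_j ^ H(r ^ r_j) is one vote for <x, r>.
    return lambda r: majority(g ^ H(r ^ rj) for rj, g in zip(rs, gs))
```

For a noiseless $H$ and correct guesses every vote is right, so $H'$ coincides with $\langle x,\cdot\rangle$; the interesting case is a noisy $H$, where the majority washes out the errors.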
The idea behind this program is that we do not in fact know the values $\langle x,r_j\rangle$, but we can “guess” them by considering all choices for the bits $g_1,\ldots,g_t$. If $H$ agrees with $\langle x,\cdot\rangle$ for at least a $\frac 12 + \epsilon$ fraction of the $r$'s, then there is a probability at least $\frac 12$ that in one of the iterations we invoke algorithm GLW with a simulated oracle that has agreement $\frac 78$ with $\langle x,\cdot\rangle$. Therefore, the final list contains $x$ with probability at least $\frac 38$.
The obvious problem with this algorithm is that its running time is exponential in $t = O(\epsilon^{-2})$ and the resulting list may also be exponentially larger than the $O(\epsilon^{-2})$ bound promised by the Lemma.
To overcome these problems, consider the following similar algorithm.
- Algorithm GL
  - pick $r_1,\ldots,r_k \in \{0,1\}^n$, where $k = \log (t+1)$ and $t = O(\epsilon^{-2})$
  - define $r_S := \bigoplus_{j \in S} r_j$ for each non-empty $S \subseteq \{1,\ldots,k\}$
  - for all $g_1,\ldots,g_k \in \{0,1\}$:
    - define $g_S := \bigoplus_{j \in S} g_j$ for each non-empty $S \subseteq \{1,\ldots,k\}$
    - define $H'(r)$ as majority of: $g_S \oplus H(r \oplus r_S)$ over non-empty $S \subseteq \{1,\ldots,k\}$
    - apply Algorithm GLW to $H'$
    - add result to list
  - return list
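Here is a self-contained Python sketch of Algorithm GL (an illustration, not the lecture's pseudocode verbatim; the parameter choices and names are mine). Subsets of $\{1,\ldots,k\}$ are encoded as bitmasks, the weak algorithm is passed in as a callback, and only the $k$ bits $g_1,\ldots,g_k$ are guessed, so the loop makes $2^k = t+1$ iterations instead of $2^t$:

```python
import random
from functools import reduce
from itertools import product
from operator import xor

def dot(x, r):
    # inner product <x, r> mod 2, with n-bit strings packed into ints
    return bin(x & r).count("1") & 1

def majority(bits):
    bits = list(bits)
    return int(2 * sum(bits) > len(bits))

def gl(H, n, k, glw):
    # Sketch of Algorithm GL.  Each nonempty subset S of {1..k} is a
    # bitmask m; the 2^k - 1 points r_S are pairwise independent, and
    # the values g_S are forced by the k guessed bits g_1..g_k.
    # `glw` is the weak algorithm of Lemma 2, supplied by the caller.
    xor_subset = lambda vals, m: reduce(
        xor, (v for j, v in enumerate(vals) if m >> j & 1))
    rjs = [random.randrange(2 ** n) for _ in range(k)]
    masks = list(range(1, 2 ** k))
    r_S = {m: xor_subset(rjs, m) for m in masks}
    out = []
    for gs in product([0, 1], repeat=k):   # all 2^k guesses for <x, r_j>
        g_S = {m: xor_subset(gs, m) for m in masks}
        H_prime = lambda r: majority(g_S[m] ^ H(r ^ r_S[m]) for m in masks)
        out.append(glw(H_prime, n))
    return out
```

In the iteration where the guesses are all correct, $H'$ has high agreement with $\langle x,\cdot\rangle$, so that run of `glw` contributes $x$ to the output list; the other iterations merely add junk candidates, which the caller filters out.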
Let us now see why this algorithm works. First we define, for any nonempty $S \subseteq \{1,\ldots,k\}$,

$$r_S := \bigoplus_{j \in S} r_j$$

Then, since $r_1,\ldots,r_k$ are random, it follows that for any $S \neq T$, $r_S$ and $r_T$ are independent and uniformly distributed. Now consider an $x$ such that $H(\cdot)$ and $\langle x,\cdot\rangle$ agree on a $\geq \frac 12 + \epsilon$ fraction of the values of $r$. Then for the choice of $g_1,\ldots,g_k$ where $g_j = \langle x,r_j\rangle$ for all $j$, we have that $g_S = \langle x,r_S\rangle$ for every non-empty $S$. In such a case, for every $r$ and every $S$, there is a probability at least $\frac 12 + \epsilon$ over the choices of the $r_j$ that

$$g_S \oplus H(r \oplus r_S) = \langle x,r \rangle$$

and these events are pair-wise independent. Note the following simple lemma.
Lemma 4 Let $R_1,\ldots,R_t$ be a set of pairwise independent 0/1 random variables, each of which is 1 with probability at least $\frac 12 + \epsilon$. Then

$$\Pr \left[ \sum_i R_i \leq \frac t2 \right] \leq \frac 1{4\epsilon^2 t}$$
Proof: Let $R := \sum_i R_i$, so that $\mathop{\mathbb E}[R] \geq (\frac 12 + \epsilon) t$. The variance of a 0/1 random variable is at most $\frac 14$, and, because of pairwise independence, $\mathrm{Var}(R) = \sum_i \mathrm{Var}(R_i) \leq \frac t4$.
We then have

$$\Pr \left[ R \leq \frac t2 \right] \leq \Pr \left[ |R - \mathop{\mathbb E}[R]| \geq \epsilon t \right] \leq \frac{\mathrm{Var}(R)}{\epsilon^2 t^2} \leq \frac 1{4\epsilon^2 t}$$
Lemma 4 allows us to upper-bound the probability that the majority operation used to compute $H'(r)$ gives the wrong answer. Combining this with our earlier observation that the bits $g_S \oplus H(r \oplus r_S)$ are pairwise independent, we see that choosing $t \geq 4\epsilon^{-2}$ suffices to ensure that $H'(\cdot)$ and $\langle x,\cdot\rangle$ have agreement at least $\frac 78$ with probability at least $\frac 12$. Thus we can use Algorithm GLW to obtain $x$ with high probability. Choosing $k = \log(t+1)$ as above ensures that the list generated is of length at most $2^k = O(\epsilon^{-2})$, and the running time is then $O(n^2 \epsilon^{-4} \log n)$ with $O(n \epsilon^{-4} \log n)$ oracle accesses, due to the $O(\epsilon^{-2})$ iterations of Algorithm GLW, which makes $O(n \log n)$ oracle accesses, and to the fact that one evaluation of $H'$ requires $O(\epsilon^{-2})$ evaluations of $H$.