Summary
Today we begin a tour of the theory of one-way functions and pseudorandomness.
The highlight of the theory is a proof that if one-way functions exist (with good asymptotic security) then pseudorandom permutations exist (with good asymptotic security). We have seen that pseudorandom permutations suffice to do encryption and authentication with extravagantly high levels of security (respectively, CCA security and existential unforgeability under chosen message attack), and it is easy to see that if one-way functions do not exist, then every encryption and authentication scheme suffers from a total break.
Thus the conclusion is a strong “dichotomy” result, saying that either cryptography is fundamentally impossible, or extravagantly high security is possible.
Unfortunately the proof of this result involves a rather inefficient reduction, so the concrete parameters for which the dichotomy holds are rather unrealistic. (One would probably end up with a system requiring gigabyte-long keys and days of processing time for each encryption, with the guarantee that if it is not CCA secure then every 128-bit key scheme suffers a total break.) Nonetheless it is one of the great unifying achievements of the asymptotic theory, and it remains possible that a more effective proof will be found.
In this lecture and the next few ones we shall prove the weaker statement that if one-way permutations exist then pseudorandom permutations exist. This will be done in a series of four steps each involving reasonable concrete bounds. A number of combinatorial and number-theoretic problems which are believed to be intractable give us highly plausible candidate one-way permutations. Overall, we can show that if any of those well-defined and well-understood problems are hard, then we can get secure encryption and authentication with schemes that are slow but not entirely impractical. If, for example, solving discrete log with a modulus of the order of is hard, then there is a CCA-secure encryption scheme requiring a
-bit key and fast enough to carry email, instant messages and probably voice communication. (Though probably too slow to encrypt disk access or video playback.)
1. One-way Functions and One-way Permutations
A one-way function is a function $f$ such that, for a random $x$, it is hard to find a pre-image of $f(x)$.
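Definition 1 (One-way Function) A function $f: \{0,1\}^n \rightarrow \{0,1\}^m$ is $(t,\epsilon)$-one way if for every algorithm $A$ of complexity $\leq t$ we have
$$\Pr_{x \sim \{0,1\}^n} [ f(A(f(x))) = f(x) ] \leq \epsilon$$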
In the asymptotic theory, one is interested in one-way functions that are defined for all input lengths and are efficiently computable. Recall that a function $\nu: \mathbb{N} \rightarrow \mathbb{R}$ is called negligible if for every polynomial $p$ we have $\lim_{n \rightarrow \infty} p(n) \cdot \nu(n) = 0$.
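Definition 2 (One-way Function, Asymptotic Definition) A function $f: \{0,1\}^* \rightarrow \{0,1\}^*$ is one-way if
- $f$ is polynomial time computable and
- for every polynomial $p$ there is a negligible function $\nu$ such that for all large enough $n$ the function $f_n$, the restriction of $f$ to inputs of length $n$, is $(p(n),\nu(n))$-one way.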
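Example 1 (Subset Sum) On input $x \in \{0,1\}^n$, where $n = k \cdot (k+1)$, $SS_k(x)$ parses $x$ as a sequence of $k$ integers, each $k$-bit long, plus a subset $I \subseteq \{1,\ldots,k\}$.
The output is
$$SS_k(x_1,\ldots,x_k,I) := \left( x_1,\ldots,x_k, \sum_{i \in I} x_i \right)$$
Some variants of subset-sum have been broken, but it is plausible that $SS_k$ is a $(t,\epsilon)$-one way function with $t$ and $1/\epsilon$ super-polynomial in $k$, maybe even as large as $2^{k^{\Omega(1)}}$.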
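The subset-sum function of Example 1 can be sketched in a few lines of Python. This is only an illustration under assumptions that are not spelled out in the text: the input is taken to be a string of $k(k+1)$ bits encoding $k$ integers of $k$ bits each, followed by a $k$-bit indicator vector of the subset, and the output is taken to be the list of integers together with the sum of the selected ones. The name `subset_sum` is hypothetical.

```python
def subset_sum(bits, k):
    """Toy subset-sum map on a bit string of length k*(k+1) (assumed encoding).

    The first k blocks of k bits are read as integers x_1..x_k; the last
    k bits mark which integers belong to the subset I.
    """
    assert len(bits) == k * (k + 1), "input must have k*(k+1) bits"
    # Parse the k integers, each k bits long (most significant bit first).
    numbers = [int(bits[i * k:(i + 1) * k], 2) for i in range(k)]
    # The trailing k bits are the indicator vector of the subset.
    indicator = bits[k * k:]
    total = sum(v for v, b in zip(numbers, indicator) if b == "1")
    # The integers are revealed; only the subset is hidden inside the sum.
    return numbers, total
```

For instance, `subset_sum("101011110101", 3)` parses the integers 5, 3, 6 and the subset {1, 3}. Inverting the map means finding some subset with the given sum, which is the subset-sum problem.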
Exercise 1 Let
be a
-secure one-way function. Show that
Definition 3 (One-way Permutation) If $f: \{0,1\}^n \rightarrow \{0,1\}^n$ is a bijective $(t,\epsilon)$-one way function, then we call $f$ a $(t,\epsilon)$-one-way permutation.
If $f$ is an (asymptotic) one-way function, and for every $n$ $f$ is a bijection from $\{0,1\}^n$ into $\{0,1\}^n$, then we say that $f$ is an (asymptotic) one-way permutation.
There is a non-trivial general attack against one-way permutations.
Exercise 2 Let
be a
-secure one-way permutation. Show that
This means that we should generally expect the input length of a secure one-way permutation to be at least 200 bits or so. (Stronger attacks than the generic one are known for the candidates that we shall consider, and their input length is usually 1000 bits or more.)
Example 2 (Modular Exponentiation) Let $p$ be a prime, and $\mathbb{Z}_p^*$ be the group whose elements are $\{1,\ldots,p-1\}$ and whose operation is multiplication $\bmod\ p$. It is a fact (which we shall not prove) that $\mathbb{Z}_p^*$ is cyclic, meaning that there is an element $g$ such that the mapping
$$EXP_{g,p}(x) := g^x \bmod p$$
is a permutation on $\mathbb{Z}_p^*$. Such an element $g$ is called a generator, and in fact many elements of $\mathbb{Z}_p^*$ are generators. $EXP_{g,p}$ is conjectured to be one-way for most choices of $p$ and $g$.
The problem of inverting $EXP_{g,p}$ is called the discrete logarithm problem.
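The best known algorithm for the discrete logarithm is conjectured to run in time $2^{O((\log p)^{1/3} (\log\log p)^{2/3})}$. It is plausible that for most $p$ and most $g$ exponentiation $x \mapsto g^x \bmod p$ is a $(t,\epsilon)$-one way permutation with $t$ and $\epsilon^{-1}$ of the order of $2^{(\log p)^{\Omega(1)}}$.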
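To make Example 2 concrete, here is a small Python sketch with a toy prime. The function names (`exp_perm`, `is_generator`, `discrete_log_bruteforce`) are hypothetical, and the exhaustive search is only meant to show what inverting the map means; it takes time proportional to $p$ and says nothing about the best known attacks.

```python
def exp_perm(g, p, x):
    """Modular exponentiation x -> g^x mod p, via Python's built-in pow."""
    return pow(g, x, p)


def is_generator(g, p):
    """g generates Z_p^* iff x -> g^x mod p hits every element of {1..p-1}."""
    return sorted(exp_perm(g, p, x) for x in range(1, p)) == list(range(1, p))


def discrete_log_bruteforce(g, p, y):
    """Invert exp_perm by exhaustive search (feasible only for tiny p)."""
    for x in range(1, p):
        if exp_perm(g, p, x) == y:
            return x
    return None  # y is not a power of g
```

With $p = 11$, the element 2 is a generator while 3 is not (its powers only take 5 distinct values), so only the former gives a permutation of $\{1,\ldots,10\}$.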
Problems like exponentiation do not fit well in the asymptotic definition, because of the extra parameters $g, p$. (Technically, they do not fit our definitions at all because the input is an element of $\mathbb{Z}_p^*$ instead of a bit string, but this is a fairly trivial issue of data representation.) This leads to the definition of a family of one-way functions (and permutations).
2. A Preview of What is Ahead
Our proof that a pseudorandom permutation can be constructed from any one-way permutation will proceed via the following steps:
- We shall prove that for any one-way permutation $f$ we can construct a hard-core predicate $P$, that is a predicate $P$ such that $P(x)$ is easy to compute given $x$, but it is hard to compute given $f(x)$.
- From a one-way permutation with a hard-core predicate, we shall show how to construct a pseudorandom generator with one-bit expansion, mapping $\ell$ bits into $\ell+1$.
- From a pseudorandom generator with one-bit expansion, we shall show how to get generators with essentially arbitrary expansion.
- From a length-doubling generator mapping $\ell$ bits into $2\ell$, we shall show how to get pseudorandom functions.
- From a pseudorandom function, we shall show how to get pseudorandom permutations.
3. Hard-Core Predicate
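Definition 4 (Hard-Core Predicate) A boolean function $P: \{0,1\}^n \rightarrow \{0,1\}$ is $(t,\epsilon)$-hard core for a permutation $f: \{0,1\}^n \rightarrow \{0,1\}^n$ if for every algorithm $A$ of complexity $\leq t$
$$\Pr_{x \sim \{0,1\}^n} [ A(f(x)) = P(x) ] \leq \frac 12 + \epsilon$$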
Note that only one-way permutations can have efficiently computable hard-core predicates.
Exercise 3 Suppose that
is a
-hard core predicate for a permutation
, and
is computable in time
. Show that
is
-one way.
It is known that if is one-way, then every bit of
is hard-core.
Our first theorem will be that a random XOR is hard-core for every one-way permutation.
We will use the following notation for “inner product” modulo 2:
$$\langle x, r \rangle := \sum_{i=1}^n x_i r_i \bmod 2$$
Theorem 5 (Goldreich and Levin) Suppose that
is an algorithm of complexity
such that
Then there is an algorithm
of complexity at most
such that
We begin by establishing the following weaker result.
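Theorem 6 (Goldreich and Levin, Weak Version) Suppose that $A$ is an algorithm of complexity $t$ such that
$$\Pr_{x,r \sim \{0,1\}^n} [ A(f(x),r) = \langle x,r \rangle ] \geq \frac{15}{16} \qquad (3)$$
Then there is an algorithm $A'$ of complexity at most $O(t n \log n + n^2 \log n)$ such that
$$\Pr_{x \sim \{0,1\}^n} [ A'(f(x)) = x ] \geq \frac 13$$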
Before getting into the proof of Theorem 6, it is useful to think of the “super-weak” version of the Goldreich-Levin theorem, in which the right-hand-side in (3) is 1. Then inverting $f$ is very easy. Call $e_i \in \{0,1\}^n$ the vector that has $1$ in the $i$-th position and zeroes everywhere else, thus $\langle x, e_i \rangle = x_i$. Now, given $y = f(x)$ and an algorithm $A$ for which the right-hand-side of (3) is 1, we have $x_i = A(y, e_i)$ for every $i$, and so we can compute $x$ given $f(x)$ via $n$ invocations of $A$. In order to prove the Goldreich-Levin theorem we will do something similar, but we will have to deal with the fact that we only have an algorithm that approximately computes inner products.
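The “super-weak” case can be sketched in Python as follows. As an illustrative assumption, $n$-bit vectors are stored as Python integers, so that $e_i$ is `1 << i`; the names `inner` and `recover_from_perfect_oracle` are hypothetical, and the oracle passed in stands for $A(y,\cdot)$ with $y = f(x)$ fixed.

```python
def inner(x, r):
    """Inner product mod 2 of two bit vectors stored as integers."""
    return bin(x & r).count("1") & 1


def recover_from_perfect_oracle(oracle, n):
    """Recover x from an oracle that returns <x, r> exactly for every r.

    One query per coordinate: querying the unit vector e_i yields x_i.
    """
    x = 0
    for i in range(n):
        e_i = 1 << i
        if oracle(e_i):  # <x, e_i> = x_i
            x |= e_i
    return x
```

For example, with a hidden `x = 0b1011001`, calling `recover_from_perfect_oracle(lambda r: inner(x, r), 7)` reads off the bits of `x` with exactly 7 queries. The approach breaks as soon as the oracle may be wrong on the unit vectors, which is the difficulty the rest of the section addresses.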
We derive the Weak Goldreich-Levin Theorem from the following reconstruction algorithm.
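Lemma 7 (Goldreich-Levin Algorithm, Weak Version) There is an algorithm $GLW$ that given oracle access to a function $H: \{0,1\}^n \rightarrow \{0,1\}$ such that, for some $x \in \{0,1\}^n$,
$$\Pr_{r \sim \{0,1\}^n} [ H(r) = \langle x,r \rangle ] \geq \frac 78 \qquad (4)$$
runs in time $O(n^2 \log n)$, makes $O(n \log n)$ queries into $H$, and with $1 - o(1)$ probability outputs $x$.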
Before proving Lemma 7, we need to state the following version of the Chernoff Bound.
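Lemma 8 (Chernoff Bound) Let $X_1,\ldots,X_n$ be mutually independent $0/1$ random variables. Then, for every $\epsilon > 0$, we have
$$\Pr \left[ \sum_{i=1}^n X_i > \mathbb{E} \left[ \sum_{i=1}^n X_i \right] + \epsilon n \right] \leq e^{-2\epsilon^2 n}$$
Proof: We only give a sketch. Let $X := \sum_{i=1}^n X_i$. Then we want to prove that
$$\Pr [ X > \mathbb{E}[X] + \epsilon n ] \leq e^{-2\epsilon^2 n}$$
For every fixed $\lambda > 0$, Markov’s inequality gives us
$$\Pr [ X > \mathbb{E}[X] + \epsilon n ] = \Pr \left[ e^{\lambda X} > e^{\lambda (\mathbb{E}[X] + \epsilon n)} \right] \leq \frac{\mathbb{E}[e^{\lambda X}]}{e^{\lambda (\mathbb{E}[X] + \epsilon n)}}$$
We can use independence to write
$$\mathbb{E}[e^{\lambda X}] = \prod_{i=1}^n \mathbb{E}[e^{\lambda X_i}]$$
and some calculus shows that for every $\lambda > 0$ we have
$$\mathbb{E}[e^{\lambda X_i}] \leq e^{\lambda \mathbb{E}[X_i] + \lambda^2/8}$$
So we get
$$\Pr [ X > \mathbb{E}[X] + \epsilon n ] \leq e^{-\lambda \epsilon n + \lambda^2 n/8} \qquad (5)$$
Equation (5) holds for every $\lambda > 0$, and in particular for $\lambda := 4\epsilon$ giving us
$$\Pr [ X > \mathbb{E}[X] + \epsilon n ] \leq e^{-2\epsilon^2 n}$$
as desired.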
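As a numerical sanity check, the following Python sketch compares an exact binomial tail with the Chernoff bound. It assumes the bound has the standard additive form $\Pr[X > \mathbb{E}[X] + \epsilon n] \leq e^{-2\epsilon^2 n}$ and specializes to identically distributed variables with $\Pr[X_i = 1] = p$; the function names are hypothetical.

```python
import math


def upper_tail(n, p, eps):
    """Exact Pr[X > n*p + eps*n] for X a sum of n independent Bernoulli(p)."""
    first = math.floor(n * p + eps * n) + 1  # smallest integer above the threshold
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(first, n + 1)
    )


def chernoff_bound(n, eps):
    """The additive Chernoff (Hoeffding) bound exp(-2 * eps^2 * n)."""
    return math.exp(-2 * eps * eps * n)
```

For $n = 100$ fair coins and $\epsilon = 0.1$, the exact probability of seeing more than 60 ones is below 0.02, while the bound gives $e^{-2} \approx 0.135$. The bound is loose for small $n$ but decays exponentially in $n$, which is all that the majority-vote argument in this section needs.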
We can proceed with the design and the analysis of the algorithm of Lemma 7.
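Proof: [Of Lemma 7] The idea of the algorithm is that we would like to compute $\langle x, e_i \rangle$ for $i = 1,\ldots,n$, but we cannot do so by simply evaluating $H(e_i)$, because it is entirely possible that $H$ is incorrect on those inputs. If, however, we were just interested in computing $\langle x,r \rangle$ for a random $r$, then we would be in good shape, because $H(r)$ would be correct with reasonably large probability. We thus want to reduce the task of computing $\langle x,y \rangle$ on a specific $y$, to the task of computing $\langle x,r \rangle$ for a random $r$. We can do so by observing the following identity: for every $y$ and every $r$, we have
$$\langle x,y \rangle = \langle x, r+y \rangle - \langle x,r \rangle$$
where all operations are mod 2. (And bit-wise, when involving vectors.) So, in order to compute $\langle x,y \rangle$ we can pick a random $r$, and then compute $H(r+y) - H(r)$. If $r$ is uniformly distributed, then $r+y$ and $r$ are uniformly distributed, and, since $H$ agrees with $\langle x, \cdot \rangle$ on at least a $7/8$ fraction of inputs, we have
$$\begin{aligned} \Pr_r [ H(r+y) - H(r) = \langle x,y \rangle ] & \geq \Pr_r [ H(r+y) = \langle x,r+y \rangle \wedge H(r) = \langle x,r \rangle ] \\ & = 1 - \Pr_r [ H(r+y) \neq \langle x,r+y \rangle \vee H(r) \neq \langle x,r \rangle ] \\ & \geq 1 - \Pr_r [ H(r+y) \neq \langle x,r+y \rangle ] - \Pr_r [ H(r) \neq \langle x,r \rangle ] \\ & \geq \frac 34 \end{aligned}$$
Suppose now that we pick independently several random vectors $r_1,\ldots,r_k$, and that we compute $Y_j := H(r_j + y) - H(r_j)$ for $j = 1,\ldots,k$ and we take the majority value of the $Y_j$ as our estimate for $\langle x,y \rangle$. By the above analysis, each $Y_j$ equals $\langle x,y \rangle$ with probability at least $3/4$; furthermore, the events $Y_j = \langle x,y \rangle$ are mutually independent. We can then invoke the Chernoff bound to deduce that the probability that the majority value is wrong is at most $e^{-k/8}$. (If the majority vote of the $Y_j$ is wrong, it means that at least $k/2$ of the $Y_j$ are wrong, even though the expected number of wrong ones is at most $k/4$, implying a deviation of $k/4$ from the expectation; we can invoke the Chernoff bound with $\epsilon = 1/4$.) The algorithm GLW is thus as follows: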
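- Algorithm GLW
- for $i := 1$ to $n$
  - for $j := 1$ to $16 \ln n$
    - pick a random $r_j \in \{0,1\}^n$
  - $x_i := {\rm majority} \{ H(r_j + e_i) - H(r_j) : j = 1,\ldots,16 \ln n \}$
- return $x$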
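For every $i$, the probability that the algorithm fails to compute $\langle x, e_i \rangle = x_i$ is at most $e^{-2 \ln n} = 1/n^2$. So the probability that the algorithm fails to return $x$ is at most $1/n = o(1)$. The algorithm takes time $O(n^2 \log n)$ and makes $32 n \ln n = O(n \log n)$ oracle queries into $H$.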
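A minimal Python sketch of algorithm GLW, under assumptions that are not part of the text: $n$-bit vectors are stored as Python integers (so addition mod 2 is XOR), the number of votes per coordinate defaults to $\lceil 16 \ln n \rceil$, and `make_noisy_oracle` is a hypothetical helper that builds an oracle $H$ disagreeing with $\langle x, \cdot \rangle$ on a chosen fraction of inputs, used only to exercise the algorithm.

```python
import math
import random


def inner(x, r):
    """Inner product mod 2 of two bit vectors stored as integers."""
    return bin(x & r).count("1") & 1


def glw(H, n, rng, k=None):
    """Recover x from an oracle H that equals <x, r> on at least 7/8 of all r."""
    if k is None:
        k = max(1, math.ceil(16 * math.log(n)))  # assumed number of votes
    x = 0
    for i in range(n):
        e_i = 1 << i
        ones = 0
        for _ in range(k):
            r = rng.getrandbits(n)       # fresh uniform r for every vote
            ones += H(r ^ e_i) ^ H(r)    # one estimate of <x, e_i> = x_i
        if 2 * ones > k:                 # majority vote over the k estimates
            x |= e_i
    return x


def make_noisy_oracle(x, n, bad_fraction, rng):
    """Hypothetical test helper: flips <x, r> on a random set of inputs."""
    bad = set(rng.sample(range(2**n), int(bad_fraction * 2**n)))
    return lambda r: inner(x, r) ^ (r in bad)
```

Each coordinate costs $2k$ oracle queries, for $2nk$ in total, matching the $O(n \log n)$ query count of Lemma 7. Note that the oracle is never queried on $e_i$ itself, only on the uniformly distributed points $r$ and $r + e_i$.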
In order to derive Theorem 6 from Lemma 7 we will need the following variant of Markov’s inequality.
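Lemma 9 Let $X$ be a discrete bounded non-negative random variable ranging over $[0,1]$. Then for every $0 \leq t < 1$,
$$\Pr [ X \geq t ] \geq \frac{\mathbb{E}[X] - t}{1 - t}$$
Proof: Let $R$ be the set of values taken by $X$ with non-zero probability. Then
$$\begin{aligned} \mathbb{E}[X] & = \sum_{v \in R} v \cdot \Pr[X = v] \\ & = \sum_{v \in R: v < t} v \cdot \Pr[X = v] + \sum_{v \in R: v \geq t} v \cdot \Pr[X = v] \\ & \leq t \cdot \Pr[X < t] + \Pr[X \geq t] \\ & = t - t \cdot \Pr[X \geq t] + \Pr[X \geq t] \end{aligned}$$
So we have $\Pr[X \geq t] \cdot (1 - t) \geq \mathbb{E}[X] - t$.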
We can now prove Theorem 6.
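Proof: [Of Theorem 6] The assumption of the Theorem can be rewritten as
$$\mathbb{E}_x \left[ \Pr_r [ A(f(x),r) = \langle x,r \rangle ] \right] \geq \frac{15}{16}$$
From Lemma 9, applied with threshold $7/8$, we have
$$\Pr_x \left[ \Pr_r [ A(f(x),r) = \langle x,r \rangle ] \geq \frac 78 \right] \geq \frac{15/16 - 7/8}{1 - 7/8} = \frac 12$$
Call an $x$ “good” if it satisfies $\Pr_r [ A(f(x),r) = \langle x,r \rangle ] \geq \frac 78$.
The inverter $A'$, on input $y = f(x)$, runs the algorithm GLW using the oracle $A(y,\cdot)$. If $x$ is good, then the algorithm finds $x$ with probability at least $1 - o(1)$. At least half of the choices of $x$ are good, so overall the algorithm inverts $f(x)$ on at least a $\frac 12 - o(1) \geq \frac 13$ fraction of inputs. The running time of the algorithm is $O(n^2 \log n)$ plus the cost of $O(n \log n)$ calls to $A(y,\cdot)$, each costing time $t$.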
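The reduction in this proof amounts to fixing the first argument of $A$ and handing the resulting one-argument function to GLW. The Python sketch below shows only this plumbing, and everything in it is a toy assumption: `f` is an affine map modulo $2^{16}$, which is a permutation but certainly not one-way, and the predictor `A` “cheats” by inverting `f` directly and then deliberately errs on a $1/32$ fraction of the vectors $r$ for every $y$.

```python
import math
import random

N = 16                      # toy input length (assumption)
MUL, ADD = 40503, 12345     # MUL is odd, so x -> MUL*x + ADD is a bijection mod 2^N
MUL_INV = pow(MUL, -1, 2**N)


def inner(x, r):
    """Inner product mod 2 of two bit vectors stored as integers."""
    return bin(x & r).count("1") & 1


def f(x):
    """Toy permutation of {0,1}^N; NOT one-way, for illustration only."""
    return (MUL * x + ADD) % 2**N


def A(y, r):
    """Toy predictor for <x, r> given y = f(x); wrong on 1/32 of the r's."""
    x = ((y - ADD) * MUL_INV) % 2**N            # cheating: inverts f directly
    wrong = (r * 2654435761 + y) % 32 == 0      # deterministic error pattern
    return inner(x, r) ^ wrong


def glw(H, n, rng):
    """Majority-vote reconstruction with ceil(16 ln n) votes per coordinate."""
    k = max(1, math.ceil(16 * math.log(n)))
    x = 0
    for i in range(n):
        ones = sum(H(r ^ (1 << i)) ^ H(r)
                   for r in (rng.getrandbits(n) for _ in range(k)))
        if 2 * ones > k:
            x |= 1 << i
    return x


def invert(y, rng):
    """The inverter A': run GLW with the oracle H(r) := A(y, r)."""
    return glw(lambda r: A(y, r), N, rng)
```

In this toy setting every $x$ is “good”, so `invert(f(x), rng)` returns `x` except with tiny probability. In the actual theorem only half of the inputs are guaranteed to be good, which is where the $1/3$ success probability comes from.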