Summary

Today we show how to construct a pseudorandom function from a pseudorandom generator.

1. Pseudorandom generators evaluated on independent seeds

We first prove a simple lemma which we will need. This lemma simply says that if ${G}$ is a pseudorandom generator with output length ${m}$, then if we evaluate ${G}$ on ${k}$ independent seeds the resulting function is still a pseudorandom generator with output length ${km}$.

Lemma 1 (Generator Evaluated on Independent Seeds) Let ${G:\{ 0,1 \}^n \rightarrow \{ 0,1 \}^m}$ be a ${(t,\epsilon)}$ pseudorandom generator running in time ${t_g}$. Fix a parameter ${k}$, and define ${G^k : \{ 0,1 \}^{kn} \rightarrow \{ 0,1 \}^{km}}$ as

$\displaystyle G^k(x_1,\ldots,x_k) := G(x_1),G(x_2),\ldots,G(x_k)$

Then ${G^k}$ is a ${(t-O(km + kt_g),k\epsilon)}$ pseudorandom generator.

Proof: We will show that if for some ${(t,\epsilon)}$, ${G^k}$ is not a ${(t,\epsilon)}$ psedorandom generator, then ${G}$ cannot be a ${(t+O(km + kt_g),\epsilon/k)}$ pseudorandom generator.

The proof is by a hybrid argument. If ${G^k}$ is not a ${(t, \epsilon)}$ pseudorandom generator, then there exists an algorithm ${D}$ of complexity at most ${t}$, which distinguishes the output of ${G^k}$ on a random seed, from a truly random string of ${km}$ bits i.e.

$\displaystyle \left\lvert\mathop{\mathbb P}_{x_1 , \ldots, x_k}\left[D(G(x_1), \ldots, G(x_k)) = 1\right] - \mathop{\mathbb P}_{r_1, \ldots, r_k}\left[D(r_1, \ldots, r_k) = 1\right] \right\rvert ~>~ \epsilon$

We can then define the hybrid distributions ${H_0, \ldots, H_k}$, where in ${H_i}$ we relplace the first ${i}$ outputs of the pseudorandom generator ${G}$ by truly random strings.

$\displaystyle H_i = (r_1, \ldots, r_i, G(x_{i+1}), \ldots, G(x_n))$

As before, the above statement which says ${\left\lvert\mathop{\mathbb P}_{z \sim H_0} [D(z) = 1] - \mathop{\mathbb P}_{z \sim H_k}[D(z) = 1]\right\rvert > \epsilon}$ would imply that there exists an ${i}$ between 0 and ${k-1}$ such that

$\displaystyle \left\lvert\mathop{\mathbb P}_{z \sim H_i} [D(z) = 1] - \mathop{\mathbb P}_{z \sim H_{i+1}}[D(z) = 1]\right\rvert > \epsilon/k$

We can now define an algorithm ${D'}$ which violates the pseudorandom property of the generator ${G}$. Given an input ${y \in \{ 0,1 \}^m}$, ${D'}$ generates random strings ${r_ 1, \ldots, r_i \in \{ 0,1 \}^m}$, ${x_{i+2}, \ldots, x_{k} \in \{ 0,1 \}^n}$, and outputs ${D(r_1, \ldots, r_i, y, G(x_{i+2}), \ldots, G(x_{k}))}$. It then follows that

$\displaystyle \mathop{\mathbb P}_{x \sim \{ 0,1 \}^n} [D'(G(x)) = 1] ~= \mathop{\mathbb P}_{z \sim H_i}[D(z) = 1] ~~\text{and}~ \mathop{\mathbb P}_{r \sim \{ 0,1 \}^m} [D'(r) = 1] ~= \mathop{\mathbb P}_{z \sim H_{i+1}}[D(z) = 1]$

Hence, ${D'}$ distinguishes the output of ${G}$ on a random seed ${x}$ from a truly random string ${r}$, with probability at least ${\epsilon/k}$. Also, the complexity of ${D'}$ is at most ${t + O(km) + O(kt_g)}$, where the ${O(km)}$ term corresponds to generating the random strings and the ${O(kt_g)}$ terms corresponds to evaluating ${G}$ on at most ${k}$ random seeds. $\Box$

2. Construction of Pseudorandom Functions

We now describe the construction of a pseudorandom function from a pseudorandom generator. Let ${G:\{ 0,1 \}^n \rightarrow \{ 0,1 \}^{2n}}$ be a length-doubling pseudorandom generator. Define ${G_0 : \{ 0,1 \}^n \rightarrow \{ 0,1 \}^n}$ such that ${G_0(x)}$ equals the first ${n}$ bits of ${G(x)}$, and define ${G_1:\{ 0,1 \}^n \rightarrow \{ 0,1 \}^n}$ such that ${G_1(x)}$ equals the last ${n}$ bits of ${G(x)}$.

The the GGM pseudorandom function based on ${G}$ is defined as follows: for key ${K\in \{ 0,1 \}^n}$ and input ${x\in \{ 0,1 \}^n}$:

$\displaystyle F_K (x) := G_{x_n} (G_{x_{n-1}} ( \cdots G_{x_2} ( G_{x_1} (K) ) \cdots )) \ \ \ \ \ (1)$

The evaluation of the function ${F}$ can be visualized by considering a binary tree of depth ${n}$, with a copy of the generator ${G}$ at each node. The root receives the input ${K}$ and passes the outputs ${G_0(K)}$ and ${G_1(K)}$ to its two children. Each node of the tree, receiving an input ${z}$, produces the outputs ${G_0(z)}$ and ${G_1(z)}$ which are passed to its children if its not a leaf. The input ${x}$ to the function ${F_K}$, then selects a path in this tree from the root to a leaf, and produces the output given by the leaf.

We will prove that if ${G:\{ 0,1 \}^n \rightarrow \{ 0,1 \}^{2n}}$ is a ${(t,\epsilon)}$ pseudorandom generator running in time ${t_g}$, then ${F}$ is a ${(t/O(n\cdot t_g), \epsilon \cdot nt)}$ secure pseudorandom function.

2.1. Considering a tree of small depth

We will first consider a slightly simpler situation which illustrates the main idea. We prove that if ${G}$ is ${(t,\epsilon)}$ pseudorandom and runs in time ${t_g}$, then the concatenated output of all the leaves in a tree with ${l}$ levels, is ${(t-O(2^l \cdot t_g), l2^l \cdot\epsilon)}$ pseudorandom. The result is only meaninful when ${l}$ is much smaller than ${n}$.

Theorem 2 Suppose ${G: \{ 0,1 \}^n \rightarrow \{ 0,1 \}^{2n}}$ is a ${(t,\epsilon)}$ pseudorandom generator and ${G}$ is computable in time ${t_g}$. Fix a constant ${l}$ and define ${F_K:\{ 0,1 \}^l \rightarrow \{ 0,1 \}^n}$ as ${F_K (y) := G_{y_l} (G_{y_{l-1}} ( \cdots G_{y_2} ( G_{y_1} (K) ) \cdots ))}$ Then ${\overline{G}:\{ 0,1 \}^n \rightarrow \{ 0,1 \}^{2^l\cdot n}}$ defined as

$\displaystyle \overline{G}(K) := (F_K(0^l), F_K(0^{l-1}1), \ldots, F_K(1^l))$

is a ${(t - O(2^{l}\cdot t_g), l \cdot 2^l \cdot \epsilon)}$ pseudorandom generator.

Proof: The proof is again by a hybrid argument. The hybrids we consider are easier to describe in terms of the tree with nodes as copies of ${G}$. We take ${H_i}$ to be the distribution of outputs at the leaves, when the input to the nodes at depth ${i}$ is replaced by truly random bits, ignoring the nodes at depth ${i-1}$. Hence, ${H_0}$ is simply distributed as ${\overline{G}(K)}$ for a random ${K}$ i.e. only the input to the root is random. Also, in ${H_l}$ we replace the outputs at depth ${l-1}$ by truly random strings. Hence, ${H_l}$ is simply distributed as a random string of length ${n \cdot 2^l}$. The figure below shows the hybrids for the case ${l=2}$, with red color indicating true randomness.

We will prove that ${\overline{G}}$ is not a ${(t, \epsilon)}$ secure pseudorandom generator, then ${G}$ is not ${(t + O(2^{l}\cdot t_g), \epsilon/(l \cdot 2^l))}$ secure. If we assume that there is an algorithm ${D}$ of complexity ${t}$ such that

$\displaystyle \left\lvert\mathop{\mathbb P}_{x \sim \{ 0,1 \}^n}[D(\overline{G}(x)) = 1] - \mathop{\mathbb P}_{r \sim \{ 0,1 \}^{2^l\cdot n}}[D(r) = 1]\right\rvert ~>~ \epsilon$

then we get that there is an ${i}$ such that ${\left\lvert\mathop{\mathbb P}_{z \sim H_i}[D(z) = 1] - \mathop{\mathbb P}_{z \sim H_{i+1}}[D(z) = 1]\right\rvert ~>~ \epsilon/l}$.

We now consider again the difference between ${H_i}$ and ${H_{i+1}}$. In ${H_i}$ the ${2^i \cdot n}$ bits which are the inputs to the nodes at depth ${i}$ are replaced by random bits. These are then used to generate ${2^{i+1}\cdot n}$ bits which serve as inputs to nodes at depth ${i+1}$. In ${H_{i+1}}$, the inputs to nodes at depth ${i+1}$ are random.

Let ${\overline{G}_{i+1}:\{ 0,1 \}^{2^{i+1}\cdot n} \rightarrow \{ 0,1 \}^{2^l \cdot n}}$ denote the function which given ${2^{i+1} \cdot n}$ bits, treats them as inputs to the nodes at depth ${i+1}$ and evaluates the output at the leaves in the tree for ${\overline{G}}$. If ${r_1, \ldots, r_{2^i} \sim \{ 0,1 \}^{2n}}$, then ${\overline{G}_{i+1}(r_1, \ldots, r_{2^i})}$ is distributed as ${H_{i+1}}$. Also, if ${x_1, \ldots, x_{2^i} \sim \{ 0,1 \}^{n}}$, then ${\overline{G}_{i+1}(G(x_1), \ldots, G(x_{2^i}))}$ is distributed as ${H_{i}}$.

Hence, ${D}$ can be used to create a distinguisher ${D'}$ which distinguishes ${G}$ evaluated on ${2^i}$ independent seeds, from ${2^i}$ random strings of length ${2n}$. In particular, for ${z \in \{ 0,1 \}^{2^{i+1}\cdot n}}$, we take ${D'(z) = D(G_{i+1}(z))}$. This gives

$\displaystyle \left\lvert \mathop{\mathbb P}_{x_1, \ldots, x_{2^i}}[D'(G(x_1), \ldots, G(x_{2^i})) = 1] - \mathop{\mathbb P}_{r_1, \ldots, r_{2^i}}[D'(r_1, \ldots, r_{2^i}) = 1] \right\rvert\ > \epsilon/l$

Hence,${D'}$ distinguishes ${G^{2^i}(x_1, \ldots, x_{2^i}) = (G(x_1), \ldots, G(x_{2^i}))}$ from a random string. Also, ${G'}$ has complexity ${t + O(2^l \cdot t_g)}$. However, by Lemma 1, if ${G^{2^i}}$ is not ${(t + O(2^l \cdot t_g),\epsilon/l)}$ secure then ${G}$ is not ${(t + O(2^l \cdot t_g + 2^i \cdot n), \epsilon/(l\cdot 2^i))}$ secure. Since ${i \leq l}$, this completes the proof. $\Box$

2.2. Proving the security of the GGM construction

Recall that the GGM function is defined as

$\displaystyle F_K (x) := G_{x_n} (G_{x_{n-1}} ( \cdots G_{x_2} ( G_{x_1} (K) ) \cdots )) \ \ \ \ \ (2)$

We will prove that

Theorem 3 If ${G: \{ 0,1 \}^n \rightarrow \{ 0,1 \}^{2n}}$ is a ${(t,\epsilon)}$ pseudorandom generator and ${G}$ is computable in time ${t_g}$, then ${F}$ is a ${(t/O(nt_g), \epsilon \cdot n \cdot t)}$ secure pseudorandom function.

Proof: As before, we assume that ${F}$ is not a ${(t,\epsilon)}$ secure pseudorandom function, and will show that this implies ${G}$ is not a ${(t\cdot O(nt_g), \epsilon/(n\cdot t))}$ pseudorandom generator. The assumption that ${F}$ is not ${(t,\epsilon)}$ secure, gives that there is an algorithm ${A}$ of complexity at most ${t}$ which distinguishes ${F_K}$ on a random seed ${K}$ from a random function ${R}$, i.e.

$\displaystyle \left\lvert \mathop{\mathbb P}_{K}\left[A^{F_K(\cdot)} = 1\right] - \mathop{\mathbb P}_R\left[A^{R(\cdot)} = 1\right]\right\rvert > \epsilon$

We consider hybrids ${H_0, \ldots, H_n}$ as in the proof of Theorem 2. ${H_0}$ is the distribution of ${F_K}$ for ${K \sim \{ 0,1 \}^n}$ and ${H_n}$ is the uniform distribution over all functions from ${\{ 0,1 \}^n}$ to ${\{ 0,1 \}^n}$. As before, there exists ${i}$ such that

$\displaystyle \left\lvert \mathop{\mathbb P}_{h \sim H_i}\left[A^{h(\cdot)} = 1\right] - \mathop{\mathbb P}_{h \sim H_{i+1}}\left[A^{h(\cdot)} = 1\right]\right\rvert > \epsilon/n$

However, now we can no longer use ${A}$ to construct a distinguisher for ${G^{2^i}}$ as in Theorem 2 since ${i}$ may now be as large as ${n}$. The important observation is that since ${A}$ has complexity ${t}$, it can make at most ${t}$ queries to the function it is given as an oracle. Since the (at most) ${t}$ queries made by ${A}$ will be paths in the tree from the root to the leaves, they can contain at most ${t}$ nodes at depth ${i+1}$. Hence, to simulate the behavior of ${A}$, we only need to generate the value of a function distributed according to ${H_i}$ or ${H_{i+1}}$ at ${t}$ inputs.

We will use this to contruct an algorithm ${D}$ which distinguishes the output of ${G^{t}}$ on ${t}$ independent seeds from ${t}$ random strings of length ${2n}$. ${D}$ takes as input a string of length ${2tn}$, which we treat as ${t}$ pairs ${(z_{1,0},z_{1,1}), \ldots, (z_{t,0}, z_{t,1})}$ with each ${z_{i,j}}$ being of length ${n}$. When queried on an input ${x \in \{ 0,1 \}^n}$, ${D}$ will pick a pair ${(z_{k,0},z_{k,1})}$ according to the first ${i}$ bits of ${x}$ (i.e. choose the randomness for the node at depth ${i}$ which lies on the path), and then choose ${z_{k,x_{i+1}}}$. In particular, ${D((z_{1,0},z_{1,1}), \ldots, (z_{t,0}, z_{t,1}))}$ works as below:

1. Start with counter ${k = 0}$.
2. Simulate ${A}$. When given a query ${x}$

• Check if a pair ${P(x_1, \ldots, x_i)}$ has already been chosen from the first ${k}$ pairs.
• If not, set ${P(x_1, \ldots, x_{i+1}) = k+1}$ and set ${k=k+1}$.
• Answer the query made by ${A}$ as ${G_{x_n}( \cdots G_{i+2}(z_{P(x_1, \ldots, x_{i+1}),x_{i+1}}) \cdots )}$.
3. Return the final output given by ${A}$.

Then, if all pairs are random strings ${r_1, \ldots, r_t}$ of length ${2n}$, the answers received by ${A}$ are as given by a oracle function distributed according to ${H_{i+1}}$. Hence,

$\displaystyle \mathop{\mathbb P}_{r_1,\ldots,r_t}[D(r_1, \ldots, r_t) = 1] = \mathop{\mathbb P}_{h \sim H_{i+1}}\left[A^{h(\cdot)} = 1\right]$

Similarly, if the ${t}$ pairs are outputs of the pseudorandom generator ${G}$ on independent seeds ${x_{1}, \ldots, x_{t} \in \{ 0,1 \}^n}$, then the view of ${A}$ is the same as in the case with an oracle function distributed according to ${H_i}$. This gives

$\displaystyle \mathop{\mathbb P}_{x_1,\ldots,x_t}[D(G(x_1), \ldots, G(x_t)) = 1] = \mathop{\mathbb P}_{h \sim H_i}\left[A^{h(\cdot)} = 1\right]$

Hence, ${D}$ distinguishes the output of ${G^t}$ from a random string with probability ${\epsilon/n}$. Also, it runs in time ${O(t \cdot n \cdot t_g)}$. Then Lemma 1 gives that ${G}$ is not ${(O(t \cdot n\cdot t_g), \epsilon/(n \cdot t))}$ secure. $\Box$