# CS276 Lecture 16: Pseudorandom Permutations

Scribed by Anupam Prakash

Summary

Today we finish the analysis of a construction of a pseudorandom permutation (block cipher) given a pseudorandom function.

1. The Luby-Rackoff Construction

Recall that if ${F:\{ 0,1 \}^m \rightarrow \{ 0,1 \}^m}$ is a function, then we define the Feistel permutation ${D_F: \{ 0,1 \}^{2m} \rightarrow \{ 0,1 \}^{2m}}$ associated with ${F}$ as

$\displaystyle D_F(x,y) := y, x\oplus F(y) \ \ \ \ \ (1)$

Let ${F: \{ 0,1 \}^k \times \{ 0,1 \}^m \rightarrow \{ 0,1 \}^m}$ be a pseudorandom function, we define the following function ${P: \{ 0,1 \}^{4k} \times \{ 0,1 \}^{2m} \rightarrow \{ 0,1 \}^{2m}}$: given a key ${\overline K (K_1,\ldots,K_4)}$ and an input ${x}$,

$\displaystyle P_{\overline K} (x) := D_{F_{K_4}} ( D_{F_{K_3}} ( D_{F_{K_2}} (D_{F_{K_1}} (x ))))\ \ \ \ \ (2)$

If ${\overline F = F_1,F_2,F_3,F_4}$ are four functions, then ${P_{\overline F}}$ is the same as the above construction but using the functions ${F_i}$:

$\displaystyle P_{\overline F} (x) := D_{F_{4}} ( D_{F_{3}} ( D_{F_{2}} (D_{F_{1}} (x ))))\ \ \ \ \ (3)$

If ${A}$ is an oracle algorithm, we define as ${S(A)}$ the probabilistic process in which we run a simulation of ${A}$ in which we reply to each query with a random answer.

2. Today’s Proof

The proof of the following result is what was missing from yesterday’s analysis.

Lemma 1 For every non-repeating algorithm ${A}$ of complexity ${\leq t}$ we have

$\displaystyle \left| \mathop{\mathbb P}_{\overline {F} } [ A^{P_{\overline{R}}, P^{-1} _{\overline{R}} } () =1 ] - \mathop{\mathbb P} [ S(A) = 1] \right|$

$\displaystyle \leq \frac{t^2}{ 2\cdot 2^{2m}} + \frac {t^2}{ 2^{m}}$

Proof: The transcript of ${A}$‘s computation consists of all the oracle queries made by ${A}$. The notation ${(x,y,0)}$ represents a query to the ${\pi}$ oracle at point ${x}$ while ${(x,y,1)}$ is a query made to the ${\pi^{-1}}$ oracle at ${y}$. The set ${T}$ consists of all valid transcripts for computations where the output of ${A}$ is 1 while ${T^{'} \subset T}$ consists of transcripts in ${T}$ consistent with ${\pi}$ being a permutation.

We write the difference in the probability of ${A}$ outputting 1 when given oracles ${(P_{\overline{R}}, P_{\overline{R}}^{-1})}$ and when given a random oracle as in ${S(A)}$ as a sum over transcripts in ${T}$.

$\displaystyle \begin{array}{c} \left| \mathop{\mathbb P}_{\overline {F} } [ A^{P_{\overline{R}}, P^{-1} _{\overline{R}} } () =1 ] - \mathop{\mathbb P} [ S(A) = 1] \right| \\ = \left| \sum_{\tau \in T} \left(\mathop{\mathbb P}_{\overline {F} } [ A^{P_{\overline{R}}, P^{-1} _{\overline{R}} } () \leftarrow \tau ] - \mathop{\mathbb P} [ S(A) \leftarrow \tau]\right) \right| \end{array} \ \ \ \ \ (4)$

We split the sum over ${T}$ into a sum over ${T^{'}}$ and ${T\setminus T^{'}}$ and bound both the terms individually. We first handle the simpler case of the sum over ${T\setminus T^{'}}$.

$\displaystyle \begin{array}{rcl} && \displaystyle \left| \sum_{\tau \in T\setminus T^{'}} \left(\mathop{\mathbb P}_{\overline {F} } [ A^{P_{\overline{R}}, P^{-1} _{\overline{R}} } () \leftarrow \tau ] - \mathop{\mathbb P} [ S(A) \leftarrow \tau] \right) \right| \\ & =& \displaystyle \left| \sum_{\tau \in T\setminus T^{'}} \left(\mathop{\mathbb P} [ S(A) \leftarrow \tau] \right) \right| \\ & \leq & \displaystyle \frac{t^{2}}{2.2^{2m}} \end{array} \ \ \ \ \ (5)$

The first equality holds as a transcript obtained by running ${A}$ using the oracle ${(P_{\overline{R}}, P_{\overline{R}}^{-1})}$ is always consistent with a permutation. The transcript generated by querying an oracle is inconsistent with a permutation iff. points ${x,y}$ with ${f(x)=f(y)}$ are queried. ${S(A)}$ makes at most ${t}$ queries to an oracle that answers every query with an independently chosen random string from ${\{ 0, 1\}^{2m}}$. The probability of having a repetition is at most ${(\sum_{i=1}^{t-1} i)/2^{2m} \leq t^{2}/2^{2m+1}}$.

Bounding the sum over transcripts in ${T^{'}}$ will require looking into the workings of the construction. Fix a transcript ${\tau \in T^{'}}$ given by ${(x_{i}, y_{i}, b_{i}), 1\leq i \leq q}$, with the number of queries ${q \leq t}$. Each ${x_{i}}$ can be written as ${(L_{i}^{0}, R_{i}^{0})}$ for strings ${L_{i}^{0}, R_{i}^{0}}$ of length ${m}$ corresponding to the left and right parts of ${x_{i}}$. The string ${x_{i}}$ goes through ${4}$ iterations of ${D}$ using the function ${F_{k}, 1\leq k\leq 4}$ for the ${k}$th iteration. The output of the construction after iteration ${k}$, ${0 \leq k \leq 4}$ for input ${x_{i}}$ is denoted by ${(L_{i}^{k}, R_{i}^{k})}$.

Functions ${F_{1},F_{4}}$ are said to be good for the transcript ${\tau}$ if the multisets ${\{R^{1}_{1}, R^{1}_{2}, \cdots, R^{1}_{q}\}}$ and ${\{L^{3}_{1}, L^{3}_{2}, \cdots, L^{3}_{q}\}}$ do not contain any repetitions. We bound the probability of ${F_{1}}$ being bad for ${\tau}$ by analyzing what happens when ${R^{1}_{i}=R^{1}_{j}}$ for some ${i,j}$:

$\displaystyle R^{1}_{i} = L^{0}_{i} \oplus F_{1}( R^{0}_{i})$

$\displaystyle R^{1}_{j} = L^{0}_{j} \oplus F_{1}( R^{0}_{j})$

$\displaystyle 0 = L^{0}_{i} \oplus L^{0}_{j} \oplus F_{1}( R^{0}_{i}) \oplus F_{1}( R^{0}_{j}) \ \ \ \ \ (6)$

The algorithm ${A}$ does not repeat queries so we have ${(L_{i}^{0}, R_{i}^{0}) \neq (L_{j}^{0}, R_{j}^{0})}$. We observe that ${R_{i}^{0} \neq R_{j}^{0}}$ as equality together with equation (6) above would yield ${x_{i}=x_{j}}$. This shows that equation (6) holds only if ${F_{1}( R^{0}_{j})= s \oplus F_{1}( R^{0}_{i}) }$, for a fixed ${s}$ and distinct strings ${R^{0}_{i}}$ and ${R^{0}_{j}}$. This happens with probability ${1/2^{m}}$ as the function ${F_{1}}$ takes values from ${\{ 0, 1\}^{m}}$ independently and uniformly at random. Applying the union bound for all pairs ${i,j}$,

$\displaystyle Pr_{F_{1}}[ \exists i,j \in [q], \; \; R^{1}_{i}=R^{1}_{j} ] \leq \frac{t^{2}}{2^{m+1}} \ \ \ \ \ (7)$

We use a similar argument to bound the probability of ${F_{4}}$ being bad. If ${L^{3}_{i}=L^{3}_{j}}$ for some ${i,j}$ we would have:

$\displaystyle \begin{array}{rcl} L^{3}_{i} &=& R^{4}_{i} \oplus F_{4}( L^{4}_{i}) \\ L^{3}_{j} &=& R^{4}_{j} \oplus F_{4}( L^{4}_{j}) \end{array}$

$\displaystyle 0 = R^{4}_{i} \oplus R^{4}_{j} \oplus F_{4}( L^{4}_{i}) \oplus F_{4}( L^{4}_{j}) \ \ \ \ \ (8)$

The algorithm ${A}$ does not repeat queries so we have ${(L_{i}^{4}, R_{i}^{4}) \neq (L_{j}^{4}, R_{j}^{4})}$. We observe that ${L_{i}^{4} \neq L_{j}^{4}}$ as equality together with equation (8) above would yield ${y_{i}=y_{j}}$. This shows that equation (8) holds only if ${F_{4}( L^{4}_{j})= s^{'} \oplus F_{4}( L^{4}_{i}) }$, for a fixed string ${s^{'}}$ and distinct strings ${L^{4}_{i}}$ and ${L^{4}_{j}}$. This happens with probability ${1/2^{m}}$ as the function ${F_{4}}$ takes values from ${\{ 0, 1\}^{m}}$ independently and uniformly at random. Applying the union bound for all pairs ${i,j}$,

$\displaystyle Pr_{F_{4}}[ \exists i,j\in [q], \; \; L^{3}_{i}=L^{3}_{j} ] \leq \frac{t^{2}}{2^{m+1}} \ \ \ \ \ (9)$

Equations (7) and (9) together imply that

$\displaystyle Pr_{F_{1}, F_{4}}[ F_{1}, F_{4} \text{ not good for transcript } \tau ] \leq \frac{t^{2}}{2^{m}} \ \ \ \ \ (10)$

Continuing the analysis, we fix good functions ${F_{1}}$, ${F_{4}}$ and the transcript ${\tau}$. We will show that the probability of obtaining ${\tau}$ as a transcript in this case is the same as the probability of obtaining ${\tau}$ for a run of ${S(A)}$. Let ${\tau= (x_{i}, y_{i}, b_{i}), 1 \leq i \leq q \leq t}$. We calculate the probability of obtaining ${y_{i}}$ on query ${x_{i}}$ over the choice of ${F_{2}}$ and ${F_{3}}$ .

The values of the input ${x_{i}}$ are in bijection with pairs ${(L_{i}^{1}, R_{i}^{1})}$ while the values of the output ${y_{i}}$ are in bijection with pairs ${(L_{i}^{3}, R_{i}^{3})}$, after fixing ${F_{1}}$ and ${F_{4}}$. We have the relations (from (1)(3)):

$\displaystyle \begin{array}{rcl} L_{i}^{3} = R_{i}^{2} = L_{i}^{1} \oplus F_{2}( R_{i}^{1} ) \\ R_{i}^{3} = L_{i}^{2} \oplus F_{3}( R_{i}^{2}) = R_{i}^{1} \oplus F_{3}( L_{i}^{3} ) \end{array}$

These relations imply that ${(x_{i}, y_{i})}$ can be an input output pair if and only if we have ${F_{2}( R_{i}^{1}), F_{3}( L_{i}^{3})= (L_{i}^{3} \oplus L_{i}^{1}, R_{i}^{3} \oplus R_{i}^{1})}$. Since ${F_{2}}$ and ${F_{3}}$ are random functions with range ${\{0,1\}^{m}}$, the pair ${(x_{i}, y_{i})}$ occurs with probability ${2^{-2m}}$. The values ${R^{1}_{i}}$ and ${L^{3}_{i}, (i \in [q])}$ are distinct because the functions ${F_{1}}$ and ${F_{4}}$ are good. This makes the occurrence of ${(x_{i}, y_{i})}$ independent from the occurrence of ${(x_{j}, y_{j})}$ for ${i \neq j}$. We conclude that the probability of obtaining the transcript ${\tau}$ equals ${2^{-2mq}}$.

The probability of obtaining transcript ${\tau}$ equals ${2^{-2mq}}$ in the simulation ${S(A)}$ as every query is answered by an independent random number from ${\{0,1\}^{2m}}$. Hence,

$\displaystyle \begin{array}{rcl} && \displaystyle \left| \sum_{\tau \in T^{'}} \left(\mathop{\mathbb P}_{\overline {F} } [ A^{P_{\overline{R}}, P^{-1} _{\overline{R}} } () \leftarrow \tau ] - \mathop{\mathbb P} [ S(A) \leftarrow \tau] \right) \right| \\ &\leq & \displaystyle \left| \sum_{\tau \in T^{'}} \mathop{\mathbb P}_{F_{2}, F_{3} } \left[ A^{P_{\overline{R}}, P^{-1} _{\overline{R}} } () \leftarrow \tau | F_{1}, F_{4} \text { not good for } \tau \right] \right| \\ & \leq & \displaystyle \frac{t^{2}}{2^{m}} \left | \sum_{\tau \in T^{'}} \mathop{\mathbb P}_{F_{2}, F_{3} } [ A^{P_{\overline{R}}, P^{-1} _{\overline{R}} } () \leftarrow \tau] \right | \\ & \leq & \displaystyle \frac{t^{2}}{2^{m}} \end{array} \ \ \ \ \ (11)$

The statement of the lemma follows by adding equations (5) and (11) and using the triangle inequality. $\Box$

This concludes the analysis of the Luby-Rackoff scheme for constructing pseudorandom permutations from a family of pseudorandom functions.