The Large Deviation of Fourwise Independent Random Variables

Suppose {X_1,\ldots,X_n} are mutually independent unbiased {\pm 1} random variables. Then we know everything about the distribution of

\displaystyle   | X_1 + \ldots + X_N | \ \ \ \ \ (1)

either by using the central limit theorem or by doing calculations by hand using binomial coefficients and Stirling’s approximation. In particular, we know that (1) takes the values {1,\ldots, 1/\sqrt N} with probability {\Theta(1/\sqrt N)} each, and so with constant probability (1) is at most {O(\sqrt N)}.

The last statement can be proved from scratch using only pairwise independence. We compute

\displaystyle  \mathop{\mathbb E} \left| \sum_i X_i \right|^2 = N

so that

\displaystyle  \mathop{\mathbb P} \left[ \left|\sum_i X_i \right| \geq c \cdot \sqrt N \right] = \mathop{\mathbb P} \left[ \left|\sum_i X_i \right|^2 \geq c^2 \cdot N \right] \leq \frac 1 {c^2}

It is also true that (1) is at least {\Omega(\sqrt N)} with constant probability, and this is trickier to prove.

First of all, note that a proof based on pairwise independence is not possible any more. If {(X_1,\ldots,X_N)} is a random row of an Hadamard matrix, then {\sum_i X_i = N} with probability {1/N}, and {\sum_i X_i =0} with probability {1-1/N}.

Happily, four-wise independence suffices.

Lemma 1 Suppose {X_1,\ldots,X_n} are four-wise independent {\pm 1} random variables. Then

\displaystyle  \mathop{\mathbb P} \left[ \left| \sum_i X_i \right| \geq \Omega(\sqrt N) \right] \geq \Omega(1)

This is probably well known to a lot of people, but the only place where I have seen this result is a paper by Andreev et al. that I have mentioned before.

Here is a slightly simplified version of their argument. We compute

\displaystyle  \mathop{\mathbb E} \left| \sum_i X_i \right|^2 = N


\displaystyle  \mathop{\mathbb E} \left| \sum_i X_i \right|^4 = 3N^2 - 2N

So we have (from here on is our simplified version)

\displaystyle  \mathop{\mathbb E} \left( \left(\sum_i X_i \right)^2 - 2N \right)^2 = 3N^2 - 2N \leq 3N^2


\displaystyle  \mathop{\mathbb P} \left[ \left| \sum_i X_i \right| \leq \frac 13 \sqrt N \right]

\displaystyle  = \mathop{\mathbb P} \left[ \left| \sum_i X_i \right|^2 \leq \frac 19 N \right]

\displaystyle  \leq \mathop{\mathbb P} \left[ \left| \left| \sum_i X_i \right|^2 - 2N \right| \geq \frac {17}{9} N \right]

\displaystyle  = \mathop{\mathbb P} \left[ \left( \left| \sum_i X_i \right|^2 - 2N \right)^2 \geq \frac {289}{81} N^2 \right]

\displaystyle  \leq \frac 3 {\frac {289}{81}} = .80408


\displaystyle  \mathop{\mathbb P} \left[ \left| \sum_i X_i \right| \geq \frac 13 \sqrt N \right] \geq .159

Following the same proof, but replacing each “{X_i}” with “{a_i X_i}” and every “{N}” with “{\sum_i a_i^2}”, we get the more general version

Lemma 2 Suppose {X_1,\ldots,X_N} are four-wise independent {\pm 1} random variables and let {a_1,\ldots,a_N} be any real numbers. Then

\displaystyle  \mathop{\mathbb P} \left[ \left| \sum_i a_i X_i \right| \geq \frac 13 \sqrt {\sum_i a_i^2 } \right] \geq .159

In particular,

\displaystyle  \mathop{\mathbb E} \left| \sum_i a_i X_i \right| \geq \frac 1 {20} \sqrt {\sum_i a_i^2 }

A related version that was useful in our work on distinguishers for pseudorandom generators is

Corollary 3 Let {H} be a four-wise independent family of functions {h: S \rightarrow \{-1,1\}} and {g: S \rightarrow {\mathbb R}} be any real valued function. Then

\displaystyle  \mathop{\mathbb E}_{h\sim H} \left| \sum_{x\in S} h(x)g(x) \right| \geq \frac 1{20\cdot \sqrt{|S|}} \sum_x | g(x) |

To briefly show how this relates to our work on distinguishers, suppose {G: \{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n} is a length-increasing function, and that we want to construct a function {D}, computable by a circuit as small as possible, such that

\displaystyle  \mathop{\mathbb P}_{z\in \{ 0,1 \}^{n-1}} [ D(G(z)) =1 ] - \mathop{\mathbb P}_{x\in \{ 0,1 \}^n} [D(x)=1] \geq \epsilon

Then this is equivalent to proving

\displaystyle   \mathop{\mathbb E}_{z \in \{ 0,1 \}^{n-1}} (-1)^{D(G(z))} - \mathop{\mathbb E}_{x\in \{ 0,1 \}^n} (-1)^{D(x)} \geq 2 \epsilon \ \ \ \ \ (2)

and, if we set

\displaystyle  g(x) := \mathop{\mathbb P}_{z\in \{ 0,1 \}^{n-1}} [G(z) = x] - \frac 1 {2^n}

then (2) is equivalent to

\displaystyle   \sum_{x\in \{ 0,1 \}^n} (-1)^{D(x)} \cdot g(x) \geq 2 \epsilon \ \ \ \ \ (3)

Now, fix a simple partition {\cal P} of {\{ 0,1 \}^n} into {1,600 \epsilon^{2} 2^n} subsets each of size {(40 \epsilon)^{-2}}, for example based on the value of the first {\log 1,600 \epsilon^2 2^n} bits.

Fix an efficiently computable four-wise independent family {H} of functions {h: \{ 0,1 \}^n \rightarrow \{ 0,1 \}}. Then we have, for every set {S} in the partition

\displaystyle  \mathop{\mathbb E}_{h\sim H} \left| \sum_{x \in S} (-1)^{h(x)} g(x) \right| \geq \frac{1}{20\sqrt {|S|}} \sum_{x\in S} |g(x)| = 2\epsilon \sum_{x\in S} |g(x)|

and if we sum over all sets in the partition we have

\displaystyle  \mathop{\mathbb E}_{h\sim H} \sum_{S\in {\cal P}} \left| \sum_{x \in S} (-1)^{h(x)} g(x) \right| \geq 2\epsilon \sum_{x\in \{ 0,1 \}^n} |g(x)| \geq 2\epsilon

In particular, there is a function {h_0 \in H} such that

\displaystyle  \sum_{S\in {\cal P}} \left| \sum_{x \in S} (-1)^{h_0(x)} g(x) \right| \geq 2\epsilon

To finish the argument, define the function {c: \{ 0,1 \}^n \rightarrow \{ 0,1 \}} such that {c(z)=1} iff {\sum_{x\in S} (-1)^{h_0(x)} g(x) < 0}, where {S} is the set that {z} belongs to. Then

  1. {c(x)} depends only on the first {\log O(\epsilon^{-2} 2^n)} bits of the input, and so it can be computed by a circuit of size {O(\epsilon^{-2}2^n)}
  2. for every {S} we have

    \displaystyle  \left| \sum_{x \in S} (-1)^{h_0(x)} g(x) \right| = \sum_{x\in S} (-1)^{c(x)} \cdot (-1)^{h_0(x)} g(x)

So, in conclusion

\displaystyle  \sum_{x \in \{ 0,1 \}^n} (-1)^{c(x) + h_0(x)} g(x) \geq 2\epsilon

meaning that setting {D(x) := c(x) \oplus h(x)} satisfies (3), and gives us a distinguisher obtaining advantage {\epsilon} and having circuit complexity {O(\epsilon^{-2} 2^n)}.

15 thoughts on “The Large Deviation of Fourwise Independent Random Variables

  1. Similar results have been obtained by Serge Vaudenay using his “decorrelation theory” [1] in the context of the design of secure block ciphers. More precisely, his paper [2] about iterated distinguishers shows that a decorrelation of order $2d$ is sufficient to resist an attack of order $d$. An open question is whether this is a necessary condition.

    [1] S. Vaudenay, “Decorrelation: A Theory for Block Cipher Security”, J. Cryptology, 16(4), 249-286, 2003.

    [2] S. Vaudenay, “Resistance against general iterated attacks”, Proceedings of Eurocrypt’99, pp. 255-271, LNCS 1592, Springer, 1999.

  2. Yeah, what James said, with Wikipedia’s Z = (\sum a_i X_i)^2. Indeed, if the X_i‘s in Lemma 2 are any (2, 4, \eta)-hypercontractive random variables, then so too is \sum a_i X_i; then Paley-Zygmund gives the lower bound (with the RHS depending on eta).

  3. Thanks, Jelani, and her calculations are much tighter, in Lemma 2 she has a factor of \sqrt 3 instead of 20 (Andreev, Clementi and Rolim had 96)

  4. Hi Pascal, what is the similarity that you see? Vaudenay shows that if a family of permutations is close to k-wise independent under certain norms then certain classes of attackers have small advantage in distinguishing a function from the family from a truly random permutation. (And some of those conditions are necessary in order to upper bound the advantage of distinguishers from those classes.)

    Here we have a completely arbitrary distribution over n-bit strings that is far from uniform in total variation distance (for concreteness I have talked about the output of a length-increasing generator) and we show that, within a set S, a distinguisher sampled from a 4-wise independent family of functions has typically advantage about 1/\sqrt {|S|} times the best possible advantage within the set S.

    (So if we are willing to invest in a look-up table of size 2^n/|S| to know the sign of the distinguishing probability in each set of size |S| into which we partition \{0,1\}^n, we can get advantage 1/\sqrt{|S|} overall. This is not an iterated attack because there are no queries to make, it’s just a case analysis.)

    So here it’s the attacker, not the construction that is coming from a k-wise independent distribution, and we are proving a lower bound, not an upper bound, on the distinguishing probability.

  5. I had a question along similar lines on sum of k-wise independent variables. Assume X_1, \ldots, X_n are k-wise independent \pm 1 random variables. What can be said about the following expression :

    \displaystyle {\mathbb P} \left[\left|\sum_{i} X_i\right| > \frac{n^{0.5}}{100}\right]

    By Berry – Esseen theorem, if k=n, then it is 1-o(1) and the above discussion shows that k \geq 4 implies that it is bigger than a constant. But can we get a better bound if 4<k<n?

  6. Hi Anindya,

    If the X_i are k-wise independent for k = O(1/ \epsilon^2), then

    \displaystyle Pr \left[ \left|\sum_i X_i \right| > \sqrt{n} \right] >= 1 - O(\epsilon).  \ \ \ (*)

    Note when 1/\epsilon^2 = n, this gives a lower bound of 1 - O(1/\sqrt{n} ), which is what Berry-Esseen would tell you.

    (*) follows from Section A.5 of (last paragraph, which refers back to an earlier argument in the paper ^^;), though I regret that recovering (*) from what’s written there might take a bit of sorting through our notation, since the paper is really about something else. You can also get (*) from “Bounded Independence Fools Halfspaces” (Diakonikolas et al., FOCS ’09) with only a slightly larger k of O(\log^2(1/\epsilon)/\epsilon^2).

  7. I don’t understand: if the X_i are mutually independent, then from the Berry-Esseen theorem we have that for every threshold t

    \displaystyle \left| {\mathbb P} \left[ \left|\sum_i X_i \right| \geq t \sqrt n \right] - {\mathbb P} \left[ \left| N \right| \geq t  \right] \right| \leq O \left( \frac 1 {\sqrt n} \right)

    where N is a normal random variable with expectation zero and variance 1. But the probability that |N| is more than 1/100 is not 1-o(1), it is a constant bounded away from 1 (and from zero).

  8. I think Luca is right though I intended to put n^{0.45} instead of \sqrt{n}. In that case, Berry-Esseen does guarantee that the sum of X_i exceeds n^{0.45} in absolute value with 1-o(1) probability. However, a calculation like above does not give anything beyond a constant probability.

    Jelani: Thanks for answering. I assume what you guys prove is that if k=1/\epsilon^2, then the difference from the Normal distribution is at most $\epsilon$. That sounds great though I have not yet read your paper.

  9. What I should have actually said is, as long as the $X_i$ are $k$-wise independent for $k = \Omega(1/\epsilon^2)$ then

    $$ \mathrm{Pr}[|\sum_i X_i| > \epsilon \sqrt{n}] \ge 1 – O(\epsilon) $$

    I forgot to type that inner $\epsilon$ above. Sorry for the confusion.

  10. In general, proving concentration bounds under bounded independence has been known for a while. What I believe was not known until recently, is that bounded independence also suffices for anti-concentration bounds. More generally, the following holds:

    The Berry-Esseen version of the CLT also applies under bounded independence.

    In particular, in Luca’s formulation, if the X_i’ s in the sum are \Omega(1/eps^2)-wise independent, then the distance between the CDF’s is at most 1/\sqrt{n} + eps. (**)

    The degree of independence required for such a statement to hold is actually Omega(1/eps^2), i.e. the above bound is optimal up to constant factors.

    As Jelani pointed out, (**) follows as a special case of a result by Gopalan, Jaiswal, Servedio, Viola and myself, but the bound there on the degree of independence required is slightly weaker (up to a log^2 factor).

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s