The Large Deviation of Fourwise Independent Random Variables

Suppose {X_1,\ldots,X_N} are mutually independent unbiased {\pm 1} random variables. Then we know everything about the distribution of

\displaystyle   | X_1 + \ldots + X_N | \ \ \ \ \ (1)

either by using the central limit theorem or by doing calculations by hand using binomial coefficients and Stirling’s approximation. In particular, we know that (1) takes each of the values {1,\ldots, \sqrt N} with probability {\Theta(1/\sqrt N)}, and so with constant probability (1) is at most {O(\sqrt N)}.
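(To make the binomial calculation concrete: for {k} of the same parity as {N} and {|k| \leq \sqrt N}, Stirling's approximation gives

\displaystyle  \mathop{\mathbb P} \left[ \sum_i X_i = k \right] = \binom{N}{(N+k)/2} \cdot 2^{-N} = \Theta \left( \frac 1 {\sqrt N} \right)

and summing over the {\Theta(\sqrt N)} such values of {k} accounts for a constant fraction of the probability mass.)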

The last statement can be proved from scratch using only pairwise independence. We compute

\displaystyle  \mathop{\mathbb E} \left| \sum_i X_i \right|^2 = N

so that

\displaystyle  \mathop{\mathbb P} \left[ \left|\sum_i X_i \right| \geq c \cdot \sqrt N \right] = \mathop{\mathbb P} \left[ \left|\sum_i X_i \right|^2 \geq c^2 \cdot N \right] \leq \frac 1 {c^2}

It is also true that (1) is at least {\Omega(\sqrt N)} with constant probability, and this is trickier to prove.

First of all, note that a proof based on pairwise independence is not possible any more. If {(X_1,\ldots,X_N)} is a random row of an Hadamard matrix, then {\sum_i X_i = N} with probability {1/N}, and {\sum_i X_i =0} with probability {1-1/N}.
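As a sanity check of this example, here is a short Python sketch (my illustration, not part of the original argument): it builds the {N \times N} Sylvester Hadamard matrix, whose {(a,x)} entry is {(-1)^{\langle a,x \rangle}}, and confirms that exactly one row sums to {N} while every other row sums to {0}.

    # Sanity check: in the Sylvester Hadamard matrix, the all-ones row
    # sums to N and every other row sums to 0, so a uniformly random row
    # has the two-valued sum distribution described above.
    k = 10
    N = 1 << k  # N = 1024

    def row_sum(a):
        # entry (a, x) is (-1)^{<a,x>}, the inner product of the bit
        # representations of a and x taken mod 2
        return sum(-1 if bin(a & x).count("1") % 2 else 1 for x in range(N))

    sums = [row_sum(a) for a in range(N)]
    assert sums.count(N) == 1      # only the all-ones row (a = 0)
    assert sums.count(0) == N - 1  # all remaining rows sum to zero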

Happily, four-wise independence suffices.

Lemma 1 Suppose {X_1,\ldots,X_N} are four-wise independent unbiased {\pm 1} random variables. Then

\displaystyle  \mathop{\mathbb P} \left[ \left| \sum_i X_i \right| \geq \Omega(\sqrt N) \right] \geq \Omega(1)

This is probably well known to a lot of people, but the only place where I have seen this result is a paper by Andreev et al. that I have mentioned before.

Here is a slightly simplified version of their argument. We compute

\displaystyle  \mathop{\mathbb E} \left| \sum_i X_i \right|^2 = N

and

\displaystyle  \mathop{\mathbb E} \left| \sum_i X_i \right|^4 = 3N^2 - 2N
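Both computations use independence of only four variables at a time: in the fourth moment, every term in which some variable appears an odd number of times vanishes, leaving the {N} diagonal terms {\mathop{\mathbb E} X_i^4 = 1} and the {3N(N-1)} terms with two matched pairs:

\displaystyle  \mathop{\mathbb E} \left( \sum_i X_i \right)^4 = \sum_i \mathop{\mathbb E} X_i^4 + 3 \sum_{i \neq j} \mathop{\mathbb E} X_i^2 X_j^2 = N + 3N(N-1) = 3N^2 - 2N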

So we have (from here on, this is our simplified version)

\displaystyle  \mathop{\mathbb E} \left( \left(\sum_i X_i \right)^2 - 2N \right)^2 = 3N^2 - 2N \leq 3N^2
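(Writing {Y = \sum_i X_i}, this is just the expansion {\mathop{\mathbb E} ( Y^2 - 2N )^2 = \mathop{\mathbb E} Y^4 - 4N \mathop{\mathbb E} Y^2 + 4N^2 = (3N^2 - 2N) - 4N^2 + 4N^2}, using the two moments computed above.)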

and

\displaystyle  \mathop{\mathbb P} \left[ \left| \sum_i X_i \right| \leq \frac 13 \sqrt N \right]

\displaystyle  = \mathop{\mathbb P} \left[ \left| \sum_i X_i \right|^2 \leq \frac 19 N \right]

\displaystyle  \leq \mathop{\mathbb P} \left[ \left| \left| \sum_i X_i \right|^2 - 2N \right| \geq \frac {17}{9} N \right]

\displaystyle  = \mathop{\mathbb P} \left[ \left( \left| \sum_i X_i \right|^2 - 2N \right)^2 \geq \frac {289}{81} N^2 \right]

\displaystyle  \leq \frac 3 {\frac {289}{81}} = \frac{243}{289} < .841

Thus

\displaystyle  \mathop{\mathbb P} \left[ \left| \sum_i X_i \right| \geq \frac 13 \sqrt N \right] \geq .159
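To see the lemma in action, here is a hedged Python sketch (my illustration, with parameters of my choosing, not code from Andreev et al.): it draws from the standard four-wise independent family given by random polynomials of degree at most 3 over {GF(2^8)}, takes the low bit of each evaluation as an unbiased bit, and estimates the probability in Lemma 1 with {N = 256}.

    # Monte Carlo check of Lemma 1. Evaluations of a random degree-<=3
    # polynomial over GF(2^8) at distinct points are 4-wise independent
    # and uniform, so their low bits give 4-wise independent unbiased
    # +/-1 random variables.
    import random

    N = 256  # size of the field GF(2^8); also the number of variables

    def gf_mul(a, b):
        # carryless multiplication in GF(2^8), modulus x^8+x^4+x^3+x+1 (0x11b)
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11b
            b >>= 1
        return r

    def poly_eval(coeffs, x):
        # Horner's rule; addition in GF(2^8) is XOR
        r = 0
        for c in coeffs:
            r = gf_mul(r, x) ^ c
        return r

    trials, hits = 2000, 0
    for _ in range(trials):
        coeffs = [random.randrange(N) for _ in range(4)]  # degree <= 3
        s = sum(1 - 2 * (poly_eval(coeffs, x) & 1) for x in range(N))
        hits += abs(s) >= N ** 0.5 / 3
    print(hits / trials)  # Lemma 1 promises at least .159; expect well above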

Following the same proof, but replacing each “{X_i}” with “{a_i X_i}” and every “{N}” with “{\sum_i a_i^2}” (the fourth moment becomes {3 (\sum_i a_i^2)^2 - 2 \sum_i a_i^4 \leq 3 (\sum_i a_i^2)^2}, so the same bounds apply), we get the more general version

Lemma 2 Suppose {X_1,\ldots,X_N} are four-wise independent unbiased {\pm 1} random variables and let {a_1,\ldots,a_N} be any real numbers. Then

\displaystyle  \mathop{\mathbb P} \left[ \left| \sum_i a_i X_i \right| \geq \frac 13 \sqrt {\sum_i a_i^2 } \right] \geq .159

In particular, writing {Y = \sum_i a_i X_i}, since {\mathop{\mathbb E} |Y| \geq t \cdot \mathop{\mathbb P} [ |Y| \geq t ]} for every {t \geq 0}, and {.159/3 > 1/20},

\displaystyle  \mathop{\mathbb E} \left| \sum_i a_i X_i \right| \geq \frac 1 {20} \sqrt {\sum_i a_i^2 }

A related version that was useful in our work on distinguishers for pseudorandom generators is

Corollary 3 Let {H} be a four-wise independent family of functions {h: S \rightarrow \{-1,1\}} and {g: S \rightarrow {\mathbb R}} be any real valued function. Then

\displaystyle  \mathop{\mathbb E}_{h\sim H} \left| \sum_{x\in S} h(x)g(x) \right| \geq \frac 1{20\cdot \sqrt{|S|}} \sum_x | g(x) |
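To derive the corollary from Lemma 2, apply the lemma with {a_x = g(x)} and use Cauchy-Schwarz to compare the two norms:

\displaystyle  \sqrt{ \sum_{x \in S} g(x)^2 } \geq \frac 1 {\sqrt{|S|}} \sum_{x \in S} |g(x)|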

To briefly show how this relates to our work on distinguishers, suppose {G: \{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n} is a length-increasing function, and that we want to construct a function {D}, computable by a circuit as small as possible, such that

\displaystyle  \mathop{\mathbb P}_{z\in \{ 0,1 \}^{n-1}} [ D(G(z)) =1 ] - \mathop{\mathbb P}_{x\in \{ 0,1 \}^n} [D(x)=1] \geq \epsilon

Then, up to complementing {D} (which does not change the circuit size), this is equivalent to proving

\displaystyle   \mathop{\mathbb E}_{z \in \{ 0,1 \}^{n-1}} (-1)^{D(G(z))} - \mathop{\mathbb E}_{x\in \{ 0,1 \}^n} (-1)^{D(x)} \geq 2 \epsilon \ \ \ \ \ (2)
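(To verify the equivalence, use {(-1)^b = 1 - 2b} for a bit {b \in \{ 0,1 \}}, so that for inputs drawn from either distribution

\displaystyle  \mathop{\mathbb E} (-1)^{D} = 1 - 2 \mathop{\mathbb P} [ D = 1 ]

and note that complementing {D} flips the sign of the difference of the two expectations.)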

and, if we set

\displaystyle  g(x) := \mathop{\mathbb P}_{z\in \{ 0,1 \}^{n-1}} [G(z) = x] - \frac 1 {2^n}

then (2) is equivalent to

\displaystyle   \sum_{x\in \{ 0,1 \}^n} (-1)^{D(x)} \cdot g(x) \geq 2 \epsilon \ \ \ \ \ (3)

Now, fix a simple partition {\cal P} of {\{ 0,1 \}^n} into {1600 \epsilon^{2} 2^n} subsets, each of size {(40 \epsilon)^{-2}}, for example based on the value of the first {\log (1600 \epsilon^{2} 2^n)} bits.
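(The parameters are consistent: the number of subsets times their common size is

\displaystyle  1600\, \epsilon^{2} 2^n \cdot (40\epsilon)^{-2} = 2^n

and, implicitly, {\epsilon \leq 1/40}, so that each subset has size at least one.)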

Fix an efficiently computable four-wise independent family {H} of functions {h: \{ 0,1 \}^n \rightarrow \{ 0,1 \}}. Then we have, for every set {S} in the partition

\displaystyle  \mathop{\mathbb E}_{h\sim H} \left| \sum_{x \in S} (-1)^{h(x)} g(x) \right| \geq \frac{1}{20\sqrt {|S|}} \sum_{x\in S} |g(x)| = 2\epsilon \sum_{x\in S} |g(x)|

and if we sum over all sets in the partition we have

\displaystyle  \mathop{\mathbb E}_{h\sim H} \sum_{S\in {\cal P}} \left| \sum_{x \in S} (-1)^{h(x)} g(x) \right| \geq 2\epsilon \sum_{x\in \{ 0,1 \}^n} |g(x)| \geq 2\epsilon
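For the last inequality, note that {\sum_x g(x) = 0}, so the positive and negative parts of {g} carry equal mass, and the negative part alone has mass at least {1/2}, since {g(x) = -2^{-n}} on each of the at least {2^{n-1}} strings outside the image of {G}; hence

\displaystyle  \sum_{x \in \{ 0,1 \}^n} |g(x)| \geq 2 \cdot 2^{n-1} \cdot \frac 1 {2^n} = 1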

In particular, there is a function {h_0 \in H} such that

\displaystyle  \sum_{S\in {\cal P}} \left| \sum_{x \in S} (-1)^{h_0(x)} g(x) \right| \geq 2\epsilon

To finish the argument, define the function {c: \{ 0,1 \}^n \rightarrow \{ 0,1 \}} such that {c(z)=1} iff {\sum_{x\in S} (-1)^{h_0(x)} g(x) < 0}, where {S} is the set that {z} belongs to. Then

  1. {c(x)} depends only on which set of the partition the input belongs to, that is, on the first {\log (1600 \epsilon^{2} 2^n)} bits of the input, and so it can be computed by a circuit of size {O(\epsilon^{2}2^n)}
  2. for every {S} we have

    \displaystyle  \left| \sum_{x \in S} (-1)^{h_0(x)} g(x) \right| = \sum_{x\in S} (-1)^{c(x)} \cdot (-1)^{h_0(x)} g(x)

So, in conclusion

\displaystyle  \sum_{x \in \{ 0,1 \}^n} (-1)^{c(x) + h_0(x)} g(x) \geq 2\epsilon

meaning that setting {D(x) := c(x) \oplus h_0(x)} satisfies (3), and so (after possibly complementing {D}) gives us a distinguisher achieving advantage {\epsilon} and having circuit complexity {O(\epsilon^{2} 2^n)}.
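Here is an end-to-end toy run of the construction in Python (again my sketch, with illustrative choices: the “generator” {G} is just a random function, {n=8}, {\epsilon = 1/80}, and the four-wise independent family is the low bit of a random cubic polynomial over {GF(2^8)}). It searches the family for a good {h_0}, builds {c}, and checks that {D = c \oplus h_0} achieves the promised advantage.

    # Toy demo: n = 8, eps = 1/80, so the partition has sets of size
    # (40*eps)^{-2} = 4, indexed by the first 6 bits of the input.
    import random

    n, eps = 8, 1 / 80
    N = 1 << n

    def gf_mul(a, b):  # multiplication in GF(2^8), modulus 0x11b
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11b
            b >>= 1
        return r

    def h(coeffs, x):  # four-wise independent bit: low bit of a random cubic
        r = 0
        for c in coeffs:
            r = gf_mul(r, x) ^ c
        return r & 1

    # a stand-in generator G: {0,1}^7 -> {0,1}^8, chosen at random
    G = [random.randrange(N) for _ in range(N // 2)]
    g = [-1 / N] * N
    for y in G:
        g[y] += 2 / N  # g(x) = Pr_z[G(z) = x] - 2^{-n}

    sets = [range(s, s + 4) for s in range(0, N, 4)]

    best, h0 = -1.0, None  # sample the family; on average a sample does well
    for _ in range(200):
        coeffs = [random.randrange(N) for _ in range(4)]
        total = sum(abs(sum((1 - 2 * h(coeffs, x)) * g[x] for x in S))
                    for S in sets)
        if total > best:
            best, h0 = total, coeffs

    D = [0] * N  # D = c XOR h0, where c flips the sign within each set
    for S in sets:
        flip = sum((1 - 2 * h(h0, x)) * g[x] for x in S) < 0
        for x in S:
            D[x] = h(h0, x) ^ flip
    lhs = sum((1 - 2 * D[x]) * g[x] for x in range(N))
    print(lhs >= 2 * eps)  # the advantage of D (up to complementation) is lhs/2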


15 thoughts on “The Large Deviation of Fourwise Independent Random Variables”

  1. Similar results have been obtained by Serge Vaudenay using his “decorrelation theory” [1] in the context of the design of secure block ciphers. More precisely, his paper [2] about iterated distinguishers shows that a decorrelation of order $2d$ is sufficient to resist an attack of order $d$. An open question is whether this is a necessary condition.

    [1] S. Vaudenay, “Decorrelation: A Theory for Block Cipher Security”, J. Cryptology, 16(4), 249-286, 2003.

    [2] S. Vaudenay, “Resistance against general iterated attacks”, Proceedings of Eurocrypt’99, pp. 255-271, LNCS 1592, Springer, 1999.

  2. Yeah, what James said, with Wikipedia’s Z = (\sum a_i X_i)^2. Indeed, if the X_i’s in Lemma 2 are any (2, 4, \eta)-hypercontractive random variables, then so too is \sum a_i X_i; then Paley-Zygmund gives the lower bound (with the RHS depending on \eta).

  3. Thanks, Jelani. And her calculations are much tighter: in Lemma 2 she has a factor of \sqrt 3 instead of 20 (Andreev, Clementi and Rolim had 96).

  4. Hi Pascal, what is the similarity that you see? Vaudenay shows that if a family of permutations is close to k-wise independent under certain norms then certain classes of attackers have small advantage in distinguishing a function from the family from a truly random permutation. (And some of those conditions are necessary in order to upper bound the advantage of distinguishers from those classes.)

    Here we have a completely arbitrary distribution over n-bit strings that is far from uniform in total variation distance (for concreteness I have talked about the output of a length-increasing generator), and we show that, within a set S, a distinguisher sampled from a 4-wise independent family of functions typically has advantage about 1/\sqrt {|S|} times the best possible advantage within the set S.

    (So if we are willing to invest in a look-up table of size 2^n/|S| to know the sign of the distinguishing probability in each set of size |S| into which we partition \{0,1\}^n, we can get advantage 1/\sqrt{|S|} overall. This is not an iterated attack because there are no queries to make, it’s just a case analysis.)

    So here it’s the attacker, not the construction that is coming from a k-wise independent distribution, and we are proving a lower bound, not an upper bound, on the distinguishing probability.

  5. I had a question along similar lines on the sum of k-wise independent variables. Assume X_1, \ldots, X_n are k-wise independent \pm 1 random variables. What can be said about the following expression:

    \displaystyle {\mathbb P} \left[\left|\sum_{i} X_i\right| > \frac{n^{0.5}}{100}\right]

    By the Berry-Esseen theorem, if k=n, then it is 1-o(1), and the above discussion shows that k \geq 4 implies that it is bigger than a constant. But can we get a better bound if 4<k<n?

  6. Hi Anindya,

    If the X_i are k-wise independent for k = O(1/ \epsilon^2), then

    \displaystyle Pr \left[ \left|\sum_i X_i \right| > \sqrt{n} \right] \geq 1 - O(\epsilon).  \ \ \ (*)

    Note when 1/\epsilon^2 = n, this gives a lower bound of 1 - O(1/\sqrt{n} ), which is what Berry-Esseen would tell you.

    (*) follows from Section A.5 of http://web.mit.edu/minilek/www/papers/lp.pdf (last paragraph, which refers back to an earlier argument in the paper ^^;), though I regret that recovering (*) from what’s written there might take a bit of sorting through our notation, since the paper is really about something else. You can also get (*) from “Bounded Independence Fools Halfspaces” (Diakonikolas et al., FOCS ’09) with only a slightly larger k of O(\log^2(1/\epsilon)/\epsilon^2).

  7. I don’t understand: if the X_i are mutually independent, then from the Berry-Esseen theorem we have that for every threshold t

    \displaystyle \left| {\mathbb P} \left[ \left|\sum_i X_i \right| \geq t \sqrt n \right] - {\mathbb P} \left[ \left| N \right| \geq t  \right] \right| \leq O \left( \frac 1 {\sqrt n} \right)

    where N is a normal random variable with expectation zero and variance 1. But the probability that |N| is more than 1/100 is not 1-o(1); it is a constant bounded away from 1 (and from zero).

  8. I think Luca is right though I intended to put n^{0.45} instead of \sqrt{n}. In that case, Berry-Esseen does guarantee that the sum of X_i exceeds n^{0.45} in absolute value with 1-o(1) probability. However, a calculation like above does not give anything beyond a constant probability.

    Jelani: Thanks for answering. I assume what you guys prove is that if k=1/\epsilon^2, then the difference from the Normal distribution is at most \epsilon. That sounds great, though I have not yet read your paper.

  9. What I should have actually said is, as long as the X_i are k-wise independent for k = \Omega(1/\epsilon^2), then

    \displaystyle \mathrm{Pr} \left[ \left| \sum_i X_i \right| > \epsilon \sqrt{n} \right] \geq 1 - O(\epsilon)

    I forgot to type that inner \epsilon above. Sorry for the confusion.

  10. In general, proving concentration bounds under bounded independence has been known for a while. What I believe was not known until recently is that bounded independence also suffices for anti-concentration bounds. More generally, the following holds:

    The Berry-Esseen version of the CLT also applies under bounded independence.

    In particular, in Luca’s formulation, if the X_i’s in the sum are \Omega(1/\epsilon^2)-wise independent, then the distance between the CDFs is at most 1/\sqrt{n} + \epsilon. (**)

    The degree of independence required for such a statement to hold is actually \Omega(1/\epsilon^2), i.e. the above bound is optimal up to constant factors.

    As Jelani pointed out, (**) follows as a special case of a result by Gopalan, Jaiswal, Servedio, Viola and myself, but the bound there on the degree of independence required is slightly weaker (up to a log^2 factor).
