Small-bias Distributions and DNFs

Two of my favorite challenges in unconditional derandomization are to find logarithmic-seed pseudorandom generators which are good against:

  1. log-space randomized algorithms
  2. {AC^0}, that is, constant depth circuits of polynomial size

Regarding the first challenge, the best known pseudorandom generator remains Nisan’s, from 1990, which requires a seed of {O(\log^2 n)} bits. Maddeningly, even if we look at width-3 oblivious branching programs, that is, non-uniform algorithms that use only {\log_2 3} bits of memory, nobody knows how to beat Nisan’s generator.

Regarding the second challenge, Nisan showed in 1988 that for every {d} there is a pseudorandom generator of seed length {O(\log^{2d+6} n)} against depth-{d} circuits of size {n}. The simplest case is that of depth-2 circuits, or, without loss of generality, of disjunctive-normal-form (DNF) formulas. When specialized to DNF formulas, Nisan’s generator has seed length {O(\log^{10} n)}, but better constructions are known in this case.

Luby, Velickovic and Wigderson improved the seed length to {O(\log^4 n)} in 1993. Bazzi’s celebrated proof of the depth-2 case of the Linial-Nisan conjecture implies that every {O(\log^2 m/\delta)}-wise independent distribution “{\delta}-fools” every {m}-term DNF, by which we mean that for every such DNF formula {\phi} and every such distribution {X} we have

\displaystyle  \left| \mathop{\mathbb P}_{x\sim X} [ \phi(x) = 1] - \mathop{\mathbb P}_{x\sim U} [\phi(x) = 1] \right| \leq \delta

where {U} is the uniform distribution over assignments. This leads to a pseudorandom generator that {\delta}-fools {n}-variable, {m}-term DNF formulas and whose seed length is {O(\log n \cdot \log^2 m/\delta)}, which is {O(\log^3 n)} when {m,n,\delta^{-1}} are polynomially related.
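
To make the definition concrete, here is a minimal brute-force sketch (the 6-variable, 3-term DNF is a hypothetical toy example, not taken from any of the papers above): it computes the acceptance probability of a DNF under the uniform distribution and under one explicit non-uniform distribution, and reports the gap that the definition of “{\delta}-fooling” bounds.

```python
from itertools import product

# A hypothetical 6-variable, 3-term DNF, for illustration only.
def phi(x):
    return (x[0] and x[1]) or (x[2] and not x[3]) or (x[4] and x[5])

n = 6
cube = list(product([0, 1], repeat=n))

# Acceptance probability under the uniform distribution U.
p_uniform = sum(phi(x) for x in cube) / len(cube)

# X = uniform over odd-parity assignments, which is (n-1)-wise independent.
odd = [x for x in cube if sum(x) % 2 == 1]
p_x = sum(phi(x) for x in odd) / len(odd)

# X delta-fools phi exactly when this gap is at most delta.
print(f"gap = {abs(p_x - p_uniform):.4f}")
```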

In a new paper with Anindya De, Omid Etesami, and Madhur Tulsiani, we show that an {n}-variable, {m}-term DNF can be {\delta}-fooled by a generator of seed length {O(\log n + \log^2 m/\delta \cdot \log\log m/\delta)}, which is {O(\log^{2+o(1)} n)} when {n,m,\delta^{-1}} are polynomially related.

Our approach is similar to the one in Razborov’s proof of Bazzi’s result, but we use small-bias distributions instead of {k}-wise independent distributions.

As discussed in a previous post, Razborov significantly simplified Bazzi’s proof, retaining the same quantitative bounds.

Bazzi had proved that, in order to show that a certain function {f} (e.g. the function computed by a DNF) is {\delta}-fooled by every {k}-wise independent distribution, for a certain {k}, it is sufficient (and, in fact, necessary) to show that there exist degree-{k} real polynomials {f_{\rm lower}} and {f_{\rm upper}} such that

\displaystyle  \forall x\in \{ 0,1 \}^n. \ \ f_{\rm lower} (x) \leq f(x) \leq f_{\rm upper} (x)


\displaystyle  \mathop{\mathbb E}_{x \sim U} [ f_{\rm upper} (x) - f_{\rm lower} (x) ] \leq \delta
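
To see why these conditions suffice: since {f_{\rm lower}} and {f_{\rm upper}} are polynomials of degree at most {k}, each one has the same expectation under a {k}-wise independent distribution {X} as under {U}, and so

\displaystyle  \mathop{\mathbb E}_{x\sim X} [ f(x) ] \leq \mathop{\mathbb E}_{x\sim X} [ f_{\rm upper} (x) ] = \mathop{\mathbb E}_{x\sim U} [ f_{\rm upper} (x) ] \leq \mathop{\mathbb E}_{x\sim U} [ f(x) ] + \delta

where the last inequality combines the two displayed conditions; the symmetric argument applied to {f_{\rm lower}} gives the matching lower bound.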

Bazzi also shows that constructing such functions reduces to finding one function {f_{\rm approx}} which approximates {f} in the {\ell_2} norm and has “one-sided error,” meaning that if {f(x)=0} then we must have {f_{\rm approx} (x) = 0}. Razborov shows that such a good low-degree “one-sided {\ell_2} approximation” for a given DNF can be easily constructed by modifying a result of Linial, Mansour and Nisan, who show that every DNF has a good low-degree approximation in the {\ell_2} norm.

In our proof, instead of working with {k}-wise independent distributions we work with small-bias distributions, as defined by Naor and Naor. Recall that a distribution {X} ranging over {\{ 0,1 \}^n} is {\epsilon}-biased if for every nonempty subset {S\subseteq \{1,\ldots, n\}} we have

\displaystyle  \left| \mathop{\mathbb E}_{x \sim U} (-1)^{\sum_{i \in S} x_i} - \mathop{\mathbb E}_{x \sim X} (-1)^{\sum_{i \in S} x_i} \right| \leq \epsilon
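
(For nonempty {S} the expectation under {U} is zero, so the condition simply says that {| \mathop{\mathbb E}_{x \sim X} (-1)^{\sum_{i \in S} x_i} | \leq \epsilon}.) For intuition, the bias of an explicitly given distribution can be computed by brute force on a few variables; a minimal sketch, with toy distributions rather than actual generator outputs:

```python
from itertools import product

def bias(points, n):
    """Bias of the uniform distribution over `points`: the maximum,
    over nonempty S, of |E[(-1)^{sum_{i in S} x_i}]|."""
    return max(
        abs(sum((-1) ** sum(s * xi for s, xi in zip(S, x))
                for x in points)) / len(points)
        for S in product([0, 1], repeat=n) if any(S))

n = 4
cube = list(product([0, 1], repeat=n))

print(bias(cube, n))  # the uniform distribution has bias 0.0
# Odd-parity assignments: (n-1)-wise independent, yet bias 1 (at S = {1,...,n}).
print(bias([x for x in cube if sum(x) % 2 == 1], n))
```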

It is known how to construct pseudorandom generators of seed length {O(\log n/\epsilon)} whose output is {\epsilon}-biased. It is also known that every {\epsilon}-biased distribution is {O(n^k \cdot \epsilon)}-close in statistical distance to a {k}-wise independent distribution. From this, and from Bazzi’s result, it follows that every {\epsilon}-biased distribution {(\delta + O(n^{O(\log^2 m/\delta)} \cdot \epsilon))}-fools every {n}-variable, {m}-term DNF. This means that every {\epsilon}-biased distribution {\delta}-fools every {m}-term DNF, provided

\displaystyle  \epsilon = \exp(-O(\log n \cdot \log^2 m/\delta))

and this gives an alternative way to construct a pseudorandom generator of seed length {O(\log n \cdot \log^2 m/\delta)}.

We show, however, that it is possible to do better if one works with small-bias distributions from the beginning.

Analogously to Bazzi’s characterization of the functions fooled by {k}-wise independent distributions, we have that, in order to show that every {\epsilon}-biased distribution {\delta}-fools a given function {f}, it is sufficient (and necessary) to find two functions {f_{\rm lower}} and {f_{\rm upper}} such that

\displaystyle  \forall x\in \{ 0,1 \}^n. \ \ f_{\rm lower} (x) \leq f(x) \leq f_{\rm upper} (x) \ ,

\displaystyle  \mathop{\mathbb E}_{x \sim U} [ f_{\rm upper} (x) - f_{\rm lower} (x) ] \leq \delta \ ,

and such that {f_{\rm lower}} and {f_{\rm upper}} have small Fourier {\ell_1} norm. Recall that, in Bazzi’s characterization, the requirement on the approximating functions was to have small degree, that is, to have their Fourier mass concentrated on coefficients defined over small sets. (Note that, for functions of bounded {\ell_2} norm, having degree at most {k} implies having Fourier {\ell_1} norm {O(n^k)}, but a function can have small Fourier {\ell_1} norm without having any low-degree approximation. This is analogous to the fact that an {\epsilon}-biased distribution is close to being {k}-wise independent if {\epsilon \ll 1/n^k}, but a distribution can be {(n-1)}-wise independent without being a small-bias distribution.)
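
One direction of this characterization is immediate: if {g} has Fourier expansion {g(x) = \sum_S \hat g(S) \cdot (-1)^{\sum_{i\in S} x_i}}, then for every {\epsilon}-biased distribution {X}

\displaystyle  \left| \mathop{\mathbb E}_{x\sim X} [ g(x) ] - \mathop{\mathbb E}_{x \sim U} [ g(x) ] \right| \leq \sum_{S \neq \emptyset} |\hat g(S)| \cdot \epsilon \leq \epsilon \cdot \sum_{S} |\hat g(S)|

so two sandwiching functions of small Fourier {\ell_1} norm have nearly the same expectations under {X} as under {U}, and the sandwiching argument goes through as before.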

Following Bazzi-Razborov, for the special case of DNFs this condition can be reduced to the problem of constructing a single function of small Fourier {\ell_1} norm which provides a good one-sided approximation to a given DNF, which in turn can be reduced to the problem of just finding a function of small Fourier {\ell_1} norm providing a good approximation to a given DNF. The latter problem was solved by Mansour with rather good parameters, allowing us to show that every {\epsilon}-biased distribution {\delta}-fools every {m}-term, {n}-variable DNF, provided

\displaystyle  \epsilon = \exp(-O(\log n + (\log^2 m/\delta) \cdot \log\log m/\delta ))

which gives seed length {O(\log n + (\log^2 m/\delta) \cdot \log\log m/\delta )}.
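
For small instances, the Fourier {\ell_1} norm in question can be computed directly from the definition. A brute-force sketch (the 2-term DNF is again a hypothetical toy example; note also that Mansour’s theorem bounds the Fourier {\ell_1} norm of an approximator to the DNF, not of the DNF itself):

```python
from itertools import product

def fourier_l1(f, n):
    """Sum over all S of |hat f(S)|, where
    hat f(S) = E_{x ~ U}[ f(x) * (-1)^{sum_{i in S} x_i} ]."""
    cube = list(product([0, 1], repeat=n))
    total = 0.0
    for S in product([0, 1], repeat=n):
        coeff = sum(f(x) * (-1) ** sum(s * xi for s, xi in zip(S, x))
                    for x in cube) / len(cube)
        total += abs(coeff)
    return total

# A hypothetical 2-term DNF on 4 variables.
def phi(x):
    return int((x[0] and x[1]) or (x[2] and x[3]))

print(fourier_l1(phi, 4))
```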

Ultimately, one would like to achieve the optimal seed length {O(\log nm/\delta)}. Unfortunately, there are limits to what can be achieved by reasoning about small-bias distributions.

Mansour showed that there exists an {m}-term DNF {\phi} and a {k}-wise independent distribution {X}, with {k=\Omega(\log m \cdot \log 1/\delta)} such that {X} does not {\delta}-fool {\phi}. For the case in which {\delta^{-1}} and {m} are polynomially related, Mansour’s example shows that Bazzi’s proof is optimal: recall that Bazzi proves that every {k}-wise independent distribution {\delta}-fools every {m}-term DNF provided {k= O(\log^2 m/\delta)}.

In Mansour’s example, {X} is the uniform distribution over assignments with an odd number of ones, which is definitely not a small-bias distribution. Unfortunately, a similar example can be constructed in which {X} is the uniform distribution over assignments to {\Omega(\log m \cdot \log 1/\delta)} variables whose number of ones is not a multiple of 3. Such a distribution is {\epsilon}-biased with {\epsilon = 1/m^{\Omega (\log 1/\delta)}}, showing that our result, too, is nearly tight when {m} and {\delta^{-1}} are polynomially related.
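
As a toy sanity check (tiny {n}, nothing like the actual parameters in the paper), one can verify by brute force that this “number of ones not divisible by 3” distribution has bias bounded away from 1, in contrast to the parity distribution of Mansour’s example:

```python
from itertools import product

# Uniform over assignments whose number of ones is not a multiple of 3.
n = 9
pts = [x for x in product([0, 1], repeat=n) if sum(x) % 3 != 0]

worst = max(
    abs(sum((-1) ** sum(s * xi for s, xi in zip(S, x)) for x in pts)) / len(pts)
    for S in product([0, 1], repeat=n) if any(S))

print(worst)  # strictly less than 1, unlike the parity distribution
```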

It remains a very interesting open question whether every {1/m^{O(\log 1/\delta)}}-biased distribution {\delta}-fools every {m}-term DNF. A positive answer would give a generator with seed length {O(\log nm)} that fools DNFs and CNFs within a constant error. In particular, it would give a polynomial-time deterministic algorithm that, given as input a CNF formula satisfied by at least a {.01} fraction of assignments, finds a satisfying assignment.

As partial evidence of a positive answer to the above question, we show that every {1/m^{O(\log 1/\delta)}}-biased distribution does {\delta}-fool every read-once {m}-term DNF.

Although “derandomizing” read-once DNFs is a trivial problem, it would be rather interesting to find pseudorandom generators of optimal seed length {O(\log mn/\delta)} that {\delta}-fool every read-once DNF. Unfortunately, we show that such generators cannot be based exclusively on small-bias distributions: we show that there is an {\epsilon}-biased distribution {X} with {\epsilon = 1/m^{\tilde \Omega(\log 1/\delta)}} and an {m}-term read-once DNF {\phi} such that {\phi} is not {\delta}-fooled by {X}. This is by far the most technical result in the paper, and maybe the type of counterexample that we construct could be useful elsewhere.

7 thoughts on “Small-bias Distributions and DNFs”

  1. Thanks!

    Do you believe using small-bias distributions improves the parameters of Braverman’s PRGs also?

  2. It should be possible to shave one or more log factors off the final seed length in the generator implied by Braverman’s analysis: if I am not mistaken, reasoning about the L1 norm gives slightly better bounds for Razborov-Smolensky too, not just for Linial-Nisan-Mansour. But the seed length would remain \log^{O(d^2)} n to fool depth-d circuits, which is worse than Nisan’s generator, so we did not pursue this direction. It might be interesting to see whether, for depth 3, one can do better than \log^{12} n.

  3. Thanks, I see. Do you have some intuition about why fooling linear polynomials is good for fooling DNFs? One now naturally wonders about distributions that fool higher-degree polynomials…

  4. If you want to approximate DNFs by low-degree polynomials, then a natural idea would be to show that DNFs can be approximated nicely by a low-L1-norm combination of low-degree polynomials (say, degree-2 polynomials). However, I do not know of any analogue of Fourier coefficients for polynomials of degree 2 or higher.

  5. Anindya, concretely, I meant using the sum of small-bias generators, as analysed by Viola in his “The sum of d small-bias generators fools polynomials of degree d”. But perhaps I should read your paper first before making potentially unintelligent comments.

  6. In fact, I also had the Bogdanov-Lovett-Viola (BLV) generator in mind, but as I said, to use it one has to prove that DNFs can be approximated well, in the L2 sense, by a low-L1-norm linear combination of degree-d polynomials. However, I do not know of any results in this direction. Also, to get logarithmic seed length one needs d < \log\log n, because the seed length of the BLV generator grows as d \cdot 2^d \cdot \log(1/\epsilon) (well, the exponential dependence on d may not be true/optimal, but as Viola indicates, it would require a breakthrough to get a provable subexponential dependence on d).

  7. I prefer to think of the question at the following level of abstraction:

    Suppose we have proved that k(n)-wise independence fools a certain class of functions, e.g. DNFs, AC0 circuits, linear thresholds, etc.

    Then we know by Alon-Goldreich-Mansour that 1/n^{k(n)}-biased distributions also fool that class of functions. Can we do better? If we could use just, say, 1/(poly(n) \cdot 2^{k(n)})-biased distributions then we would reduce the seed length from k(n) \log n to k(n) + \log n.

    Suppose now that we know that every \epsilon(n)-biased distribution fools a certain class of functions, where \epsilon(n) is less than inverse-polynomial. Then a particular way to construct an \epsilon(n)-biased distribution is to sample (\log 1/\epsilon(n))/\log n independent 1/n-biased distributions and take their XOR. This uses the optimal O(\log 1/\epsilon(n)) seed length, but it is a potentially more powerful distribution (e.g. it is pseudorandom against polynomials of degree (\log 1/\epsilon(n))/\log n). Could it suffice to XOR fewer copies of our basic 1/n-biased distribution and still derandomize the class of functions we are interested in? (A brute-force toy check of this XOR step is sketched below.)
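
    The following brute-force sketch (toy distributions over 4 bits, not actual generator outputs) checks the XOR step numerically: the expectation of each character multiplies over independent inputs, so XORing two independent copies squares every bias.

    ```python
    from itertools import product

    def bias(points, n):
        # max over nonempty S of |E[(-1)^{sum_{i in S} x_i}]|
        return max(
            abs(sum((-1) ** sum(s * xi for s, xi in zip(S, x))
                    for x in points)) / len(points)
            for S in product([0, 1], repeat=n) if any(S))

    n = 4
    # Toy base distribution: weight not divisible by 3.
    base = [x for x in product([0, 1], repeat=n) if sum(x) % 3 != 0]
    # XOR of two independent samples from the base distribution.
    xored = [tuple(a ^ b for a, b in zip(x, y)) for x in base for y in base]

    print(bias(base, n), bias(xored, n))  # the second is the square of the first
    ```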
