Online Optimization Post 2: Constructing Pseudorandom Sets

Today we will see how to use the analysis of the multiplicative weights algorithm in order to construct pseudorandom sets.

The method will yield constructions that are optimal in terms of the size of the pseudorandom set, but not very efficient, although there is at least one case (getting an “almost pairwise independent” pseudorandom generator) in which the method does something that I am not sure how to replicate with other techniques.

Mostly, the point of this post is to illustrate a concept that will reoccur in more interesting contexts: that we can use an online optimization algorithm in order to construct a combinatorial object satisfying certain desired properties. The idea is to run a game between a “builder” against an “inspector,” in which the inspector runs the online optimization algorithm with the goal of finding a violated property in what the builder is building, and the builder plays the role of the adversary selecting the cost functions, with the advantage that it gets to build a piece of the construction after seeing what property the “inspector” is looking for. By the regret analysis of the online optimization problem, if the builder did well at each round against the inspector, then it will do well also against the “offline optimum” that looks for a violated property after seeing the whole construction. For example, the construction of graph sparsifiers by Allen-Zhu, Liao and Orecchia can be cast in this framework.

(In some other applications, it will be the “builder” that runs the algorithm and the “inspector” who plays the role of the adversary. This will be the case of the Frieze-Kannan weak regularity lemma and of the Impagliazzo hard-core lemma. In those cases we capitalize on the fact that we know that there is a very good offline optimum, and we keep going for as long as the adversary is able to find violated properties in what the builder is constructing. After a sufficiently large number of rounds, the regret experienced by the algorithm would exceed the general regret bound, so the process must terminate in a small number of rounds. I have been told that this is just the “dual view” of what I described in the previous paragraph.)

But, back the pseudorandom sets: if {{\cal C} = \{ C_1,\ldots,C_N \}} is a collection of boolean functions {C_i : \{ 0,1 \}^n \rightarrow \{ 0,1 \}}, for example the functions computed by circuits of a certain type and a certain size, then a multiset {S\subseteq \{ 0,1 \}^n} is {\epsilon}-pseudorandom for {\cal C} if, for every {C_i \in \cal C}, we have

\displaystyle  | \mathop{\mathbb P}_{u \sim \{ 0,1 \}^n} [ C_i (u) =1] - \mathop{\mathbb P}_{s \sim S} [C_i(s) = 1 ] | \leq \epsilon

That is, sampling uniformly from {S}, which we can do with {\log_2 |S|} random bits, is as good as sampling uniformly from {\{ 0,1 \}^n}, which requires {n} bits, as far as the functions in {\cal C} are concerned.

It is easy to use Chernoff bounds and union bounds to argue that there is such a set of size {O((\log N)/\epsilon^2)}, so that we can sample from it using only {\log\log N + 2\log \frac 1 \epsilon + O(1)} random bits.

We will prove this result (while also providing an “algorithm” for the construction) using multiplicative weights.

Continue reading

What’s New at the Simons Institute

This Thursday (December 15) is the deadline to apply for post-doctoral positions at the Simons Institute for the next academic year. In Fall 2017 we will run a single program, instead of two programs in parallel, on optimization. The program will be double the size of our typical programs, and will focus on the interplay between discrete and continuous methods in optimization. In Spring 2018 there will be a program on real-time decision-making and one on theoretical neuroscience.

In a few weeks, our Spring 2017 programs will start. The program on pseudorandomness will have a mix of complexity theorists and number theorists, and it will happen at the same time as a program on analytic number theory at MSRI. It will start with a series of lectures. It will start with a series of lectures in the third week of January. The program on foundations of machine learning has been much anticipated, and it will also start with a series of lectures, in the fourth week of January, which will bring to Berkeley at impressive collection of machine learning practitioners whose research is both applied and rigorous. All lectures will be streamed live, and I encourage you to set up viewing parties.

During the current semester, the Simons Institute had for the first time a journalist in residence. The inaugural resident journalist has been Erica Klarreich, that many of you probably know for her outstanding writing on mathematics and computer science for quanta magazine.

Next semester, our resident journalist will be Pulitzer-prize winner John Markoff, who just retired from the New York Times.

Small-bias Distributions and DNFs

Two of my favorite challenges in unconditional derandomization are to find logarithmic-seed pseudorandom generators which are good against:

  1. log-space randomized algorithms
  2. {AC^0}, that is, constant depth circuits of polynomial size

Regarding the first challenge, the best known pseudorandom generator remains Nisan’s, from 1990, which requires a seed of {O(\log^2 n)} bits. Maddeningly, even if we look at width-3 oblivious branching programs, that is, non-uniform algorithms that use only {\log_2 3} bits of memory, nobody knows how to beat Nisan’s generator.

Regarding the second challenge, Nisan showed in 1988 that for every {d} there is a pseudorandom generator of seed length {O(\log^{2d+6} n)} against depth-{d} circuits of size {n}. The simplest case is that of depth-2 circuits, or, without loss of generality, of disjunctive-normal-form (DNF) formulas. When specialized to DNF formulas, Nisan’s generator has seed length {O(\log^{10} n)}, but better constructions are known in this case.

Luby, Velickovic and Wigderson improved the seed length to {O(\log^4 n)} in 1993. Bazzi’s celebrated proof of the depth-2 case of the Linian-Nisan conjecture implies that a {O(\log^2 m/\delta)}-wise independent distribution “{\delta}-fools” every {m}-term DNF, by which we mean that for every such DNF formula {\phi} and every such distribution {X} we have

\displaystyle  \left| \mathop{\mathbb P}_{x\sim X} [ \phi(x) = 1] - \mathop{\mathbb P}_{x\sim U} [\phi(x) = 1] \right| \leq \delta

where {U} is the uniform distribution over assignments. This leads to a pseudorandom generator that {\delta}-fools {n}-variable, {m}-term DNF formulas and whose seed length is {O(\log n \cdot \log^2 m/\delta)}, which is {O(\log^3 n)} when {m,n,\delta^{-1}} are polynomially related.

In a new paper with Anindya De, Omid Etesami, and Madhur Tulsiani, we show that an {n}-variable, {m}-term DNF can be {\delta}-fooled by a generator of seed length {O(\log n + \log^2 m/\delta \cdot \log\log m/\delta)}, which is {O(\log^{2+o(1)} n)} when {n,m,\delta^{-1}} are polynomially related.

Our approach is similar to the one in Razborov’s proof of Bazzi’s result, but we use small-bias distribution instead of {k}-wise independent distributions

Continue reading

The Large Deviation of Fourwise Independent Random Variables

Suppose {X_1,\ldots,X_n} are mutually independent unbiased {\pm 1} random variables. Then we know everything about the distribution of

\displaystyle   | X_1 + \ldots + X_N | \ \ \ \ \ (1)

either by using the central limit theorem or by doing calculations by hand using binomial coefficients and Stirling’s approximation. In particular, we know that (1) takes the values {1,\ldots, 1/\sqrt N} with probability {\Theta(1/\sqrt N)} each, and so with constant probability (1) is at most {O(\sqrt N)}.

The last statement can be proved from scratch using only pairwise independence. We compute

\displaystyle  \mathop{\mathbb E} \left| \sum_i X_i \right|^2 = N

so that

\displaystyle  \mathop{\mathbb P} \left[ \left|\sum_i X_i \right| \geq c \cdot \sqrt N \right] = \mathop{\mathbb P} \left[ \left|\sum_i X_i \right|^2 \geq c^2 \cdot N \right] \leq \frac 1 {c^2}

It is also true that (1) is at least {\Omega(\sqrt N)} with constant probability, and this is trickier to prove.

First of all, note that a proof based on pairwise independence is not possible any more. If {(X_1,\ldots,X_N)} is a random row of an Hadamard matrix, then {\sum_i X_i = N} with probability {1/N}, and {\sum_i X_i =0} with probability {1-1/N}.

Happily, four-wise independence suffices.

Continue reading

Distinguishers from linear functions

In the last post we introduced the following problem: we are given a length-increasing function, the hardest case being a function {G: \{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n} whose output is one bit longer than the input, and we want to construct a generator {D} such that the advantage or distinguishing probability of {D}

\displaystyle   \left| \mathop{\mathbb P}_{z \in \{ 0,1 \}^{n-1}} [D(G(z)) =1 ] - \mathop{\mathbb P}_{x \in \{ 0,1 \}^{n}} [D(x) =1 ] \right| \ \ \ \ \ (1)

is as large as possible relative to the circuit complexity of {D}.

I will show how to achieve advantage {\epsilon} with a circuit of size {O(\epsilon^2 n 2^n)}. Getting rid of the suboptimal factor of {n} is a bit more complicated. These results are in this paper.

Continue reading

Efficiently Correlating with a Real-valued Function and Breaking PRGs

Suppose we have a length-increasing function {G: \{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n}, which we think of as a pseudorandom generator mapping a shorter seed into a longer output.

Then the distribution of {G(z)} for a random seed {z} is not uniform (in particular, it is concentrated on at most {2^{n-1}} of the {2^n} elements of {\{ 0,1 \}^n}). We say that a statistical test {D: \{ 0,1 \}^n \rightarrow \{ 0,1 \}} has advantage {\epsilon} in distinguishing the output of {G} from the uniform distribution if

\displaystyle   \left| \mathop{\mathbb P}_{z \in \{ 0,1 \}^{n-1}} [D(G(z)) =1 ] - \mathop{\mathbb P}_{x \in \{ 0,1 \}^{n}} [D(x) =1 ] \right| \geq \epsilon \ \ \ \ \ (1)

If the left-hand side of (1) is at most {\epsilon} for every {D} computable by a circuit of size {S}, then we say that {G} is {\epsilon}-pseudorandom against circuits of size {S}, or that it is an {(S,\epsilon)}-secure pseudorandom generator.

How secure can a pseudorandom generator possibly be? This question (if we make no assumption on the efficiency of {G}) is related to the question in the previous post on approximating a boolean function via small circuits. Both questions, in fact, are special cases of the question of how much an arbitrary real-valued function must correlate with functions computed by small circuits, which is answered in a new paper with Anindya De and Madhur Tulsiani.

Continue reading