# Online Optimization Post 2: Constructing Pseudorandom Sets

Today we will see how to use the analysis of the multiplicative weights algorithm in order to construct pseudorandom sets.

The method will yield constructions that are optimal in terms of the size of the pseudorandom set, but not very efficient, although there is at least one case (getting an “almost pairwise independent” pseudorandom generator) in which the method does something that I am not sure how to replicate with other techniques.

Mostly, the point of this post is to illustrate a concept that will reoccur in more interesting contexts: that we can use an online optimization algorithm in order to construct a combinatorial object satisfying certain desired properties. The idea is to run a game between a “builder” against an “inspector,” in which the inspector runs the online optimization algorithm with the goal of finding a violated property in what the builder is building, and the builder plays the role of the adversary selecting the cost functions, with the advantage that it gets to build a piece of the construction after seeing what property the “inspector” is looking for. By the regret analysis of the online optimization problem, if the builder did well at each round against the inspector, then it will do well also against the “offline optimum” that looks for a violated property after seeing the whole construction. For example, the construction of graph sparsifiers by Allen-Zhu, Liao and Orecchia can be cast in this framework.

(In some other applications, it will be the “builder” that runs the algorithm and the “inspector” who plays the role of the adversary. This will be the case of the Frieze-Kannan weak regularity lemma and of the Impagliazzo hard-core lemma. In those cases we capitalize on the fact that we know that there is a very good offline optimum, and we keep going for as long as the adversary is able to find violated properties in what the builder is constructing. After a sufficiently large number of rounds, the regret experienced by the algorithm would exceed the general regret bound, so the process must terminate in a small number of rounds. I have been told that this is just the “dual view” of what I described in the previous paragraph.)

But, back the pseudorandom sets: if ${{\cal C} = \{ C_1,\ldots,C_N \}}$ is a collection of boolean functions ${C_i : \{ 0,1 \}^n \rightarrow \{ 0,1 \}}$, for example the functions computed by circuits of a certain type and a certain size, then a multiset ${S\subseteq \{ 0,1 \}^n}$ is ${\epsilon}$-pseudorandom for ${\cal C}$ if, for every ${C_i \in \cal C}$, we have

$\displaystyle | \mathop{\mathbb P}_{u \sim \{ 0,1 \}^n} [ C_i (u) =1] - \mathop{\mathbb P}_{s \sim S} [C_i(s) = 1 ] | \leq \epsilon$

That is, sampling uniformly from ${S}$, which we can do with ${\log_2 |S|}$ random bits, is as good as sampling uniformly from ${\{ 0,1 \}^n}$, which requires ${n}$ bits, as far as the functions in ${\cal C}$ are concerned.

It is easy to use Chernoff bounds and union bounds to argue that there is such a set of size ${O((\log N)/\epsilon^2)}$, so that we can sample from it using only ${\log\log N + 2\log \frac 1 \epsilon + O(1)}$ random bits.

We will prove this result (while also providing an “algorithm” for the construction) using multiplicative weights.

First of all, possibly by changing ${N}$ to ${2N}$, we may assume that for every function ${C \in {\cal C}}$ the function ${1-C}$ is also in ${\cal C}$. This simplifies things a bit because then the pseudorandom condition is equivalent to just

$\displaystyle \forall C\in {\cal C} \ \ \ \mathop{\mathbb P}_{u \sim \{ 0,1 \}^n} [ C (u) =1] - \mathop{\mathbb P}_{s \sim S} [C(s) = 1 ] \geq - \epsilon$

We will make up an “experts” setup in which there is an expert ${i}$ for each function ${C_i}$. Thus, the algorithm, at each step, comes up with a probability distribution ${x_t}$ over the functions, which we can think of as a “probabilistic function.” At time ${t}$, the adversary chooses a string ${s_t \in \{ 0,1 \}^n}$ and defines the cost function

$\displaystyle f_t (x) := \sum_{i=1}^N x(i) \cdot \left( \mathop{\mathbb P}_{u \sim \{ 0,1 \}^n} [ C_i (u) = 1 ] - C_i (s_t) \right)$

where the adversary chooses an ${s_t}$ such that ${f_t(x_t) \geq 0}$. At this point, the reader should try, without reading ahead, to establish:

1. That such a choice of ${s_t}$ is always possible;
2. That the cost function is of the form ${f_t(x) = \langle \ell_t , x\rangle}$, where the loss vector ${\ell_t}$ satisfies ${|\ell_t (i) | \leq 1}$, so that the regret after ${T}$ steps is ${\leq 2 \sqrt{T \ln N}}$;
3. That the sequence ${s_1,\ldots,s_T}$ of choices by the adversary determines a ${2\sqrt {\frac {\ln N}{T}}}$-pseudorandom multiset for ${\cal C}$, and, in particular, we get an ${\epsilon}$-pseudorandom multiset of cardinality ${4 \frac {\ln N}{\epsilon^2}}$

For the first point, note that for a random ${s \sim \{ 0,1 \}^n}$ we have

$\displaystyle \mathop{\mathbb E}_{s\sim \{ 0,1 \}^n} \sum_{i=1}^N x_t(i) \cdot \left( \mathop{\mathbb P}_{u \sim \{ 0,1 \}^n} [ C_i (u) = 1 ] - C_i (s) \right) = 0$

so there is an ${s_t}$ such that

$\displaystyle \sum_{i=1}^N x_t(i) \cdot \left( \mathop{\mathbb P}_{u \sim \{ 0,1 \}^n} [ C_i (u) = 1 ] - C_i (s_t) \right) \geq 0$

For the second point we just have to inspect the definition, and for the last point we have, by construction

$\displaystyle \sum_{t=1}^T f_t(x_t) \geq 0$

so the regret bound is

$\displaystyle \min_{x} \sum_{t=1}^T f_t(x) \geq - {\rm Regret}_T \geq - 2 \sqrt{T\ln n}$

which, after dividing by ${T}$, is

$\displaystyle \forall i : \ \ \ \frac 1T \sum_{t=1}^T \left( \mathop{\mathbb P}_{u \sim \{ 0,1 \}^n} [ C_i (u) = 1 ] - C_i (s_t) \right) \geq - 2 \sqrt{\frac {\ln n}{T}}$

$\displaystyle \forall i: \ \ \ \ \mathop{\mathbb P}_{u \sim \{ 0,1 \}^n} [ C_i (u) = 1 ] - \Pr_{s\in \{ s_1,\ldots,s_T \} } [C_i (s) = 1 ] \geq - 2 \sqrt{\frac {\ln n}{T}}$

Consider now the application of constructing a small-support distribution over ${\{ 0,1 \}^n}$ that is ${\epsilon}$-almost-pairwise-independent, meaning that if ${s}$ is a random string sampled according to this distribution, then, for every ${i,j}$, the marginal ${(s_i,s_j)}$ is ${\epsilon}$-close to the uniform distribution over ${\{ 0,1 \}^2}$ in total variation distance. This is the same thing as asking for a small-support distribution that is ${\epsilon}$-pseudorandom for all functions ${\{ 0,1 \}^n \rightarrow \{ 0,1 \}}$ that depend on only two input variables. There are only ${O(n^2)}$ such functions, so the above construction gives us a pseudorandom distribution that is uniform over a set of size ${O(\epsilon^{-2} \ln n)}$, meaning that the distribution can be sampled using ${\log\log n + 2 \log \frac 1 \epsilon + O(1)}$ random bits. Furthermore the algorithm can be implemented to run in time ${n^{O(1)} / \epsilon^2}$. The only tricky step is how to find the string ${s_t}$ at each step. For a string ${s}$, the loss ${f (x_t)}$ obtained by choosing ${s}$ as the “reference string” is a polynomial of degree 2 in the bits of ${s}$, so we can find a no-worse-than-average ${s_t}$ using the method of conditional expectations. I am not sure if there is a more standard way of doing this construction, perhaps one in which the bit ${j}$ of the ${k}$-th string in the sample space can be generated in time ${(\log n)^{O(1)}}$. The standard approach is to combine a small-bias generator with a linear family of pairwise independent hash functions, but even using Ta-Shma’s construction of small-bias generators we would not get the correct dependency on ${\epsilon}$. This framework can “derandomize Chernoff bounds” in other settings as well, such as randomized rounding of packing and covering integer linear programs, and it is basically the same thing as the method of “pessimistic estimators” described in the Motwani-Raghavan book on randomized algorithms.

## 5 thoughts on “Online Optimization Post 2: Constructing Pseudorandom Sets”

1. In the Chernoff bound sentence, N should be log(N) (so the number of bits is loglog(N)).

2. Hello Luca,

can you please explain the averaging argument just after “dividing by T”? Specifically, I do not follow how the part $\sum_{i=1}^N x(i)$ involved in $f_t(x)$ is eliminated.

Cheers,
3. Hi Elad, the regret bound holds for every fixed distribution $x$, so I am just looking, for each $i$, at the implication of the regret bound for the cases in which $x$ gives probability 1 to i and probability 0 to the other experts.