Today we will see how to use the analysis of the multiplicative weights algorithm in order to construct pseudorandom sets.

The method will yield constructions that are optimal in terms of the size of the pseudorandom set, but not very efficient, although there is at least one case (getting an “almost pairwise independent” pseudorandom generator) in which the method does something that I am not sure how to replicate with other techniques.

Mostly, the point of this post is to illustrate a concept that will reoccur in more interesting contexts: that we can use an online optimization algorithm in order to construct a combinatorial object satisfying certain desired properties. The idea is to run a game between a “builder” against an “inspector,” in which the inspector runs the online optimization algorithm with the goal of finding a violated property in what the builder is building, and the builder plays the role of the adversary selecting the cost functions, with the advantage that it gets to build a piece of the construction after seeing what property the “inspector” is looking for. By the regret analysis of the online optimization problem, if the builder did well at each round against the inspector, then it will do well also against the “offline optimum” that looks for a violated property after seeing the whole construction. For example, the construction of graph sparsifiers by Allen-Zhu, Liao and Orecchia can be cast in this framework.

(In some other applications, it will be the “builder” that runs the algorithm and the “inspector” who plays the role of the adversary. This will be the case of the Frieze-Kannan weak regularity lemma and of the Impagliazzo hard-core lemma. In those cases we capitalize on the fact that we know that there is a very good offline optimum, and we keep going for as long as the adversary is able to find violated properties in what the builder is constructing. After a sufficiently large number of rounds, the regret experienced by the algorithm would exceed the general regret bound, so the process must terminate in a small number of rounds. I have been told that this is just the “dual view” of what I described in the previous paragraph.)

But, back the pseudorandom sets: if is a collection of boolean functions , for example the functions computed by circuits of a certain type and a certain size, then a multiset is -pseudorandom for if, for every , we have

That is, sampling uniformly from , which we can do with random bits, is as good as sampling uniformly from , which requires bits, as far as the functions in are concerned.

It is easy to use Chernoff bounds and union bounds to argue that there is such a set of size , so that we can sample from it using only random bits.

We will prove this result (while also providing an “algorithm” for the construction) using multiplicative weights.

First of all, possibly by changing to , we may assume that for every function the function is also in . This simplifies things a bit because then the pseudorandom condition is equivalent to just

We will make up an “experts” setup in which there is an expert for each function . Thus, the algorithm, at each step, comes up with a probability distribution over the functions, which we can think of as a “probabilistic function.” At time , the adversary chooses a string and defines the cost function

where the adversary chooses an such that . At this point, the reader should try, without reading ahead, to establish:

- That such a choice of is always possible;
- That the cost function is of the form , where the loss vector satisfies , so that the regret after steps is ;
- That the sequence of choices by the adversary determines a -pseudorandom multiset for , and, in particular, we get an -pseudorandom multiset of cardinality

For the first point, note that for a random we have

so there is an such that

For the second point we just have to inspect the definition, and for the last point we have, by construction

so the regret bound is

which, after dividing by , is

Consider now the application of constructing a small-support distribution over that is -almost-pairwise-independent, meaning that if is a random string sampled according to this distribution, then, for every , the marginal is -close to the uniform distribution over in total variation distance. This is the same thing as asking for a small-support distribution that is -pseudorandom for all functions that depend on only two input variables. There are only such functions, so the above construction gives us a pseudorandom distribution that is uniform over a set of size , meaning that the distribution can be sampled using random bits. Furthermore the algorithm can be implemented to run in time . The only tricky step is how to find the string at each step. For a string , the loss obtained by choosing as the “reference string” is a polynomial of degree 2 in the bits of , so we can find a no-worse-than-average using the method of conditional expectations. I am not sure if there is a more standard way of doing this construction, perhaps one in which the bit of the -th string in the sample space can be generated in time . The standard approach is to combine a small-bias generator with a linear family of pairwise independent hash functions, but even using Ta-Shma’s construction of small-bias generators we would not get the correct dependency on . This framework can “derandomize Chernoff bounds” in other settings as well, such as randomized rounding of packing and covering integer linear programs, and it is basically the same thing as the method of “pessimistic estimators” described in the Motwani-Raghavan book on randomized algorithms.

In the Chernoff bound sentence, N should be log(N) (so the number of bits is loglog(N)).

Thanks, I have corrected it

Hello Luca,

can you please explain the averaging argument just after “dividing by T”? Specifically, I do not follow how the part $\sum_{i=1}^N x(i)$ involved in $f_t(x)$ is eliminated.

Cheers,

Elad

Hi Elad, the regret bound holds for every fixed distribution , so I am just looking, for each , at the implication of the regret bound for the cases in which gives probability 1 to i and probability 0 to the other experts.

Pingback: Online Optimization Post 7: Matrix Multiplicative Weights Update | in theory