Online Optimization Post 2: Constructing Pseudorandom Sets

Today we will see how to use the analysis of the multiplicative weights algorithm in order to construct pseudorandom sets.

The method will yield constructions that are optimal in terms of the size of the pseudorandom set, but not very efficient. There is, however, at least one case (getting an “almost pairwise independent” pseudorandom generator) in which the method does something that I am not sure how to replicate with other techniques.

Mostly, the point of this post is to illustrate a concept that will recur in more interesting contexts: that we can use an online optimization algorithm in order to construct a combinatorial object satisfying certain desired properties. The idea is to run a game between a “builder” and an “inspector,” in which the inspector runs the online optimization algorithm with the goal of finding a violated property in what the builder is building, and the builder plays the role of the adversary selecting the cost functions, with the advantage that it gets to build a piece of the construction after seeing what property the “inspector” is looking for. By the regret analysis of the online optimization problem, if the builder did well at each round against the inspector, then it will also do well against the “offline optimum” that looks for a violated property after seeing the whole construction. For example, the construction of graph sparsifiers by Allen-Zhu, Liao and Orecchia can be cast in this framework.

(In some other applications, it will be the “builder” that runs the algorithm and the “inspector” who plays the role of the adversary. This will be the case of the Frieze-Kannan weak regularity lemma and of the Impagliazzo hard-core lemma. In those cases we capitalize on the fact that we know that there is a very good offline optimum, and we keep going for as long as the adversary is able to find violated properties in what the builder is constructing. After a sufficiently large number of rounds, the regret experienced by the algorithm would exceed the general regret bound, so the process must terminate in a small number of rounds. I have been told that this is just the “dual view” of what I described in the previous paragraph.)

But, back to pseudorandom sets: if ${{\cal C} = \{ C_1,\ldots,C_N \}}$ is a collection of boolean functions ${C_i : \{ 0,1 \}^n \rightarrow \{ 0,1 \}}$, for example the functions computed by circuits of a certain type and a certain size, then a multiset ${S\subseteq \{ 0,1 \}^n}$ is ${\epsilon}$-pseudorandom for ${\cal C}$ if, for every ${C_i \in \cal C}$, we have $\displaystyle | \mathop{\mathbb P}_{u \sim \{ 0,1 \}^n} [ C_i (u) =1] - \mathop{\mathbb P}_{s \sim S} [C_i(s) = 1 ] | \leq \epsilon$

That is, sampling uniformly from ${S}$, which we can do with ${\log_2 |S|}$ random bits, is as good as sampling uniformly from ${\{ 0,1 \}^n}$, which requires ${n}$ bits, as far as the functions in ${\cal C}$ are concerned.
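The definition above can be checked by brute force for small ${n}$. Here is a minimal sketch of such a check; the function name and interface are illustrative, not from the post, and the family is represented as a list of Python callables on ${\{0,1\}^n}$:

```python
import itertools

def is_pseudorandom(S, functions, n, eps):
    """Check by brute force whether the multiset S (a list of n-bit tuples)
    is eps-pseudorandom for the given family of boolean functions:
    for every C, the probability of C(u)=1 under the uniform distribution
    and under a uniform sample from S differ by at most eps."""
    universe = list(itertools.product([0, 1], repeat=n))
    for C in functions:
        true_prob = sum(C(u) for u in universe) / len(universe)
        sample_prob = sum(C(s) for s in S) / len(S)
        if abs(true_prob - sample_prob) > eps:
            return False
    return True
```

For example, taking ${S}$ to be all of ${\{0,1\}^n}$ trivially makes ${S}$ ${\epsilon}$-pseudorandom for any family and any ${\epsilon \geq 0}$, while a multiset concentrated on a single point will typically fail.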

It is easy to use Chernoff bounds and union bounds to argue that there is such a set of size ${O((\log N)/\epsilon^2)}$, so that we can sample from it using only ${\log\log N + 2\log \frac 1 \epsilon + O(1)}$ random bits.
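The probabilistic argument is constructive in the Monte Carlo sense: sample a random multiset of size ${O((\log N)/\epsilon^2)}$ and, with high probability, it is ${\epsilon}$-pseudorandom. A minimal sketch (the constant 2 in the size and the function name are illustrative choices, not claimed to be tight):

```python
import itertools
import math
import random

def random_pseudorandom_set(functions, n, eps, trials=20):
    """Sample random multisets of size O(log N / eps^2) until one is
    eps-pseudorandom for the family (checked by brute force over {0,1}^n).
    By Chernoff + union bounds, each attempt succeeds with high probability."""
    N = len(functions)
    size = math.ceil(2 * math.log(2 * N) / eps ** 2)  # constant is illustrative
    universe = list(itertools.product([0, 1], repeat=n))
    truth = [sum(C(u) for u in universe) / len(universe) for C in functions]
    for _ in range(trials):
        S = [random.choice(universe) for _ in range(size)]
        if all(abs(truth[i] - sum(C(s) for s in S) / size) <= eps
               for i, C in enumerate(functions)):
            return S
    return None
```

Of course this gives no deterministic construction; the multiplicative-weights argument below recovers the same size bound with an explicit (if inefficient) procedure.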

We will prove this result (while also providing an “algorithm” for the construction) using multiplicative weights.

Online Optimization Post 1: Multiplicative Weights

The multiplicative weights or hedge algorithm is the best-known and most frequently rediscovered algorithm in online optimization.

The problem it solves is usually described in the following language: we want to design an algorithm that makes the best possible use of the advice coming from ${n}$ self-described experts. At each time step ${t=1,2,\ldots}$, the algorithm has to decide with what probability to follow the advice of each of the experts, that is, the algorithm has to come up with a probability distribution ${x_t = (x_t(1),\ldots,x_t(n))}$ where ${x_t (i) \geq 0}$ and ${\sum_{i=1}^n x_t(i)=1}$. After the algorithm makes this choice, it is revealed that following the advice of expert ${i}$ at time ${t}$ leads to loss ${\ell_t (i)}$, so that the expected loss of the algorithm at time ${t}$ is ${\sum_{i=1}^n x_t(i) \ell_t (i)}$. A loss can be negative, in which case its absolute value can be interpreted as a profit.

After ${T}$ steps, the algorithm “regrets” that it did not just always follow the advice of the expert that, with hindsight, was the best one, so that the regret of the algorithm after ${T}$ steps is $\displaystyle {\rm Regret}_T = \left( \sum_{t=1}^T\sum_{i=1}^n x_t(i) \ell_t(i) \right) - \left( \min_{i=1,\ldots,n} \ \ \sum_{t=1}^T \ell_t(i) \right)$

This corresponds to the instantiation of the framework we described in the previous post to the special case in which the set of feasible solutions ${K}$ is the set ${\Delta \subseteq {\mathbb R}^n}$ of probability distributions over the sample space ${\{ 1,\ldots,n\}}$ and in which the loss functions ${f_t (x)}$ are linear functions of the form ${f_t (x) = \sum_i x(i) \ell_t (i)}$. In order to bound the regret, we also have to bound the “magnitude” of the loss functions, so in the following we will assume that for all ${t}$ and all ${i}$ we have ${| \ell_t (i) | \leq 1}$, and otherwise we can scale everything by a known upper bound on ${\max_{t,i} |\ell_t (i)|}$.

We now describe the algorithm.

The algorithm maintains at each step ${t}$ a vector of weights ${w_t = (w_t(1),\ldots,w_t(n))}$ which is initialized as ${w_1 := (1,\ldots,1)}$. The algorithm performs the following operations at time ${t}$ (the first one only for ${t \geq 2}$):

• ${w_t (i) := w_{t-1} (i) \cdot e^{-\epsilon \ell_{t-1} (i) }}$
• ${x_t (i) := \displaystyle \frac {w_t (i) }{\sum_{j=1}^n w_t(j) }}$

That is, the weight of expert ${i}$ at time ${t}$ is ${e^{-\epsilon \sum_{k=1}^{t-1} \ell_k (i)}}$, and the probability ${x_t(i)}$ of following the advice of expert ${i}$ at time ${t}$ is proportional to the weight. The parameter ${\epsilon>0}$ is hardwired into the algorithm and we will optimize it later. Note that the algorithm gives higher weight to experts that produced small losses (or negative losses of large absolute value) in the past, and thus puts higher probability on such experts.
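The two update rules above fit in a few lines of code. The following is a minimal sketch (function name and return convention are my own choices) that runs the algorithm on a given ${T \times n}$ table of losses and reports both its total expected loss and its regret against the best expert in hindsight:

```python
import math

def multiplicative_weights(losses, eps):
    """Hedge / multiplicative weights on a T x n list of losses in [-1, 1].
    Returns (total expected loss of the algorithm, regret against the
    best single expert in hindsight)."""
    T, n = len(losses), len(losses[0])
    w = [1.0] * n                                    # w_1 = (1, ..., 1)
    alg_loss = 0.0
    for t in range(T):
        total = sum(w)
        x = [wi / total for wi in w]                 # x_t proportional to w_t
        alg_loss += sum(xi * li for xi, li in zip(x, losses[t]))
        # w_{t+1}(i) = w_t(i) * exp(-eps * l_t(i)): downweight lossy experts
        w = [wi * math.exp(-eps * li) for wi, li in zip(w, losses[t])]
    best = min(sum(losses[t][i] for t in range(T)) for i in range(n))
    return alg_loss, alg_loss - best
```

For instance, with two experts, one always incurring loss 0 and the other loss 1, the algorithm quickly shifts almost all probability to the good expert, and its regret stays within the bound of Theorem 1 below.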

We will prove the following bound.

Theorem 1 Assuming that for all ${t}$ and ${i}$ we have ${| \ell_t(i) | \leq 1}$, for every ${0 < \epsilon < 1/2}$, after ${T}$ steps the multiplicative weight algorithm experiences a regret that is always bounded as $\displaystyle {\rm Regret}_T \leq \epsilon \sum_{t=1}^T \sum_{i=1}^n x_t(i) \ell^2 _t (i) + \frac {\ln n}{\epsilon} \leq \epsilon T + \frac {\ln n}{\epsilon}$

In particular, if ${T > 4 \ln n}$, by setting ${\epsilon = \sqrt{\frac{\ln n}{T}}}$ we achieve a regret bound $\displaystyle {\rm Regret}_T \leq 2 \sqrt{T \ln n}$
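The choice of ${\epsilon}$ comes from optimizing the upper bound ${\epsilon T + \frac{\ln n}{\epsilon}}$ over ${\epsilon}$, a one-line calculus exercise:

```latex
\frac{d}{d\epsilon} \left( \epsilon T + \frac{\ln n}{\epsilon} \right)
 = T - \frac{\ln n}{\epsilon^2} = 0
 \quad \Longrightarrow \quad
 \epsilon = \sqrt{\frac{\ln n}{T}} ,
\qquad
\epsilon T + \frac{\ln n}{\epsilon}
 = \sqrt{T \ln n} + \sqrt{T \ln n}
 = 2 \sqrt{T \ln n} .
```

The condition ${T > 4 \ln n}$ is exactly what guarantees that this optimal ${\epsilon}$ satisfies ${\epsilon < 1/2}$, as required by the theorem.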