The multiplicative weights or hedge algorithm is the most well known and most frequently rediscovered algorithm in online optimization.
The problem it solves is usually described in the following language: we want to design an algorithm that makes the best possible use of the advice coming from self-described experts. At each time step , the algorithm has to decide with what probability to follow the advice of each of the experts, that is, the algorithm has to come up with a probability distribution where and . After the algorithm makes this choice, it is revealed that following the advice of expert at time leads to loss , so that the expected loss of the algorithm at time is . A loss can be negative, in which case its absolute value can be interpreted as a profit.
After steps, the algorithm “regrets” that it did not just always follow the advice of the expert that, with hindsight, was the best one, so that the regret of the algorithm after steps is
This corresponds to the instantiation of the framework we described in the previous post to the special case in which the set of feasible solutions is the set of probability distributions over the sample space and in which the loss functions are linear functions of the form . In order to bound the regret, we also have to bound the “magnitude” of the loss functions, so in the following we will assume that for all and all we have , and otherwise we can scale everything by a known upper bound on .
We now describe the algorithm.
The algorithm maintains at each step a vector of weights which is initialized as . The algorithm performs the following operations at time :
That is, the weight of expert at time is , and the probability of following the advice of expert at time is proportional to the weight. The parameter is hardwired into the algorithm and we will optimize it later. Note that the algorithm gives higher weight to experts that produced small losses (or negative losses of large absolute value) in the past, and thus puts higher probability on such experts.
We will prove the following bound.
Theorem 1 Assuming that for all and we have , for every , after steps the multiplicative weight algorithm experiences a regret that is always bounded as
In particular, if , by setting we achieve a regret bound