# Online Optimization Post 7: Matrix Multiplicative Weights Update

This is the seventh in a series of posts on online optimization, where we alternate one post explaining a result from the theory of online convex optimization and one post explaining an “application” in computational complexity or combinatorics. The first two posts were about the technique of Multiplicative Weights Updates and its application to “derandomizing” probabilistic arguments based on combining a Chernoff bound and a union bound. The third and fourth posts were about the Follow-the-Regularized-Leader framework, which unifies multiplicative weights and gradient descent, and a “gradient descent view” of the Frieze-Kannan Weak Regularity Lemma. The fifth and sixth posts were about the constrained version of the Follow-the-Regularized-Leader framework, and the Impagliazzo Hard-Core Set Lemma. Today we shall see the technique of Matrix Multiplicative Weights Updates.

1. Matrix Multiplicative Weights Update

In this post we consider the following generalization, introduced and studied by Arora and Kale, of the “learning from expert advice” setting and the multiplicative weights update method. In the “experts” model, we have a repeated game in which, at each time step ${t}$, we have the option of following the advice of one of ${n}$ experts; if we follow the advice of expert ${i}$ at time ${t}$, we incur a loss of ${\ell_t (i)}$, which is unknown to us (although, at time ${t}$ we know the loss functions ${\ell_1(\cdot),\ldots,\ell_{t-1}(\cdot)}$). We are allowed to choose a probabilistic strategy, whereby we follow the advice of expert ${i}$ with probability ${x_t(i)}$, so that our expected loss at time ${t}$ is ${\sum_{i=1}^n x_t(i) \ell_t(i)}$.
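To make the experts setting concrete, here is a minimal sketch of the multiplicative weights strategy in this model; the number of experts, number of rounds, learning rate, and the random losses are all illustrative choices, not part of the original discussion.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, eta = 4, 200, 0.1               # experts, rounds, learning rate (illustrative)

weights = np.ones(n)                   # one positive weight per expert
cum_loss = np.zeros(n)                 # cumulative loss of each expert
alg_loss = 0.0                         # our cumulative expected loss
for t in range(T):
    x = weights / weights.sum()        # x_t(i): probability of following expert i
    loss = rng.uniform(0, 1, size=n)   # l_t, unknown when x_t is chosen
    alg_loss += x @ loss               # expected loss sum_i x_t(i) l_t(i)
    cum_loss += loss
    weights *= np.exp(-eta * loss)     # multiplicative weights update

regret = alg_loss - cum_loss.min()     # loss relative to the best expert in hindsight
```

For losses in ${[0,1]}$, the standard analysis bounds the regret of this strategy by roughly ${\eta T + (\ln n)/\eta}$, which is the guarantee the matrix version below generalizes.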

In the matrix version, instead of choosing an expert ${i}$ we are allowed to choose a unit ${n}$-dimensional vector ${v_t}$, and the loss incurred in choosing the vector ${v_t}$ is ${v_t ^T L_t v_t}$, where ${L_t}$ is an unknown symmetric ${n\times n}$ matrix. We are also allowed to choose a probabilistic strategy, so that with probability ${x_t(j)}$ we choose the unit vector ${v_t^{(j)}}$, and we incur the expected loss

$\displaystyle \sum_j x_t (j) \cdot (v_t^{(j)})^T L_t v_t^{(j)}$
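As a numerical sanity check of this expected loss (with an illustrative symmetric matrix and an illustrative probabilistic strategy, both invented for the example), one can verify that it equals ${{\rm Tr}(X L_t)}$, where ${X = \sum_j x_t(j) \, v_t^{(j)} (v_t^{(j)})^T}$ is the density matrix associated to the strategy:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

A = rng.standard_normal((n, n))
L = (A + A.T) / 2                                # an illustrative symmetric loss matrix L_t

# a probabilistic strategy: unit vectors v^(j), chosen with probabilities x(j)
V = rng.standard_normal((4, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)    # normalize each row to a unit vector
x = np.array([0.4, 0.3, 0.2, 0.1])               # probabilities summing to 1

# expected loss: sum_j x(j) * (v^(j))^T L v^(j)
expected_loss = sum(x[j] * V[j] @ L @ V[j] for j in range(4))

# the density matrix X = sum_j x(j) v^(j) (v^(j))^T gives the same value as Tr(X L)
X = sum(x[j] * np.outer(V[j], V[j]) for j in range(4))
trace_form = np.trace(X @ L)
```

This identity follows from linearity of the trace and ${v^T L v = {\rm Tr}(v v^T L)}$, and it is the reason the matrix version of the game is naturally phrased in terms of density matrices.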