Online convex optimization deals with the following setup: we want to design an algorithm that, at each discrete time step , comes up with a solution , where is a certain convex set of feasible solution. After the algorithm has selected its solution , a convex cost function , coming from a known restricted set of admissible cost functions , is revealed, and the algorithm pays the loss .
Again, the algorithm has to come up with a solution without knowing what cost functions it is supposed to be optimizing. Furthermore, we will think of the sequence of cost functions not as being fixed in advanced and unknown to the algorithm, but as being dynamically generated by an adversary, after seeing the solutions provided by the algorithm. (This resilience to adaptive adversaries will be important in most of the applications.)
The offline optimum after steps is the total cost that the best possible fixed solution would have incurred when evaluated against the cost functions seen by the algorithm, that is, it is a solution to
The regret after steps is the difference between the loss suffered by the algorithm and the offline optimum, that is,
The remarkable results that we will review give algorithms that achieve regret
that is, for fixed and , the regret-per-time-step goes to zero with the number of steps, as . It is intuitive that our bounds will have to depend on how big is the “diameter” of and how large is the “magnitude” and “smoothness” of the functions , but depending on how we choose to formalize these quantities we will be led to define different algorithms.