(Image credit: The New Yorker)
(Image credit: The New Yorker)
I am recruiting for two postdoctoral positions, each for one year renewable to a second, to work with me at Bocconi University on topics related to average-case analysis of algorithms, approximation algorithms, and combinatorial constructions.
The positions have a very competitive salary and relocation benefits. Funding for travel is available.
Application information is at this link. The deadline is December 15. If you apply, please also send me an email (L.Trevisan at unibocconi.it) to let me know.
Ten weeks into my move to Italy I am feeling more and more settled. I even eventually got a Milan bus card, which proves that if you really believe in yourself you can achieve anything. I held a midterm for my theoretical computer science undergraduate class, and there were zero students asking for special accommodations and zero students complaining to me about their grade.
According to our national tradition, I like to complain, but Bocconi is really not giving me much to work with.
But enough about me, let’s talk about you. I am going to assume that you want to come to Milan, because, really, why not? Here are some ways in which this can happen:
I am in Baltimore for FOCS through Tuesday morning if you want to talk to me in person.
It has been six weeks since I moved to Milan, and I am not yet completely settled in yet.
For example, although, as of yesterday, I finally have working wired internet access in my place, I still do not have a bus card (obtaining the latter has been one of the most stubbornly intractable problems I have encountered) and all the stuff that I did not carry in two bags is still in transit in a container.
Meanwhile, the busyness of handling the move, getting settled, trying to get a bus card, and teaching two courses, has meant that I did not really have time to sit down with my thoughts and process my feelings about such a major life change. If people ask me what I miss about San Francisco I will, truthfully, say something like UberX, or Thai food, or getting a bus card from a vending machine, because I still have not had a chance to miss the bigger stuff. Similarly, this post will be about random small stuff.
ITC replaces the International Conference on Information Theoretic Security (ICITS), which was dedicated to the same topic and ran 2005-2017. ITC can be seen as a reboot of ICITS with a new name, a new steering committee and a renewed excitement. (beware: there is a fake website for ICITS 2019 created by a known fraudulent organization)
The first ITC conference will take place in Boston, MA on June 17-19, 2020 (just before STOC). The submission deadline for ITC 2020 is Dec 16, 2019 and the call for papers (including a nomination procedure for the greatest hits track) is available here: https://itcrypto.github.io/2020.html
Please submit your best work to ITC 2020! We hope to see many of you there!
A few months ago, I was delighted to see the University of California holding on to its demands in its negotiations with Elsevier. The U.C. wanted to renegotiate its contract so that, in addition to having access to the subscribed journals, U.C. scholars could publish in them with open access (that is, so that anybody in the world would have free access to the articles written by U.C. scholars).
This seemed like a reasonable model to balance profitability for publishers and open access, but there was no way to agree on it with Elsevier. Meanwhile, U.C. has not renewed its Elsevier subscriptions and Elsevier has cut off access to U.C. libraries.
I was very impressed to see the University of California central administration do something right, so I wondered if this was the kind of portent that is a harbinger of the apocalypse, or just a fluke. Subsequent events suggest the latter.
The University of California has spent a lot of time and money to build a centralized system for job applications and for job applicant review. I was first made aware of this when I chaired the recruiting committee for the Simons Director position. At first we were told that we could solicit applications through the (vastly superior) EECS-built system for job applications and reviews. After the application deadline passed, we were told that, in fact, we could not use the EECS system, and so the already overworked EECS faculty HR person had to manually copy all the data in the central campus system.
The American Mathematical Society has created a wonderfully functional system, called Mathjobs where applicants for academic mathematics jobs (ranging from postdocs to professorship) can upload their application material once, and their recommenders can upload their letters once, and then all the universities that the candidate applies to have access to this material. Furthermore, if needed, both applicants and recommenders can tailor-make their material for a particular university or universities, if they want to.
Everybody was living happily, but not ever after, because the U.C. central campus administration decided that everybody in the University of California had to use the centralized system for all jobs. Both the AMS and U.C. mathematicians tried to find a reasonable accommodation, such as allowing the U.C. system to access the letters posted on mathjobs. The campus administration reasoned response was roughly “sucks to be you.” There is more of the story in an AMS notices article by the chair of math at U.C. Davis.
Finally, this year U.C. Berkeley will not be listed in the US News and World Report rankings because it has submitted wrong data in the past.
In this post we return to the generic form of the FTRL online optimization algorithm. If the cost functions are linear, as they will be in all the applications that I plan to talk about, the algorithm is:
where is the convex set of feasible solutions that the algorithm is allowed to produce, is the linear loss function at time , and is the strictly convex regularizer.
If we have an unconstrained problem, that is, if , then the optimization problem (1) has a unique solution: the such that
and we can usually both compute efficiently in an algorithm and reason about effectively in an analysis.
Unfortunately, we are almost always interested in constrained settings, and then it becomes difficult both to compute and to reason about it.
A very nice special case happens when the regularizer acts as a barrier function for , that is, the (norm of the) gradient of goes to infinity when one approaches the boundary of . In such a case, it is impossible for the minimum of (1) to occur at the boundary and the solution will be again the unique in such that
We swept this point under the rug when we studied FTRL with negative-entropy regularizer in the settings of experts, in which is the set of probability distributions. When we proceeded to solve (1) using Lagrange multipliers, we ignored the non-negativity constraints. The reason why it was ok to do so was that the negative-entropy is a barrier function for the non-negative orthant .
Another important special case occurs when the regularizer is a multiple of length-squared. In this case, we saw that we could “decouple” the optimization problem by first solving the unconstrained optimization problem, and then projecting the solution of the unconstrained problem to :
Then we have the closed-form solution and, depending on the set , the projection might also have a nice closed-form, as in the case that comes up in results related to regularity lemmas.
As we will see today, this approach of solving the unconstrained problem and then projecting on works for every regularizer, for an appropriate notion of projection called the Bregman projection (the projection will depend on the regularizer).
To define the Bregman projection, we will first define the Bregman divergence with respect to the regularizer , which is a non-negative “distance” defined on (or possibly a subset of for which the regularizer is a barrier function). Then, the Bregman projection of on is defined as .
Unfortunately, it is not so easy to reason about Bregman projections either, but the notion of Bregman divergence offers a way to reinterpret the FTRL algorithm from another point of view, called mirror descent. Via this reinterpretation, we will prove the regret bound
which carries the intuition that the regret comes from a combination of the “distance” of our initial solution from the offline optimum and of the “stability” of the algorithm, that is, the “distance” between consecutive soltuions. Nicely, the above bound measures both quantities using the same “distance” function.
I would like to congratulate my Taiwanese readers for being in the first Asian country to introduce same-sex marriage.
We now discuss how to view proofs of certain regularity lemmas as applications of the FTRL methodology.
The Szemeredi Regularity Lemma states (in modern language) that every dense graph is well approximate by a graph with a very simple structure, made of the (edge-disjoint) union of a constant number of weighted complete bipartite subgraphs. The notion of approximation is a bit complicated to describe, but it enables the proof of counting lemmas, which show that, for example, the number of triangles in the original graph is well approximated by the (appropriately weighted) number of triangles in the approximating graph.
Analogous regularity lemmas, in which an arbitrary object is approximated by a low-complexity object, have been proved for hypergraphs, for subsets of abelian groups (for applications to additive combinatorics), in an analytic setting (for applications to graph limits) and so on.
The weak regularity lemma of Frieze and Kannan provides, as the name suggests, a weaker kind of approximation than the one promised by Szemeredi’s lemma, but one that is achievable with a graph that has a much smaller number of pieces. If is the “approximation error” that one is willing to tolerate, Szemeredi’s lemma constructs a graph that is the union of a weighted complete bipartite subgraphs where the height of the tower of exponentials is polynomial in . In the Frieze-Kannan construction, that number is cut down to a single exponential . This result too can be generalized to graph limits, subsets of groups, and so on.
With Tulsiani and Vadhan, we proved an abstract version of the Frieze-Kannan lemma (which can be applied to graphs, functions, distributions, etc.) in which the “complexity” of the approximation is . In the graph case, the approximating graph is still the union of complete bipartite subgraphs, but it has a more compact representation. One consequence of this result is that for every high-min-entropy distribution , there is an efficiently samplable distribution with the same min-entropy as , that is indistinguishable from . Such a result could be taken to be a proof that what GANs attempt to achieve is possible in principle, except that our result requires an unrealistically high entropy (and we achieve “efficient samplability” and “indistinguishability” only in a weak sense).
All these results are proved with a similar strategy: one starts from a trivial approximator, for example the empty graph, and then repeats the following iteration: if the current approximator achieves the required approximation, then we are done; otherwise take a counterexample, and modify the approximator using the counterexample. Then one shows that:
Typically, the number of iterations is , and the difference between the various results is given by whether at each iteration the “complexity” increases exponentially, or by a multiplicative factor, or by an additive term.
Like in the post on pseudorandom constructions, one can view such constructions as an online game between a “builder” and an “inspector,” except that now the online optimization algorithm will play the role of the builder, and the inspector is the one acting as an adversary. The bound on the number of rounds comes from the fact that the online optimization algorithms that we have seen so far achieve amortized error per round after rounds, so it takes rounds for the error bound to go below .
We will see that the abstract weak regularity lemma of my paper with Tulsiani and Vadhan (and hence the graph weak regularity lemma of Frieze and Kannan) can be immediately deduced from the theory developed in the previous post.
When I was preparing these notes, I was asked by several people if the same can be done for Szemeredi’s lemma. I don’t see a natural way of doing that. For such results, one should maybe use the online optimization techniques as a guide rather than as a black box. In general, iterative arguments (in which one constructs an object through a series of improvements) require the choice of a potential function, and an argument about how much the potential function changes at every step. The power of the FTRL method is that it creates the potential function and a big part of the analysis automatically and, even where it does not work directly, it can serve as an inspiration.
One could imagine a counterfactual history in which people first proved the weak regularity lemma using online optimization out of the box, as we do in this post, and then decided to try and use an L2 potential function and an iterative method to get the Szemeredi lemma, subsequently trying to see what happens if the potential function is entropy, thus discovering Jacob Fox’s major improvement on the “triangle removal lemma,” which involves the construction of an approximator that just approximates the number of triangles.
The multiplicative weights algorithm is simple to define and analyze, and it has several applications, but both its definition and its analysis seem to come out of nowhere. We mentioned that all the quantities arising in the algorithm and its analysis have statistical physics interpretations, but even this observation brings up more questions than it answers. The Gibbs distribution, for example, does put more weight on lower-energy states, and so it makes sense in an optimization setting, but to get good approximations one wants to use lower temperatures, while the distributions used by the multiplicative weights algorithms have temperature , where is the final “amortized” regret bound, so that one uses, quite counterintuitively, higher temperatures for better approximations.
Furthermore, it is not clear how we would generalize the ideas of multiplicative weights to the case in which the set of feasible solutions is anything other than the set of distributions.
Today we discuss the “Follow the Regularized Leader” method, which provides a framework to design and analyze online algorithms in a versatile and well-motivated way. We will then see how we can “discover” the definition and analysis of multiplicative weights, and how to “discover” another online algorithm which can be seen as a generalization of projected gradient descent (that is, one can derive the projected gradient descent algorithm and its analysis from this other online algorithm).