Keith Ball on Bourgain’s Legacy in Geometric Functional Analysis

The Bulletin of the AMS has just posted an article by Keith Ball on the legacy of Bourgain’s work on geometric functional analysis.

This beautifully written article talks about results and conjectures that are probably familiar to readers of in theory, but from the perspective of their mathematical motivations and of the bigger picture in which they fit.

Post-doc Opportunities in Milan

[call for applications] [application form]

I am recruiting two postdocs for two-year positions to work with me at Bocconi University, starting in Fall 2021. The positions have competitive salaries and are tax-free. If applicable, I will pay for relocation expenses, including the assistance of a relocation agency to help you find a place to live, activate utilities, complete immigration formalities, and sign up for the national health care service.

Milan has suffered as much as, or more than, other big European and American cities from the effects of the Covid-19 pandemic. I saw Milan in its normal condition for a few months, from September 2019 to February 2020, and it is a beautiful cosmopolitan city, with an active cultural and social life and beautiful surroundings. Like San Francisco, it is smaller than one would expect and very walkable (no hills!). Bocconi is in a semi-central area, about a twenty-minute walk from the Duomo.

I have received a large European grant that, besides paying for these postdoc positions, has a budget for senior visitors and for organizing two workshops over the duration of the grant. In particular, I was planning a workshop to be held last May in a villa on Lake Como. All such plans have been on hold, but Fall 2021 should be around the time that the global pandemic emergency ends, and I am planning for a lot of exciting scientific activity at Bocconi in the academic year 2021-22 and beyond.

I am looking for candidates with an established body of work on topics related to my research agenda, such as pseudorandomness and combinatorial constructions; spectral graph theory; and worst-case and average-case analysis of semidefinite programming relaxations of combinatorial optimization problems.

Here are the call for applications and the application form.

“Visioning” workshop call for participation

In 2008, the Committee for the Advancement of Theoretical Computer Science convened a workshop to brainstorm directions and talking points for TCS program managers at funding agencies to advocate for theory funding. The event was quite productive and successful.

A second such workshop is going to be held, online, in the third week of July. Applications to participate are due on June 15, a week from today. Organizers expect that participants will have to devote about four hours of their time to the workshop, and those who volunteer to be team leads will have a time commitment of about ten hours.

More information at this link

Spectral Sparsification of Hypergraphs

In this post we will construct a “spectral sparsifier” of a given hypergraph in a way that is similar to how Spielman and Srivastava construct spectral graph sparsifiers. We will assign a probability {p_e} to each hyperedge, we will sample each hyperedge {e} with probability {p_e}, and we will weigh it by {1/p_e} if selected. We will then bound the “spectral error” of this construction in terms of the supremum of a Gaussian process using Talagrand’s comparison inequality and finally bound the supremum of the Gaussian process (which will involve matrices) using matrix Chernoff bounds. This is joint work with Nikhil Bansal and Ola Svensson.
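To make the construction concrete, here is a minimal Python sketch of the generic sample-and-reweight step, together with the standard hypergraph quadratic form (the "energy" {\sum_e w_e \max_{u,v \in e} (x_u - x_v)^2}) that the sparsifier is meant to approximately preserve; the choice of the probabilities {p_e} is the whole point of the analysis, so here they are simply taken as input.

import random

def hypergraph_energy(hyperedges, weights, x):
    # Q(x) = sum over hyperedges e of w_e * max_{u,v in e} (x_u - x_v)^2,
    # the quadratic form that a hypergraph spectral sparsifier should preserve.
    total = 0.0
    for e, w in zip(hyperedges, weights):
        values = [x[v] for v in e]
        total += w * (max(values) - min(values)) ** 2
    return total

def sample_sparsifier(hyperedges, weights, probabilities):
    # Keep hyperedge e independently with probability p_e and, if kept,
    # scale its weight by 1/p_e, so that the energy Q(x) is preserved in
    # expectation; the analysis sketched in the post is about choosing the
    # p_e so that it is also preserved with high probability.
    kept_edges, kept_weights = [], []
    for e, w, p in zip(hyperedges, weights, probabilities):
        if random.random() < p:
            kept_edges.append(e)
            kept_weights.append(w / p)
    return kept_edges, kept_weights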

Continue reading

Spielman-Srivastava Sparsification à la Talagrand

This is the second in a series of posts explaining a result on hypergraph sparsification that uses Talagrand’s work on Gaussian processes.

In the previous post we talked about Gaussian and sub-Gaussian processes and generic chaining.

In this post we talk about the Spielman-Srivastava probabilistic construction of graph sparsifiers. Their analysis requires a bound on the largest eigenvalue of a certain random matrix, which can be derived from matrix Chernoff bounds.
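For concreteness, here is a minimal numpy sketch of one standard variant of the Spielman-Srivastava construction, in which each edge is kept independently with probability roughly proportional to its leverage score {w_e R_{\rm eff}(e)} and reweighted by {1/p_e}; the constant C and the exact form of the probabilities are placeholders, since the real analysis (via matrix Chernoff bounds) dictates how to set them.

import numpy as np

def effective_resistances(n, edges, weights):
    # R_eff(u,v) = (e_u - e_v)^T L^+ (e_u - e_v), computed from the
    # pseudoinverse of the weighted Laplacian.
    L = np.zeros((n, n))
    for (u, v), w in zip(edges, weights):
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    Lp = np.linalg.pinv(L)
    return [Lp[u, u] + Lp[v, v] - 2 * Lp[u, v] for (u, v) in edges]

def sparsify(n, edges, weights, eps, C=10.0, rng=None):
    # Keep edge e with probability p_e = min(1, C * w_e * R_eff(e) * log(n) / eps^2)
    # and reweight kept edges by 1/p_e.
    rng = np.random.default_rng() if rng is None else rng
    resistances = effective_resistances(n, edges, weights)
    sparsifier = []
    for (u, v), w, r in zip(edges, weights, resistances):
        p = min(1.0, C * w * r * np.log(n) / eps ** 2)
        if rng.random() < p:
            sparsifier.append(((u, v), w / p))
    return sparsifier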

We will then make our life harder and derive another analysis of the Spielman-Srivastava construction, by casting the largest eigenvalue of that random matrix as the supremum of a sub-Gaussian process and then applying the machinery from the previous post.

This is more complicated than it needs to be, but the payoff is that, as we will see in the next post, this more complicated proof also applies, with some changes, to the setting of hypergraphs.

Continue reading

Talagrand’s Generic Chaining

Welcome to phase two of in theory, in which we again talk about math. I spent last Fall teaching two courses and getting settled, I mostly traveled in January and February, and I have spent the last two months on my sofa catching up on TV series. Hence I will reach back to last Spring, when I learned about Talagrand’s machinery of generic chaining and majorizing measures from Nikhil Bansal, in the context of our work with Ola Svensson on graph and hypergraph sparsification. Here I would like to record what I understood about the machinery, and in a follow-up post I plan to explain the application to hypergraph sparsification.

Continue reading

Postdoc Position at Bocconi

I am recruiting for two postdoctoral positions, each for one year renewable to a second, to work with me at Bocconi University on topics related to average-case analysis of algorithms, approximation algorithms, and combinatorial constructions.

The positions have a very competitive salary and relocation benefits. Funding for travel is available.

Application information is at this link. The deadline is December 15. If you apply, please also send me an email (L.Trevisan at unibocconi.it) to let me know.

A New Conference on Information-Theoretic Cryptography

A new conference on Information-Theoretic Cryptography is starting next year. It is about topics that I have always been fond of, a former student of mine is on the steering committee, and an academic grandchild of mine is on the program committee of the inaugural conference, so I am very happy to pass along the following announcement from Benny Applebaum.

Dear friends,
We are happy to announce the birth of a new conference on Information-Theoretic Cryptography (ITC). Information-theoretic cryptography studies security in the presence of computationally unbounded adversaries and covers a wide array of topics at the intersection of cryptography, coding theory, information theory, and the theory of computation. Notable examples include randomness extraction and privacy amplification, secret sharing, secure multiparty computation and proof systems, private information retrieval and locally decodable codes, authentication codes and non-malleable codes, differential privacy, quantum information processing, and information-theoretic foundations of physical-layer security. See https://itcrypto.github.io for more information.

ITC replaces the International Conference on Information Theoretic Security (ICITS), which was dedicated to the same topic and ran from 2005 to 2017. ITC can be seen as a reboot of ICITS with a new name, a new steering committee, and renewed excitement. (Beware: there is a fake website for ICITS 2019, created by a known fraudulent organization.)

The conference will have two tracks: a conference track and a “greatest hits” track. The conference track will operate like a traditional conference with the usual review process and published proceedings. The “greatest hits” track consists of invited talks (not included in the proceedings) that highlight the most exciting recent advances in the area. We solicit nominations for “greatest hits” talks from the community.

The first ITC conference will take place in Boston, MA on June 17-19, 2020 (just before STOC). The submission deadline for ITC 2020 is Dec 16, 2019, and the call for papers (including a nomination procedure for the greatest hits track) is available here: https://itcrypto.github.io/2020.html

Please submit your best work to ITC 2020! We hope to see many of you there!

best regards,
The Steering Committee: Benny Applebaum (Chair), Ivan Damgård, Yevgeniy Dodis, Yuval Ishai, Ueli Maurer, Kobbi Nissim, Krzysztof Pietrzak, Manoj Prabhakaran, Adam Smith, Yael Tauman Kalai, Stefano Tessaro, Vinod Vaikuntanathan, Hoeteck Wee, Daniel Wichs, Mary Wootters, Chaoping Xing, Moti Yung

Online Optimization Post 5: Bregman Projections and Mirror Descent

In this post we return to the generic form of the FTRL online optimization algorithm. If the cost functions are linear, as they will be in all the applications that I plan to talk about, the algorithm is:

\displaystyle x_t := \arg\min_{x\in K} \ R(x) + \sum_{k=1}^{t-1} \langle \ell_k, x \rangle \ \ \ \ \ (1)

where {K\subseteq {\mathbb R}^n} is the convex set of feasible solutions that the algorithm is allowed to produce, {x \rightarrow \langle \ell_k , x \rangle} is the linear loss function at time {k}, and {R: K \rightarrow {\mathbb R}} is the strictly convex regularizer.
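As an aside, when no closed form is available one can in principle compute (1) numerically; here is a minimal sketch for the illustrative case in which {K} is the probability simplex (the regularizer R and the list of past loss vectors are arbitrary inputs, and all names are placeholders).

import numpy as np
from scipy.optimize import minimize

def ftrl_step(R, past_losses, n):
    # Numerically solve x_t = argmin_{x in K} R(x) + sum_k <l_k, x>
    # for K = the probability simplex in R^n.  In the applications in these
    # posts one has a closed form instead; this just shows what (1) computes.
    L = np.sum(past_losses, axis=0) if len(past_losses) > 0 else np.zeros(n)
    objective = lambda x: R(x) + L @ x
    result = minimize(
        objective,
        x0=np.full(n, 1.0 / n),
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda x: np.sum(x) - 1.0}],
        method="SLSQP",
    )
    return result.x

# example: quadratic regularizer R(x) = ||x||^2 and two past loss vectors
x_t = ftrl_step(lambda x: np.dot(x, x),
                [np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, 1.0])], 3)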

If we have an unconstrained problem, that is, if {K= {\mathbb R}^n}, then the optimization problem (1) has a unique solution: the {x_t} such that

\displaystyle \nabla R(x_t) = - \sum_{k=1}^{t-1} \ell_k

and we can usually both compute {x_t} efficiently in an algorithm and reason about {x_t} effectively in an analysis.

Unfortunately, we are almost always interested in constrained settings, and then it becomes difficult both to compute {x_t} and to reason about it.

A very nice special case happens when the regularizer {R} acts as a barrier function for {K}, that is, the (norm of the) gradient of {R} goes to infinity when one approaches the boundary of {K}. In such a case, it is impossible for the minimum of (1) to occur at the boundary and the solution will be again the unique {x_t} in {K} such that

\displaystyle \nabla R(x_t) = - \sum_{k=1}^{t-1} \ell_k

We swept this point under the rug when we studied FTRL with the negative-entropy regularizer in the setting of experts, in which {K = \Delta} is the set of probability distributions. When we proceeded to solve (1) using Lagrange multipliers, we ignored the non-negativity constraints. The reason why it was ok to do so is that the negative entropy is a barrier function for the non-negative orthant {({\mathbb R}_{\geq 0})^n}.
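To see the barrier phenomenon at work: in the experts setting, with regularizer {R(x) = \frac 1 \eta \sum_i x_i \ln x_i} and cumulative loss vector {L = \sum_{k=1}^{t-1} \ell_k}, the Lagrange-multiplier computation gives the closed form {x_{t,i} \propto e^{-\eta L_i}}, which is strictly positive, so the ignored non-negativity constraints are satisfied automatically (this is the familiar exponential-weights update). A small sketch, where the value of eta is a placeholder:

import numpy as np

def ftrl_negative_entropy(cumulative_loss, eta=0.1):
    # Minimizer of (1/eta) * sum_i x_i ln x_i + <L, x> over the simplex:
    # the softmax x_i proportional to exp(-eta * L_i).  Positivity is
    # automatic because negative entropy is a barrier for the non-negative
    # orthant, so the non-negativity constraints can be ignored.
    L = np.asarray(cumulative_loss, dtype=float)
    z = np.exp(-eta * (L - L.min()))  # subtract the min for numerical stability
    return z / z.sum()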

Another important special case occurs when the regularizer {R(x) = c || x||^2} is a multiple of the squared Euclidean norm. In this case, we saw that we could “decouple” the optimization problem by first solving the unconstrained optimization problem, and then projecting the solution of the unconstrained problem onto {K}:

\displaystyle y_{t} = \arg\min_{y\in {\mathbb R}^n} \ c || y||^2 + \sum_{k=1}^{t-1} \langle \ell_k, y \rangle

\displaystyle x_t = \Pi_K (y_t) = \arg\min _{x\in K} || x - y_t ||

Then we have the closed-form solution {y_t = - \frac 1{2c} \sum_{k=1}^{t-1} \ell _k} and, depending on the set {K}, the projection might also have a nice closed form, as in the case {K= [0,1]^n} that comes up in results related to regularity lemmas.
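Here is a small sketch of this decoupled computation in the case {K = [0,1]^n}, where the Euclidean projection is just coordinate-wise clipping (the value of c is a placeholder):

import numpy as np

def ftrl_l2_hypercube(past_losses, n, c=1.0):
    # FTRL with regularizer c * ||x||^2 over K = [0,1]^n.
    # Unconstrained minimizer: y_t = -(1/(2c)) * sum of past loss vectors.
    y = -np.sum(past_losses, axis=0) / (2.0 * c) if len(past_losses) > 0 else np.zeros(n)
    # Euclidean projection onto the hypercube is coordinate-wise clipping.
    return np.clip(y, 0.0, 1.0)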

As we will see today, this approach of solving the unconstrained problem and then projecting on {K} works for every regularizer, for an appropriate notion of projection called the Bregman projection (the projection will depend on the regularizer).

To define the Bregman projection, we will first define the Bregman divergence with respect to the regularizer {R}, which is a non-negative “distance” {D(x,y)} defined on {{\mathbb R}^n} (or possibly a subset of {{\mathbb R}^n} for which the regularizer {R} is a barrier function). Then, the Bregman projection of {y} on {K} is defined as {\arg\min_{x\in K} \ D(x,y)}.
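For reference, the standard definition, which is presumably the one used in the rest of the post, is

\displaystyle D(x,y) := R(x) - R(y) - \langle \nabla R(y), x - y \rangle

so that, for example, the regularizer {R(x) = ||x||^2} gives {D(x,y) = ||x-y||^2} and recovers the Euclidean projection, while the negative-entropy regularizer gives, on the simplex, the KL divergence {D(x,y) = \sum_i x_i \ln (x_i/y_i)}.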

Unfortunately, it is not so easy to reason about Bregman projections either, but the notion of Bregman divergence offers a way to reinterpret the FTRL algorithm from another point of view, called mirror descent. Via this reinterpretation, we will prove the regret bound

\displaystyle {\rm Regret}_T(x) \leq D(x,x_1) + \sum_{t=1}^T D(x_t,y_{t+1})

which carries the intuition that the regret comes from a combination of the “distance” of our initial solution from the offline optimum and the “stability” of the algorithm, that is, the “distance” between consecutive solutions. Nicely, the above bound measures both quantities using the same “distance” function.

Continue reading

Online Optimization Post 4: Regularity Lemmas

We now discuss how to view proofs of certain regularity lemmas as applications of the FTRL methodology.

The Szemeredi Regularity Lemma states (in modern language) that every dense graph is well approximated by a graph with a very simple structure, made of the (edge-disjoint) union of a constant number of weighted complete bipartite subgraphs. The notion of approximation is a bit complicated to describe, but it enables the proof of counting lemmas, which show that, for example, the number of triangles in the original graph is well approximated by the (appropriately weighted) number of triangles in the approximating graph.

Analogous regularity lemmas, in which an arbitrary object is approximated by a low-complexity object, have been proved for hypergraphs, for subsets of abelian groups (for applications to additive combinatorics), in an analytic setting (for applications to graph limits) and so on.

The weak regularity lemma of Frieze and Kannan provides, as the name suggests, a weaker kind of approximation than the one promised by Szemeredi’s lemma, but one that is achievable with a graph that has a much smaller number of pieces. If {\epsilon} is the “approximation error” that one is willing to tolerate, Szemeredi’s lemma constructs a graph that is the union of {2^{2^{\vdots}}} weighted complete bipartite subgraphs, where the height of the tower of exponentials is polynomial in {1/\epsilon}. In the Frieze-Kannan construction, that number is cut down to a single exponential {2^{O(1/\epsilon^2)}}. This result too can be generalized to graph limits, subsets of groups, and so on.

With Tulsiani and Vadhan, we proved an abstract version of the Frieze-Kannan lemma (which can be applied to graphs, functions, distributions, etc.) in which the “complexity” of the approximation is {O(1/\epsilon^2)}. In the graph case, the approximating graph is still the union of {2^{O(1/\epsilon^2)}} complete bipartite subgraphs, but it has a more compact representation. One consequence of this result is that for every high-min-entropy distribution {\cal D}, there is an efficiently samplable distribution with the same min-entropy as {\cal D}, that is indistinguishable from {\cal D}. Such a result could be taken to be a proof that what GANs attempt to achieve is possible in principle, except that our result requires an unrealistically high entropy (and we achieve “efficient samplability” and “indistinguishability” only in a weak sense).

All these results are proved with a similar strategy: one starts from a trivial approximator, for example the empty graph, and then repeats the following iteration: if the current approximator achieves the required approximation, then we are done; otherwise take a counterexample, and modify the approximator using the counterexample. Then one shows that:

  • The number of iterations is bounded, by keeping track of an appropriate potential function;
  • The “complexity” of the approximator does not increase too much from iteration to iteration.

Typically, the number of iterations is {O(1/\epsilon^2)}, and the difference between the various results is given by whether at each iteration the “complexity” increases exponentially, or by a multiplicative factor, or by an additive term.
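To fix ideas, here is an abstract sketch of this iteration, with an additive update as in the Frieze-Kannan-style proofs; the function find_counterexample and the step size are placeholders for whatever the specific regularity lemma supplies, and the approximator and counterexamples are assumed to support addition and scalar multiplication (say, as numpy arrays representing edge-weight functions).

def build_approximator(initial, find_counterexample, step, max_iterations=10_000):
    # Generic iteration behind weak regularity lemmas: start from a trivial
    # approximator and, as long as a counterexample to the desired approximation
    # exists, fold it into the approximator.  A potential-function argument
    # bounds the number of iterations, typically by O(1/eps^2) for additive updates.
    approximator = initial
    for _ in range(max_iterations):
        counterexample = find_counterexample(approximator)
        if counterexample is None:
            return approximator
        # additive update in the direction of the counterexample
        approximator = approximator + step * counterexample
    raise RuntimeError("exceeded the expected number of iterations")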

Like in the post on pseudorandom constructions, one can view such constructions as an online game between a “builder” and an “inspector,” except that now the online optimization algorithm will play the role of the builder, and the inspector is the one acting as an adversary. The {O(1/\epsilon^2)} bound on the number of rounds comes from the fact that the online optimization algorithms that we have seen so far achieve amortized error per round {O(1/\sqrt T)} after {T} rounds, so it takes {O(1/\epsilon^2)} rounds for the error bound to go below {\epsilon}.

We will see that the abstract weak regularity lemma of my paper with Tulsiani and Vadhan (and hence the graph weak regularity lemma of Frieze and Kannan) can be immediately deduced from the theory developed in the previous post.

When I was preparing these notes, I was asked by several people if the same can be done for Szemeredi’s lemma. I don’t see a natural way of doing that. For such results, one should maybe use the online optimization techniques as a guide rather than as a black box. In general, iterative arguments (in which one constructs an object through a series of improvements) require the choice of a potential function, and an argument about how much the potential function changes at every step. The power of the FTRL method is that it creates the potential function and a big part of the analysis automatically and, even where it does not work directly, it can serve as an inspiration.

One could imagine a counterfactual history in which people first proved the weak regularity lemma using online optimization out of the box, as we do in this post, and then decided to try an L2 potential function and an iterative method to get the Szemeredi lemma, subsequently asking what happens if the potential function is entropy, thus discovering Jacob Fox’s major improvement on the “triangle removal lemma,” which involves the construction of an approximator that just approximates the number of triangles.

Continue reading