Italy has a wonderfully named National Academy of Arts and Science, well known within the country: the Accademia dei Lincei, which means something like academy of the “eagle-eyed” (literally, lynx-eyed), that is, people who can see far. The Accademia dei XL is much less well known, although it has a distinguished 240-year history, during which people like Guglielmo Marconi and Enrico Fermi were members. More recently, the much beloved Rita Levi-Montalcini, Holocaust survivor, Nobel Laureate, and Senator-for-life, was a member. Current members include Nobel Laureates Carlo Rubbia and Giorgio Parisi. The noted algebraist Corrado De Concini is the current president.

Be that as it may, the academicians did vote to make me a member, their first computer scientist ever. Next week, at the inauguration of their 240th academic year, I will speak to the other members about randomness and pseudorandomness in computation.

If you would like to come to Italy a few days in advance, Alon Rosen and I are organizing two co-located workshops, on graph algorithms and on cryptography, in Milan on June 15-18 (details forthcoming). If you want to stay longer, I am organizing a mini-workshop on fairness in AI in Milan on June 27 (more details about it in a few days). Registration will be free for both events. There are several high-speed trains every day between Rome and Milan, taking about 3 hours.

**Call for Participation**

**54th ACM Symposium on Theory of Computing (STOC 2022) – Theory Fest**

**June 20-24, 2022**

**Rome, Italy**

The 54th ACM Symposium on Theory of Computing (STOC 2022) is sponsored by the ACM Special Interest Group on Algorithms and Computation Theory and will be held in Rome, Italy, Monday June 20 – Friday, June 24, 2022.

STOC 2022 – Theory Fest will feature technical talk sessions, 6 workshops with introductory tutorials, poster sessions, social events, and a special joint session with “Accademia Nazionale dei Lincei”, the oldest and most prestigious Italian academic institution, followed by a reception and a concert at the Academy historic site.

**Registration**

STOC 2022 registration is available here.

**Early registration deadline: April 30th.**

STOC 2022 is sponsored by Algorand, Amazon, Apple, Google, IOHK, Microsoft, and Sapienza University of Rome.

The new Sapienza computer science department was founded mostly by faculty from the Sapienza mathematics department, plus a number of people who came from other places to help start it. Among the latter, Renato Capocelli had moved to Rome from the University of Salerno, where he had been chair of the computer science department.

Capocelli worked on combinatorics and information theory. In the early 90s, he had also become interested in the then-new area of zero-knowledge proofs.

Capocelli taught the information-theory course that I was attending, and it was a very different experience from the classes I had attended up to that point. To get the new major started, several professors were teaching classes outside their area, sticking close to their notes. Those teaching mathematical classes were experts, but they did not deviate from the definition-theorem-proof script. Capocelli had an infectious passion for his subject, took his time to help us gain an intuitive understanding of the concepts of information theory, was full of examples and anecdotes, and always emphasized the high-level ideas of the proofs.

I subsequently met several other charismatic and inspiring computer scientists and mathematicians, though Capocelli had a very different personality from most of them. He was like the Southern Italian intellectuals of an earlier generation, who could be passionate about their subject in a peculiarly non-nerdy way, loving it the way one may love food, people, nature, or a full life in general.

On April 8, 1992, Renato Capocelli died suddenly and unexpectedly, though his memory lives on in the many people he inspired. The Computer Science department of the University of Salerno was named after him for a period of time.

A few weeks ago, we were joined by Francesca Buffa and Marc Mezard.

Francesca, a computational biologist formerly at the Oxford medical school, is now the fourth of the four tenured computer science faculty in our new department to have an active ERC grant.

Marc’s work has spanned theoretical physics, information theory, and computation, including his collaboration on Giorgio Parisi’s Nobel Prize-winning work, and most recently he was the president of the École Normale Supérieure in Paris. When we asked for letters for his tenure case, one of the reviewers wrote, more or less in so many words, “you would be lucky to have Marc in your university, though it is very unlikely that he will accept your offer”. At that point Marc had already accepted.

Happy New Year!

Some details are here. Candidates must apply online by January 15 (end of day Central Europe time) for the application to be considered. To apply online, go to https://jobmarket.unibocconi.eu/ and look at the only opening that has a Jan 15 expiration (currently it is at the top of the list). The negotiable start date is September, 2022. By that time the new Computing Sciences department will be fully operational.

We are interested in all areas of computer science. Alon Rosen, Dirk Hovy and I are very happy to talk to prospective candidates about what the university is like and what its plans are for developing computer science.

The university pays an internationally competitive salary and provides relocation assistance. The language of instruction for all computer science courses, at both the undergraduate and graduate levels, is English.

Scholars of any nationality who have not lived in Italy for the past two years and who move to Italy to take a university tenure-track or tenured position pay almost no income tax for six years, or more if they buy a home in Italy and/or have children under the age of 18.

For Italians working in Italy: this position is governed by a private-law contract with Bocconi, which is not the same as a RDTA or RDTB position, although the terms are similar.

Subject to a successful mid-term review (which usually happens after three years), and a successful tenure review (which happens within five years from the mid-term review, or possibly earlier depending on the background of the candidate), assistant professors are promoted to associate professors with tenure. (For those familiar with the Italian system, the latter positions are fully recognized as professore associato by the ministry of university.)

**1. Matrix Multiplicative Weights Update**

In this post we consider the following generalization, introduced and studied by Arora and Kale, of the “learning from expert advice” setting and the multiplicative weights update method. In the “experts” model, we have a repeated game in which, at each time step $t$, we have the option of following the advice of one of $n$ experts; if we follow the advice of expert $i$ at time $t$, we incur a loss of $\ell_t(i)$, which is unknown to us (although, at time $t$, we know the loss functions $\ell_1,\ldots,\ell_{t-1}$). We are allowed to choose a probabilistic strategy, whereby we follow the advice of expert $i$ with probability $x_t(i)$, so that our expected loss at time $t$ is $\sum_{i=1}^n x_t(i) \ell_t(i)$.
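Before moving to matrices, it may help to see the scalar experts setting in code. The following is a minimal numpy sketch of the standard multiplicative weights update algorithm (the code, including the random loss sequence, is my own illustration, not from the Arora-Kale paper):

```python
import numpy as np

def mwu(losses, eta):
    """Run multiplicative weights over `losses` (shape (T, n),
    entries in [0, 1]); return the algorithm's total expected loss."""
    T, n = losses.shape
    w = np.ones(n)                     # one weight per expert
    total = 0.0
    for t in range(T):
        x = w / w.sum()                # distribution x_t over experts
        total += x @ losses[t]         # expected loss at time t
        w *= np.exp(-eta * losses[t])  # exponential weight update
    return total

rng = np.random.default_rng(0)
T, n = 1000, 10
losses = rng.random((T, n))
alg = mwu(losses, eta=np.sqrt(np.log(n) / T))
best = losses.sum(axis=0).min()        # loss of the best fixed expert
assert alg - best <= 2 * np.sqrt(T * np.log(n))   # regret bound
```

The final assertion is exactly the regret guarantee that the matrix algorithm below will generalize.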

In the matrix version, instead of choosing an expert we are allowed to choose a unit $n$-dimensional vector $x$, and the loss incurred in choosing the vector $x$ is $x^T L_t x$, where $L_t$ is an unknown symmetric matrix. We are also allowed to choose a probabilistic strategy, so that with probability $p_x$ we choose the unit vector $x$, and we incur the expected loss

$$\mathop{\mathbb E}_{x \sim p} \ x^T L_t x .$$

The above expression can also be written as

$$\langle X_t , L_t \rangle$$

where $X_t = \mathop{\mathbb E}_{x \sim p} \, xx^T$ and we used the Frobenius inner product among square matrices, defined as $\langle A , B \rangle := {\rm tr}(A^T B) = \sum_{i,j} A_{ij} B_{ij}$. The matrices that can be obtained as convex combinations of rank-1 matrices of the form $xx^T$, where $x$ is a unit vector, are called *density matrices* and can be characterized as the set of positive semidefinite matrices whose trace is 1.
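This characterization is easy to check numerically. The following sketch (my own illustration) builds a random convex combination of rank-1 projectors and verifies that it is positive semidefinite with trace 1, and that the Frobenius inner product equals the trace formula:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 8

# A convex combination of rank-1 projectors x x^T (x a unit vector)
vecs = rng.standard_normal((k, n))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)   # unit vectors
p = rng.random(k); p /= p.sum()                       # convex weights
X = sum(p_i * np.outer(x, x) for p_i, x in zip(p, vecs))

assert np.isclose(np.trace(X), 1.0)                   # trace 1
assert np.all(np.linalg.eigvalsh(X) >= -1e-12)        # PSD

# Frobenius inner product <X, L> = tr(X L) = sum_ij X_ij L_ij
L = rng.standard_normal((n, n)); L = (L + L.T) / 2
assert np.isclose(np.trace(X @ L), np.sum(X * L))
```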

It is possible to see the above game as the “quantum version” of the experts setting. A choice of a unit vector $x$ is a *pure quantum state*; a probability distribution over pure quantum states, described by a density matrix, is a *mixed quantum state*. If $X$ is a density matrix describing a mixed quantum state, $L$ is a symmetric matrix, and $L = \sum_i \lambda_i v_i v_i^T$ is the spectral decomposition of $L$ in terms of its eigenvalues $\lambda_i$ and orthonormal eigenvectors $v_i$, then $\langle X , L \rangle$ is the expected outcome of a measurement of $X$ in the basis $v_1,\ldots,v_n$, such that $\lambda_i$ is the value of the measurement if the outcome is $v_i$.

If you have no idea what the above paragraph means, that is perfectly ok, because this view will not be particularly helpful in motivating the algorithm and analysis that we will describe. (Here I am reminded of the joke about the way people from Naples give directions: “How do I get to the post office?”, “Well, you see that road over there? After a couple of blocks there is a pharmacy, where my uncle used to work, though now he is retired.” “Ok?” “Now, if you turn left after the pharmacy, after a while you get to a square with a big fountain and the church of St. Anthony where my niece got married. It was a beautiful ceremony, but the food at the reception was not great.” “Yes, I know that square”, “Good, don’t go there, the post office is not that way. Now, if you instead take that other road over there …”)

The main point of the above game, and of the Matrix Multiplicative Weights Update (MMWU) algorithm that plays it with bounded regret, is that it provides useful generalizations of the standard “experts” game and of the Multiplicative Weights Update (MWU) algorithm. For example, as we have already seen, MWU can provide a “derandomization” of the Chernoff bound; we will see that MMWU provides a derandomization of the *matrix* Chernoff bound. MWU can be used to approximate certain Linear Programming problems; MMWU can be used to approximate certain *Semidefinite Programming* problems.

To define and analyze the MMWU algorithm, we need to introduce certain operations on matrices. We will always work with real-valued symmetric matrices, but everything generalizes to complex-valued Hermitian matrices. If $A$ is a symmetric matrix, $\lambda_1,\ldots,\lambda_n$ are the eigenvalues of $A$, and $v_1,\ldots,v_n$ are corresponding orthonormal eigenvectors, then we will define a number of operations and functions on $A$ that operate on the eigenvalues while leaving the eigenvectors unchanged.

The first operation is *matrix exponentiation*: we define

$$e^A := \sum_i e^{\lambda_i} v_i v_i^T .$$

The operation always defines a positive definite matrix, and the resulting matrix satisfies a “Taylor expansion”

$$e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots$$

Indeed, it is more common to use the above expansion as the definition of the matrix exponential, and then derive the expression in terms of eigenvalues.
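The equivalence of the two definitions can be sanity-checked numerically (this snippet is my own illustration; `scipy.linalg.expm` computes the series-based matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # symmetric

# Exponentiate the eigenvalues, keep the eigenvectors.
lam, V = np.linalg.eigh(A)
eA_spectral = V @ np.diag(np.exp(lam)) @ V.T

# Same matrix computed from the power series.
eA_series = expm(A)
assert np.allclose(eA_spectral, eA_series)

# e^A is positive definite: all its eigenvalues e^{lambda_i} are > 0.
assert np.all(np.linalg.eigvalsh(eA_spectral) > 0)
```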

We also have the useful bounds

$$e^A \succeq I + A$$

which is true for every symmetric $A$, and

$$e^A \preceq I + A + A^2$$

which is true for all $A$ such that $\| A \| \le 1$.

Analogously, if $A$ is positive definite, we can define

$$\log A := \sum_i (\log \lambda_i) \, v_i v_i^T$$

and we have a number of identities like $\log e^A = A$, $e^{\log A} = A$, and $\log (cA) = \log A + (\log c) I$, where $c$ is a positive scalar. We should be careful, however, not to take the analogy with real numbers too far: for example, if $A$ and $B$ are two symmetric matrices, in general it is not true that $e^{A+B} = e^A \cdot e^B$; in fact, the above expression is actually always false except when $A$ and $B$ commute, in which case it is trivially true. We have, however, the following extremely useful fact.

Theorem 1 (Golden-Thompson Inequality) *For every two symmetric matrices $A$ and $B$,*

$${\rm tr} \left( e^{A+B} \right) \le {\rm tr} \left( e^A \cdot e^B \right) .$$

The Golden-Thompson inequality will be all we need to generalize to this matrix setting everything we have proved about multiplicative weights. See this post by Terry Tao for a proof.
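Before using it, we can at least convince ourselves of the inequality numerically on random instances (my own illustration, not a proof):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
n = 4

def sym(M):
    return (M + M.T) / 2

for _ in range(100):
    A = sym(rng.standard_normal((n, n)))
    B = sym(rng.standard_normal((n, n)))
    lhs = np.trace(expm(A + B))
    rhs = np.trace(expm(A) @ expm(B))
    assert lhs <= rhs + 1e-9          # Golden-Thompson

# Equality holds when A and B commute, e.g. B = A:
assert np.isclose(np.trace(expm(A + A)), np.trace(expm(A) @ expm(A)))
```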

The *Von Neumann entropy* of a density matrix $X$ with eigenvalues $\lambda_1,\ldots,\lambda_n$ is defined as

$$S(X) := \sum_i \lambda_i \log \frac 1 {\lambda_i}$$

that is, if we view $X$ as the mixed quantum state in which the pure state $v_i$ has probability $\lambda_i$, then $S(X)$ is the entropy of the distribution over the pure states. Again, this is not a particularly helpful point of view, and in fact we will be interested in defining $S(\cdot)$ not just for density matrices but for arbitrary positive definite matrices, and even positive semidefinite ones (with the convention that $0 \log \frac 10 = 0$, which is used also in the standard definition of entropy of a distribution).
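The two extreme cases, the maximally mixed state and a pure state, make a quick sanity check (my own illustration):

```python
import numpy as np

def von_neumann_entropy(X):
    """S(X) = sum_i lambda_i log(1/lambda_i) over the eigenvalues of X,
    with the convention 0 log(1/0) = 0."""
    lam = np.linalg.eigvalsh(X)
    lam = lam[lam > 1e-15]            # drop zeros: 0 log(1/0) = 0
    return -np.sum(lam * np.log(lam))

n = 8
# The maximally mixed state I/n has maximum entropy log n ...
assert np.isclose(von_neumann_entropy(np.eye(n) / n), np.log(n))
# ... while a pure state x x^T has entropy 0.
x = np.zeros(n); x[0] = 1.0
assert np.isclose(von_neumann_entropy(np.outer(x, x)), 0.0)
```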

We will be interested in using the negative Von Neumann entropy as a regularizer, and hence we will want to know its Bregman divergence. Some calculations show that the Bregman divergence of the negative Von Neumann entropy, which is called the *quantum relative entropy*, is

$$D(X,Y) = {\rm tr} \left( X \log X - X \log Y - X + Y \right) .$$

If $X$ and $Y$ are density matrices, the terms $-{\rm tr}(X) + {\rm tr}(Y)$ cancel out; the above definition is valid for arbitrary positive definite matrices.

We will have to study the minima of various functions that take a matrix as an input, so it is good to understand how to compute the gradient of such functions. For example, what is the gradient of the function $X \mapsto \langle L , X \rangle$? Working through the definition we see that it is $L$, and indeed we always have that the gradient of the function $X \mapsto \langle L, X \rangle$ is $L$ everywhere. Somewhat less obvious is the calculation of the gradient of the negative Von Neumann entropy $X \mapsto {\rm tr}(X \log X)$, which is

$$\nabla \, {\rm tr} ( X \log X ) = \log X + I .$$

**2. Analysis in the Constrained FTRL Framework**

Suppose that we play the game that we described above using agile mirror descent with the negative Von Neumann entropy (appropriately scaled) as a regularizer. That is, for some $\eta > 0$ that we will choose later, we use the regularizer

$$R(X) := \frac 1\eta \, {\rm tr} ( X \log X )$$

which has the Bregman divergence

$$D(X,Y) = \frac 1\eta \, {\rm tr} \left( X \log X - X \log Y - X + Y \right)$$

and our feasible set is the set of density matrices

$${\cal X} := \{ X \ : \ X \succeq 0 , \ {\rm tr}(X) = 1 \} .$$

To bound the regret, we just have to plug the above definitions into the machinery that we developed in our fifth post.

At time 1, we play the identity matrix scaled by $1/n$, which is the density matrix of maximum Von Neumann entropy $\log n$:

$$X_1 := \frac 1n I .$$

At time $t+1$, we play the matrix $X_{t+1}$ obtained as

$$\hat X_{t+1} := \arg\min_X \ \langle L_t , X \rangle + D(X , X_t) \ \ \ \ \ X_{t+1} := \arg\min_{X \in \cal X} D( X, \hat X_{t+1})$$

and recall that we proved that, after $T$ steps, for every $X \in \cal X$,

$$\sum_{t=1}^T \langle L_t , X_t \rangle - \langle L_t , X \rangle \le D(X , X_1) + \sum_{t=1}^T \langle L_t , X_t - \hat X_{t+1} \rangle .$$

If $X$ is a density matrix with eigenvalues $p_1,\ldots,p_n$, then the first term is

$$D(X, X_1) = \frac 1\eta \left( \sum_i p_i \log p_i + \log n \right) \le \frac {\log n}\eta .$$

To complete the analysis we have to understand $\hat X_{t+1}$. We need to compute the gradient of the function that it minimizes and set it to zero. The gradient of $X \mapsto \langle L_t , X \rangle$ is just $L_t$. The gradient of $X \mapsto D(X , X_t)$ is

$$\nabla_X \, D(X, X_t) = \frac 1\eta \left( \log X - \log X_t \right)$$

meaning that we want to solve for

$$L_t + \frac 1\eta \left( \log \hat X_{t+1} - \log X_t \right) = 0$$

and $\hat X_{t+1}$ satisfies

$$\hat X_{t+1} = e^{\log X_t - \eta L_t}$$

and, unrolling the recursion (the projection to $\cal X$ only rescales the matrix), we can write

$$X_{t+1} = \frac { e^{-\eta ( L_1 + \cdots + L_t )} } { {\rm tr} \left( e^{-\eta (L_1 + \cdots + L_t)} \right) } .$$

Then we can use Golden-Thompson and the fact that $e^{-\eta L} \preceq I - \eta L + \eta^2 L^2$, which holds if $\| \eta L \| \le 1$, to write

$$\sum_{t=1}^T \langle L_t , X_t - \hat X_{t+1} \rangle \le \eta \sum_{t=1}^T \langle L_t^2 , X_t \rangle .$$

Combining everything together we have

$${\rm Regret}_T(X) \le \frac {\log n}{\eta} + \eta \sum_{t=1}^T \langle L_t^2 , X_t \rangle$$

and so, provided $\eta \| L_t \| \le 1$ for every $t$,

$${\rm Regret}_T(X) \le \frac {\log n}{\eta} + \eta \sum_{t=1}^T \| L_t \|^2 .$$

This is the best bound we can hope for, and it matches Theorem 1 in our first post about the Multiplicative Weights Update algorithm.

If we have $\| L_t \| \le 1$ for every $t$, we can simplify it to

$${\rm Regret}_T(X) \le \frac {\log n}{\eta} + \eta T \le 2 \sqrt{ T \log n}$$

where the last step comes from optimizing $\eta = \sqrt{ \log n / T}$.

We can also write, under the condition $\| L_t \| \le 1$ for every $t$,

$${\rm Regret}_T(X) \le \frac {\log n}{\eta} + \eta \sum_{t=1}^T \langle \, | L_t | \, , X_t \rangle$$

where $|L|$ is the “absolute value” of the matrix $L$, defined in the following way: if $L = \sum_i \lambda_i v_i v_i^T$ is a symmetric matrix, then its absolute value is $|L| := \sum_i |\lambda_i| v_i v_i^T$. Allen-Zhu, Liao and Orecchia state the analysis in this way in their paper on generalizations of Matrix Multiplicative Weights.

Our next post will discuss applications at length, but for now let us gain a bit of intuition about the usefulness of these regret bounds. Recall that, for every symmetric matrix $L$, we have

$$\lambda_{\min} (L) = \min_{X \ {\rm density\ matrix}} \ \langle L , X \rangle$$

and so the regret bound can be reinterpreted in the following way: if we let $L_1,\ldots,L_T$ be the loss functions used in a game played against a MMWU algorithm, and the algorithm selects density matrices $X_1,\ldots,X_T$, then

$$\sum_{t=1}^T \langle L_t , X_t \rangle \le \min_{X \in \cal X} \ \sum_{t=1}^T \langle L_t , X \rangle + 2 \sqrt {T \log n}$$

that is,

$$\sum_{t=1}^T \langle L_t , X_t \rangle \le \lambda_{\min} \left( \sum_{t=1}^T L_t \right) + 2 \sqrt{T \log n}$$

provided that $\| L_t \| \le 1$ for every $t$. For example, switching $L_t$ with $-L_t$, we have

$$\lambda_{\max} \left( \sum_{t=1}^T L_t \right) \le \sum_{t=1}^T \langle L_t , X_t \rangle + 2 \sqrt{T \log n}$$

provided that $\| L_t \| \le 1$ for every $t$, which means that if we can choose a sequence of loss matrices that make the MMWU algorithm have small loss at each step, then we are guaranteed that the sum of such matrices cannot have any large eigenvalue.
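The whole algorithm fits in a few lines of numpy. Here is a sketch of the MMWU update $X_{t+1} \propto e^{-\eta(L_1+\cdots+L_t)}$ against a sequence of random normalized loss matrices, checking the regret bound against the best fixed density matrix (the losses and parameters are my own illustration):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
n, T = 6, 400
eta = np.sqrt(np.log(n) / T)     # optimized learning rate

def random_loss():
    # random symmetric loss matrix, normalized so that ||L|| <= 1
    M = rng.standard_normal((n, n)); M = (M + M.T) / 2
    return M / np.abs(np.linalg.eigvalsh(M)).max()

alg_loss, cum = 0.0, np.zeros((n, n))
for t in range(T):
    E = expm(-eta * cum)
    X = E / np.trace(E)          # density matrix X_t played at time t
    L = random_loss()
    alg_loss += np.trace(X @ L)  # loss <L_t, X_t>
    cum += L                     # running sum L_1 + ... + L_t

# The best fixed density matrix achieves lambda_min(sum_t L_t).
regret = alg_loss - np.linalg.eigvalsh(cum).min()
assert regret <= 2 * np.sqrt(T * np.log(n))
```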

The negotiable start date is September 1st, 2022. Each position is for one year, renewable for a second. The positions offer an internationally competitive salary (up to 65,000 Euro per year, tax-free, plus relocation assistance and travel allowance), in a wonderful location that, at long last, is back to more or less normal life. The application deadline is **December 17, 2021**.

Among the topics that I am interested in are spectral graph theory, average-case complexity, “applications” of semidefinite programming, random processes on networks, approximation algorithms, pseudorandomness and combinatorial constructions.

Bocconi Computer Science is building up a theory group: besides me, we have Alon Rosen, Marek Elias, a tenured person who will join next Fall, and more hires are on the horizon. Now that traveling is ok again, and considering that Alon and I both have ERC grants, we should expect a big stream of theory visitors coming and going through Bocconi, from week-long visits to semester- or year-long sabbaticals.

To apply, go to https://www.unibocconi.eu/faculty-postdoc, look for the position advertised as “BIDSA Informatics” (currently the second from the top of the list), and click on “apply online”.


**1. The Impagliazzo Hard-Core Lemma**

The Impagliazzo Hard-Core Lemma is a striking result in the theory of average-case complexity. Roughly speaking, it says that if $g : X \rightarrow \{ 0,1 \}$ is a function that is “weakly” hard on average for a class $\cal F$ of “efficiently computable” functions $f: X \rightarrow \{0,1\}$, that is, if, for some $\delta > 0$, we have that

$$\forall f \in {\cal F}: \ \ \Pr_{x \sim X} [ f(x) = g(x) ] \le 1 - \delta$$

then there is a subset $H \subseteq X$ of cardinality $\ge 2 \delta |X|$ such that $g$ is “strongly” hard-on-average on $H$, meaning that

$$\forall f \in {\cal F}: \ \ \Pr_{x \sim H} [ f(x) = g(x) ] \le \frac 12 + \epsilon$$

for a small $\epsilon$. Thus, the reason why functions from $\cal F$ make a mistake in predicting $g$ at least a $\delta$ fraction of the times is that there is a “hard-core” set $H$ of inputs such that every function from $\cal F$ makes a mistake about 1/2 of the times for the fraction of inputs coming from $H$.

The result is actually not literally true as stated above, and it is useful to understand a counterexample, in order to motivate the correct statement. Suppose that $\cal F$ contains just $1/\delta$ functions, that each function differs from $g$ in exactly a $\delta$ fraction of inputs from $X$, and that the sets of mistakes are *disjoint*. Thus, for every set $H \subseteq X$, no matter its size, there is a function in $\cal F$ that agrees with $g$ on at least a $1 - \delta$ fraction of inputs from $H$. The reason is that the sets of inputs on which the functions of $\cal F$ differ from $g$ form a partition of $X$, and so their intersections with $H$ form a partition of $H$. By an averaging argument, one of those intersections must then contain at most $\delta |H|$ elements of $H$.

In the above example, however, if we choose any three distinct functions $f_1, f_2, f_3$ from $\cal F$, we have

$$\forall x \in X: \ \ g(x) = {\rm majority} \left( f_1(x), f_2(x), f_3(x) \right)$$

because at most one of the three functions can be wrong on any given input. So, although $g$ is weakly hard on average with respect to $\cal F$, we have that $g$ is not even worst-case hard for a slight extension of $\cal F$ in which we allow functions obtained by simple compositions of a small number of functions of $\cal F$.
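The counterexample is concrete enough to run. This sketch (my own illustration) builds a class of $1/\delta$ functions with disjoint mistake sets and verifies both the weak hardness and the fact that any three of them recover $g$ by majority:

```python
import numpy as np
from itertools import combinations

# Domain X = {0, ..., 15}, delta = 1/4, so |F| = 1/delta = 4 functions.
N, delta = 16, 0.25
k = int(1 / delta)
rng = np.random.default_rng(5)
g = rng.integers(0, 2, N)                 # arbitrary target function

# f_i flips g exactly on the i-th block of delta*N inputs, so the
# mistake sets are disjoint and partition X.
block = int(delta * N)
F = []
for i in range(k):
    f = g.copy()
    f[i * block:(i + 1) * block] ^= 1
    F.append(f)

# Each f is wrong on exactly a delta fraction of inputs ...
for f in F:
    assert np.mean(f != g) == delta

# ... yet the majority of any 3 distinct functions recovers g exactly,
# because at most one of the three errs on any given input.
for f1, f2, f3 in combinations(F, 3):
    assert np.array_equal((f1 + f2 + f3 >= 2).astype(int), g)
```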

Theorem 1 (Impagliazzo Hard-Core Lemma) *Let $\cal F$ be a collection of functions $f : X \rightarrow \{0,1\}$, let $g : X \rightarrow \{0,1\}$ be a function, and let $\epsilon$ and $\delta$ be positive reals. Then at least one of the following conditions is true:*

- ($g$ is not weakly hard-on-average over $X$ with respect to a slight extension of $\cal F$) *There is a $t = O \left( \frac 1{\epsilon^2} \log \frac 1\delta \right)$, an integer $k$, and functions $f_1,\ldots,f_t \in \cal F$, such that*
$$h(x) := \mathbb 1 \left[ \sum_{i=1}^t f_i(x) \ge k \right]$$
*satisfies*
$$\Pr_{x \sim X} [ h(x) = g(x) ] \ge 1 - \delta$$

- ($g$ is strongly hard-on-average over a set of density $2\delta$) *There is a set $H \subseteq X$ such that $|H| \ge 2 \delta |X|$ and*
$$\forall f \in {\cal F}: \ \ \Pr_{x \sim H} [ f(x) = g(x) ] \le \frac 12 + \epsilon$$

Where $\mathbb 1 [\,{\rm expression}\,]$ is equal to 1 or 0 depending on whether the boolean expression is true or false (the letter “$\mathbb 1$” stands for “indicator” function of the truth of the expression).

**2. Proving the Lemma**

Impagliazzo’s proof had $t$ polynomial in both $\frac 1\epsilon$ and $\frac 1\delta$, and an alternative proof discovered by Nisan has a stronger bound on $t$ of the order of $\frac 1{\epsilon^2} \log \frac 1\delta$. The proofs of Impagliazzo and Nisan did not immediately give a set of size $2 \delta |X|$ (the set had size $\delta |X|$), although this could be achieved by iterating their argument. An idea of Holenstein allows one to prove the above statement in a more direct way.

Today we will see how to obtain the Impagliazzo Hard-Core Lemma from online optimization, as done by Barak, Hardt and Kale. Their proof achieves all the parameters claimed above, once combined with Holenstein’s ideas.

We say that a distribution $P$ (here “$P$” stands for probability *measure*; we use this letter since we have already used $D$ last time to denote the Bregman divergence) has min-entropy at least $\log_2 m$ if, for every $x$, $P(x) \le \frac 1m$. In other words, the min-entropy of a distribution $P$ over a sample space $X$ is defined as

$$H_\infty (P) := \min_{x \in X} \ \log_2 \frac 1 {P(x)} .$$

The uniform distribution over a set $H$ has min-entropy $\log_2 |H|$, and all distributions of min-entropy $\log_2 m$ can be realized as a convex combination of distributions that are each uniform over a set of size $m$; thus uniform distributions over large sets and large-min-entropy distributions are closely related concepts. We will prove the following version of the hard-core lemma:
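In code, min-entropy is just the negative log of the largest probability; the following sketch (my own illustration) checks the two facts we just used:

```python
import numpy as np

def min_entropy(p):
    """H_inf(p) = min_x log2(1 / p(x)) = -log2(max_x p(x))."""
    return -np.log2(np.max(p))

N, m = 16, 4
# The uniform distribution over a set H of size m has min-entropy log2 m.
p_flat = np.zeros(N); p_flat[:m] = 1.0 / m
assert np.isclose(min_entropy(p_flat), np.log2(m))

# A convex combination of flat distributions over sets of size >= m
# still has min-entropy >= log2 m.
q_flat = np.zeros(N); q_flat[2:2 + 2 * m] = 1.0 / (2 * m)
mix = 0.5 * p_flat + 0.5 * q_flat
assert min_entropy(mix) >= np.log2(m) - 1e-12
```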

Theorem 2 (Impagliazzo Hard-Core Lemma — Min-Entropy Version) *Let $X$ be a finite set, $\cal F$ be a collection of functions $f : X \rightarrow \{0,1\}$, let $g : X \rightarrow \{0,1\}$ be a function, and let $\epsilon$ and $\delta$ be positive reals. Then at least one of the following conditions is true:*

- ($g$ is not weakly hard-on-average over $X$ with respect to $\cal F$) *There is a $t = O \left( \frac 1{\epsilon^2} \log \frac 1\delta \right)$, an integer $k$, and functions $f_1,\ldots,f_t \in \cal F$, such that*
$$h(x) := \mathbb 1 \left[ \sum_{i=1}^t f_i(x) \ge k \right]$$
*satisfies*
$$\Pr_{x \sim X} [ h(x) = g(x) ] \ge 1 - \delta$$

- ($g$ is strongly hard-on-average on a distribution of min-entropy $\log_2 (2 \delta |X|)$) *There is a distribution $P$ of min-entropy $\ge \log_2 ( 2\delta |X| )$ such that*
$$\forall f \in {\cal F}: \ \ \Pr_{x \sim P} [ f(x) = g(x) ] \le \frac 12 + \epsilon$$

Under minimal assumptions on $\cal F$ (that it contains the constant functions), the min-entropy version implies the set version, and the min-entropy version can be used as-is to derive most of the interesting consequences of the set version.

Let us restate it one more time.

Theorem 3 (Impagliazzo Hard-Core Lemma — Min-Entropy Version) *Let $X$ be a finite set, $\cal F$ be a collection of functions $f : X \rightarrow \{0,1\}$, let $g : X \rightarrow \{0,1\}$ be a function, and let $\epsilon$ and $\delta$ be positive reals. Suppose that for every distribution $P$ of min-entropy $\ge \log_2 (2\delta |X|)$ we have*

$$\exists f \in {\cal F}: \ \ \Pr_{x \sim P} [ f(x) = g(x) ] \ge \frac 12 + \epsilon .$$

*Then there is a $t = O \left( \frac 1{\epsilon^2} \log \frac 1\delta \right)$, an integer $k$, and functions $f_1,\ldots,f_t \in \cal F$, such that*

$$h(x) := \mathbb 1 \left[ \sum_{i=1}^t f_i(x) \ge k \right]$$

*satisfies*

$$\Pr_{x \sim X} [ h(x) = g(x) ] \ge 1 - \delta .$$

As in previous posts, we are going to think about a game between a “builder” that works toward the construction of $h$ and an “inspector” that looks for defects in the construction. More specifically, at every round $t$, the inspector is going to pick a distribution $P_t$ of min-entropy $\ge \log_2 (2\delta |X|)$ and the builder is going to pick a function $f_t \in \cal F$. The loss function, which the inspector wants to minimize, is

$$\ell_t (P) := \sum_{x \in X} P(x) \cdot \mathbb 1 [ f_t(x) = g(x) ] = \Pr_{x \sim P} [ f_t(x) = g(x) ] .$$

The inspector runs the agile online mirror descent algorithm with the constraint of picking distributions of the required min-entropy, and using the entropy regularizer; the builder always chooses a function $f_t \in \cal F$ such that

$$\Pr_{x \sim P_t} [ f_t (x) = g(x) ] \ge \frac 12 + \epsilon$$

which is always a possible choice given the assumptions of our theorem.

Just by plugging the above setting into the analysis from the previous post, we get that if we play this online game for $T$ steps, the builder picks functions $f_1,\ldots,f_T$ such that, *for every distribution* $P$ of min-entropy $\ge \log_2 (2 \delta |X|)$, we have

$$\frac 1T \sum_{t=1}^T \Pr_{x \sim P} [ f_t (x) = g(x) ] \ \ge \ \frac 12 + \epsilon - O \left( \sqrt{ \frac 1T \log \frac 1\delta } \right) \ \ \ \ \ (1)$$

We will prove that (1) holds in the next section, but we emphasize again that it is just a matter of mechanically using the analysis from the previous post. Impagliazzo’s proof relies, basically, on playing the game using lazy mirror descent with $\ell_2$ regularization, and he obtains a guarantee like the one above after a number of steps polynomial in $\frac 1\epsilon$ and $\frac 1\delta$.

What do we do with (1)? Impagliazzo’s original reasoning was to define

$$h(x) := {\rm majority} \left( f_1(x), \ldots, f_T(x) \right)$$

and to consider the set $B$ of “bad” inputs $x$ such that $h(x) \neq g(x)$. For every $x \in B$, at least half of the functions $f_t$ are wrong on $x$, so we have

$$\frac 1T \sum_{t=1}^T \Pr_{x \sim B} [ f_t(x) = g(x) ] \le \frac 12$$

and so, if $T$ is a sufficiently large multiple of $\frac 1{\epsilon^2} \log \frac 1\delta$, the right-hand side of (1) is at least $\frac 12 + \frac \epsilon 2$, and the uniform distribution over $B$ cannot have min-entropy $\ge \log_2 (2 \delta |X|)$. The min-entropy of the uniform distribution over $B$ is $\log_2 |B|$, and this needs to be less than $\log_2 (2 \delta |X|)$, so we conclude that $h(x) \neq g(x)$ happens for at most a $2\delta$ fraction of the elements of $X$.

This is qualitatively what we promised, but it is off by a factor of 2 from what we stated above. The factor of 2 is removed by a subsequent idea of Holenstein. In Holenstein’s analysis, we sort the elements $x$ of $X$ according to

$$\sum_{t=1}^T \mathbb 1 [ f_t(x) = g(x) ]$$

and we let $B$ be the set of the $2 \delta |X|$ elements of $X$ for which the above quantity is smallest; Holenstein shows that if we properly pick an integer $k$ and define

$$h(x) := \mathbb 1 \left[ \sum_{t=1}^T f_t(x) \ge k \right]$$

then $h(x)$ will be equal to $g(x)$ for all $x \not\in B$ and also for at least half the $x \in B$, meaning that $h(x) = g(x)$ for at least a $1 - \delta$ fraction of the inputs. Since this is a bit outside the scope of this series of posts, we will not give an exposition of Holenstein’s argument.

**3. Analysis of the Online Game**

It remains to show that we can achieve (1) with $T$ of the order of $\frac 1{\epsilon^2} \log \frac 1\delta$. As we said, we play a game in which, at every step $t$:

- The “inspector” player picks a distribution $P_t$ of min-entropy at least $\log_2 (2 \delta |X|)$, that is, it picks a number $P_t(x) \ge 0$ for each $x \in X$ such that $\sum_x P_t(x) = 1$ and $P_t(x) \le \frac 1{2 \delta |X|}$.
- The “builder” player picks a function $f_t \in \cal F$, whose existence is guaranteed by the assumption of the theorem, such that
$$\Pr_{x \sim P_t} [ f_t(x) = g(x) ] \ge \frac 12 + \epsilon$$
and defines the loss function
$$\ell_t (P) := \sum_{x \in X} P(x) \cdot \mathbb 1 [ f_t(x) = g(x) ] .$$

- The “inspector” is charged the loss $\ell_t (P_t)$.

We analyze what happens if the inspector plays the strategy defined by agile mirror descent with the negative entropy regularizer. Namely, we define the regularizer

$$R(P) := \frac 1\eta \sum_{x \in X} P(x) \log P(x)$$

for a choice of $\eta > 0$ that we will fix later. The corresponding Bregman divergence is

$$D(P,Q) = \frac 1\eta \sum_{x \in X} \left( P(x) \log \frac {P(x)}{Q(x)} - P(x) + Q(x) \right)$$

and we work over the space of distributions of min-entropy $\ge \log_2 (2 \delta |X|)$

$${\cal P} := \left\{ P \ : \ \sum_x P(x) = 1, \ \ \forall x. \ 0 \le P(x) \le \frac 1 {2 \delta |X|} \right\} .$$

The agile online mirror descent algorithm is

$$\hat P_{t+1} := \arg\min_P \ \ell_t(P) + D(P, P_t) \ \ \ \ \ P_{t+1} := \arg\min_{P \in \cal P} \ D(P , \hat P_{t+1})$$

so that $P_1$ is the uniform distribution, and, for $t \ge 1$, solving the first step of agile online mirror descent, we have

$$\hat P_{t+1} (x) = P_t(x) \cdot e^{- \eta \cdot \mathbb 1 [ f_t(x) = g(x) ] } .$$

Using the analysis from the previous post, for every distribution $P$ in $\cal P$, and every number $T$ of steps, we have the regret bound

$$\sum_{t=1}^T \ell_t (P_t) - \ell_t (P) \le D(P, P_1) + \sum_{t=1}^T \sum_{x \in X} \mathbb 1 [f_t(x) = g(x)] \cdot \left( P_t(x) - \hat P_{t+1}(x) \right)$$

and we can bound

$$D(P, P_1) = \frac 1\eta \left( \log |X| - H(P) \right) \le \frac 1\eta \log \frac 1{2\delta}$$

and

$$\sum_{x \in X} \mathbb 1 [f_t(x) = g(x)] \cdot \left( P_t(x) - \hat P_{t+1}(x) \right) = \sum_{x \in X} \mathbb 1 [f_t(x) = g(x)] \cdot P_t(x) \cdot \left( 1 - e^{-\eta \cdot \mathbb 1 [f_t(x) = g(x)]} \right) \le \eta$$

where, in the last step, we used the fact that the quantity in parentheses is either 0 or $1 - e^{-\eta}$, which is $\le \eta$, and that $\sum_x P_t(x) \le 1$ because $P_t$ is a distribution.

Overall, the regret is bounded by

$$\frac 1\eta \log \frac 1{2\delta} + \eta T \le 2 \sqrt{ T \log \frac 1 {2\delta} }$$

where the last inequality comes from an optimized choice of $\eta$.

Recall that we chose the functions $f_t$ so that $\ell_t(P_t) \ge \frac 12 + \epsilon$ for every $t$, so for every $P \in \cal P$

$$\frac 1T \sum_{t=1}^T \ell_t (P) \ \ge \ \frac 12 + \epsilon - \frac 2 {\sqrt T} \sqrt{ \log \frac 1{2\delta} }$$

and by choosing $T$ of the order of $\frac 1{\epsilon^2} \log \frac 1\delta$ we get

$$\frac 1T \sum_{t=1}^T \ell_t (P) \ge \frac 12 + \frac \epsilon 2 .$$

It remains to observe that

$$\frac 1T \sum_{t=1}^T \ell_t (P) = \frac 1T \sum_{t=1}^T \Pr_{x \sim P} [ f_t(x) = g(x) ]$$

so we have that for every distribution $P$ of min-entropy at least $\log_2 (2 \delta |X|)$ it holds that

$$\frac 1T \sum_{t=1}^T \Pr_{x \sim P} [ f_t(x) = g(x) ] \ge \frac 12 + \frac \epsilon 2$$

which is the statement that we promised and from which the Impagliazzo Hard-Core Lemma follows.

**4. Some Final Remarks**

After Impagliazzo circulated a preliminary version of his paper, Nisan had the following idea: consider the game that we defined above, in which a builder picks an $f \in \cal F$, an inspector picks a distribution $P$ of the prescribed min-entropy, and the loss for the inspector is given by $\Pr_{x \sim P} [f(x) = g(x)]$. We can think of it as a zero-sum game if we also assign $\Pr_{x \sim P} [f(x) = g(x)]$ as a gain to the builder.

If the builder plays second, there is a strategy that guarantees a gain that is at least $\frac 12 + \epsilon$, and so there must be a mixed strategy, that is, a distribution $\cal D$ over functions in $\cal F$, that guarantees such a gain even if the builder plays first. In other words, for all distributions $P$ of the prescribed min-entropy we have

$$\mathop{\mathbb E}_{f \sim \cal D} \ \Pr_{x \sim P} [ f(x) = g(x) ] \ge \frac 12 + \epsilon .$$

Nisan then observes that we can sample $T = O \left( \frac 1{\epsilon^2} \log |X| \right)$ functions $f_1,\ldots,f_T \sim \cal D$ and have, with high probability,

$$\forall P \in {\cal P}: \ \ \frac 1T \sum_{t=1}^T \Pr_{x \sim P} [ f_t(x) = g(x) ] \ge \frac 12 + \frac \epsilon 2$$

and the sampling bound on $T$ can be improved to the order of $\frac 1{\epsilon^2} \log \frac 1\delta$ with the same conclusion.

Basically, what we have been doing today is coming up with an algorithm that finds an approximate solution to the LP that defines the optimal mixed strategy for the game, and designing the algorithm in such a way that the solution is very sparse.

This is a common feature of other applications of online optimization techniques to find “sparse approximations”: one sets up an optimization problem whose objective function measures the “approximation error” of a given solution. The object we want to approximate is the optimum of the optimization problem, and we use variants of mirror descent to prove the existence of a sparse solution that is a good approximation.
