Karlin, Klein, and Oveis Gharan have just posted a paper in which, at long last, they improve over the 1.5 approximation ratio for metric TSP achieved by Christofides in 1976. For a long time, it was suspected that the Held-Karp relaxation of metric TSP had an integrality gap better than 1.5, but there was no viable approach to prove such a result. In 2011, two different approaches were developed to improve on 1.5 in the case of shortest-path metrics on unweighted graphs: one by Oveis Gharan, Saberi and Singh and one by Mömke and Svensson. The algorithm of Karlin, Klein and Oveis Gharan (which does not establish that the Held-Karp relaxation has an integrality gap better than 1.5) takes as a starting point ideas from the work of Oveis Gharan, Saberi and Singh.

Yesterday, Bloom and Sisask posted a paper in which they show that there is a constant such that, for every sufficiently large , if has cardinality at least , then contains a non-trivial length-3 arithmetic progression. Without context, this may seem like a strange result to get excited about, but it sits at the nexus of a number of fundamental results and open questions in combinatorics. Gil Kalai has written an excellent post telling the story of this problem, so instead of writing a worse version of it I will refer the reader to Gil’s blog.
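For concreteness, here is a small stdlib-Python check of the property in question (my own toy illustration, not from the paper): a non-trivial length-3 arithmetic progression in a set is a triple of distinct elements x, y, z with x + z = 2y.

```python
from itertools import combinations

def has_nontrivial_3ap(A):
    """Check whether the set A contains distinct x, y, z with x + z = 2*y."""
    S = set(A)
    # For every pair x < z of the same parity, the midpoint (x + z)/2
    # completes a 3-term arithmetic progression if it lies in A.
    for x, z in combinations(sorted(S), 2):
        if (x + z) % 2 == 0 and (x + z) // 2 in S:
            return True
    return False

print(has_nontrivial_3ap({1, 3, 5}))      # 1, 3, 5 is a 3-AP
print(has_nontrivial_3ap({1, 2, 4, 8}))   # powers of two avoid 3-APs
```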

Back to bad news, the day after Harvard announced that it would deliver courses online only in 2020-21, the Trump administration announced that it would void the student visas of students who are not attending in-person classes in 2020-21. Back to good news, Harvard and MIT announced that they will sue the federal government over this, and other universities, including the University of California system, are planning similar responses. Apart from the action, I was really heartened to read the statement by MIT’s President on the matter (thanks to Vinod Vaikuntanathan for bringing it to my attention), which is worth reproducing:

To the members of the MIT community,

On Monday, in a surprising development, a division of Immigration and Customs Enforcement announced that it will not permit international students on F-1 visas to take a full online course load this fall while studying in the United States. As I wrote yesterday, this ruling has potentially serious implications for MIT’s international students and those enrolled at institutions across the country.

This morning, in response, MIT and Harvard jointly filed suit against ICE and the US Department of Homeland Security in federal court in Massachusetts. In the lawsuit, we ask the court to prevent ICE and DHS from enforcing the new guidance and to declare it unlawful.

The announcement disrupts our international students’ lives and jeopardizes their academic and research pursuits. ICE is unable to offer the most basic answers about how its policy will be interpreted or implemented. And the guidance comes after many US colleges and universities either released or are readying their final decisions for the fall – decisions designed to advance their educational mission and protect the health and safety of their communities.

Our international students now have many questions – about their visas, their health, their families and their ability to continue working toward an MIT degree. Unspoken, but unmistakable, is one more question: Am I welcome?

At MIT, the answer, unequivocally, is yes.

MIT’s strength is its people – no matter where they come from. I know firsthand the anxiety of arriving in this country as a student, excited to advance my education, but separated from my family by thousands of miles. I also know that welcoming the world’s brightest, most talented and motivated students is an essential American strength.

While we pursue legal protections for our international students, we will continue to stay in close touch with them through email and updates on the International Students Office’s website. If you have questions, you may write to the ISO at iso-help@mit.edu.

Sincerely,

L. Rafael Reif

This way of talking like a human being, and like you actually care about the matter at hand, is a big contrast with the robotic statements that usually come out of campus leadership. The corresponding message from UC Berkeley’s Chancellor is how such statements usually sound:

Dear campus community,

Yesterday, the Department of Homeland Security issued new guidance to universities related to international students and fall instruction requirements. The guidance is deeply concerning: it could potentially force the return of many international students to their home countries if they are unable to find the appropriate balance of in-person and remote classes. These requirements run counter to our values of being an inclusive community and one that has a long tradition of welcoming international students from around the globe. International students enrich campus life immeasurably, through their participation in classes, research collaborations and extracurricular activities.

We will explore all of our options, legal and otherwise, to counter the deleterious effects of these policies that impact the ability for international students to achieve their academic goals. It is not only important for UC Berkeley but for all of higher education across the U.S. to take every step possible to mitigate these policies that send a message of exclusion to our international community of scholars. We will partner with our professional associations to advocate for sound legislation that continues to support international educational exchange.

More immediately, we are working with colleagues across our campus to identify a path that will allow us to comply with these requirements while ensuring a healthy learning environment, and paying attention to the needs of our international students. We recognize the concern and anxiety these new rules have created, and we are moving quickly to ensure that we offer the proper balance of online and in-person classes so that our students can remain in the U.S. and satisfy their visa requirements, and that those students residing outside the U.S. can maintain their enrollment status.

We expect to announce more details soon. Should you have any questions, please contact the Berkeley International Office at internationaloffice@berkeley.edu.

Sincerely,

Carol Christ

Chancellor

Lisa Alvarez-Cohen

Vice Provost for Academic Planning and Senior International Officer

It is interesting to think about where this difference in tone is coming from. Carol Christ is a renowned humanities scholar who, I suppose, writes well. She comes across as charismatic and caring, and she is definitely straight-talking in person. Probably, as for everything else, Berkeley has a byzantine process to create announcements and press releases, and if Stephen Colbert were the Chancellor of UC Berkeley, after a couple of weeks on the job he would sound just as *deeply concerned* and just as into *exploring all options*, while *working to identify a path* and *paying attention* to something that is totally fucked up and needs action *today*.

Which brings me to all the statements in support of Black Lives Matter that have been coming out of every scholarly institution in the last few days. While their messages are generally unobjectionable, there is a certain sameness to their form (“we say their names…”, “we will do the work…”, “we see you…”) and they don’t sound at all like the way the people putting them out speak. This has complicated causes, including the fact that many such statements came out of letter-writing campaigns that demanded statements in a very specific way, without leaving a lot of room for individual expression. The association of American Poets, for example, put out a statement of solidarity with the Black community; in response, a letter with 1800 signatories claimed that it was too weak a statement and that it was, in fact, itself an act of violence against Black people; several resignations followed. The Board of the National Book Critics Circle was working on such a statement, and the work devolved into acrimony and several rounds of “I am outraged and I resign,” “no *I* am outraged at your outrage and *I* resign,” “well then *I* am outraged that you are outraged at her outrage” until almost the whole board was gone in a “sequence of events [that] was bizarre and bloody in an end-of-a-Tarantino-movie way.”

Also, people in America talk about race the way UC Berkeley administrators talk about anything, that is, extremely carefully and vacuously. But, back to the statements about foreign students, the difference between the administrative cultures at MIT and Berkeley is not the only difference between the statements of Reif and Christ: clearly a big difference is that Reif is an immigrant himself. When Trayvon Martin was killed, Obama talked about the killing in a way that was very different, and much more meaningful, than other politicians: if I had a son, Obama said, he would look a lot like Trayvon. If there were more people of color in positions of academic leadership, I think that we would have seen an academic response to Black Lives Matter that would have been less fearful, dogmatic and robotic and more meaningful and productive. Or perhaps we would have all ended up like the National Book Critics Circle, it’s hard to say.

program managers at funding agencies to advocate for theory funding. The event was quite productive and successful.

A second such workshop is going to be held, online, in the third week of July. Applications to participate are due on June 15, a week from today. Organizers expect that participants will have to devote about four hours of their time to the workshop, and those who volunteer to be team leads will have a time commitment of about ten hours.

More information at this link


**1. Hypergraph Sparsifiers **

An (undirected) hypergraph is a collection of subsets , called hyperedges. A graph is the special case in which each set has cardinality 2. For simplicity we will talk only about -uniform hypergraphs, that is, hypergraphs in which all hyperedges have the same cardinality (the arguments below would also work in the non-uniform case in which all hyperedges have cardinality *at most* ). Hyperedges may have weights.

If is a subset of vertices, a hyperedge is *cut* by if has non-empty intersection with both and . We call the number of hyperedges of cut by , or the total weight of such hyperedges if is a weighted hypergraph.

We can then generalize the notion of Benczur-Karger sparsification, and say that is a cut sparsifier of with error if and have the same vertex set and

Kogan and Krauthgamer show that every -uniform hypergraph admits a cut sparsifier of error with only weighted hyperedges. They are able to extend the argument of Benczur and Karger to hypergraphs, including arguing that there are few sparse cuts. They assign a probability to each hyperedge, sample it with that probability, weighing it if selected, and then use a union bound and Chernoff bounds to argue that all cuts are preserved.
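The sample-and-reweigh scheme is easy to state in code. The following stdlib-Python sketch (my own toy illustration; the probabilities here are arbitrary, not the carefully chosen ones of Kogan and Krauthgamer) shows the basic unbiasedness property: keeping each hyperedge with probability p and weighing it by 1/p preserves every cut in expectation, after which the whole game is concentration.

```python
import random

def cut_value(hyperedges, weights, S):
    """Total weight of hyperedges with at least one vertex in S and one outside."""
    S = set(S)
    return sum(w for e, w in zip(hyperedges, weights)
               if S & set(e) and set(e) - S)

def sparsify(hyperedges, weights, probs, rng):
    """Keep hyperedge e with probability probs[e]; reweigh kept edges by 1/p."""
    kept, kept_w = [], []
    for e, w, p in zip(hyperedges, weights, probs):
        if rng.random() < p:
            kept.append(e)
            kept_w.append(w / p)
    return kept, kept_w

rng = random.Random(0)
H = [(0, 1, 2), (1, 2, 3), (0, 3, 4), (2, 3, 4)]
w = [1.0, 1.0, 1.0, 1.0]
p = [0.9, 0.5, 0.5, 0.9]
S = {0, 1}

# Unbiasedness: E[cut of the sample] equals the original cut value.
trials = 20000
avg = sum(cut_value(*sparsify(H, w, p, rng), S) for _ in range(trials)) / trials
print(cut_value(H, w, S), round(avg, 2))
```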

Anand Louis has introduced a notion of hypergraph Laplacian in the following way. The Laplacian quadratic form of a hypergraph is a function defined as

where is the weight of the hyperedge , or 1 if the hypergraph is unweighted.

This definition has the motivation that it recovers the Laplacian quadratic form in the case of graphs, and that if is the indicator of a set then . Furthermore, one can define “eigenvalues” and “eigenvectors” of this hypergraph Laplacian and recover a Cheeger inequality and even higher-order Cheeger inequalities.
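Written out, the quadratic form in question is Q(x) = sum over hyperedges e of w_e times the maximum of (x_u - x_v)^2 over pairs u, v in e. Both properties just mentioned are then easy to check numerically; here is a toy stdlib-Python sketch of my own:

```python
def Q(hyperedges, weights, x):
    # Hypergraph Laplacian quadratic form:
    # Q(x) = sum over hyperedges e of w_e * max_{u, v in e} (x_u - x_v)^2
    return sum(w * max((x[u] - x[v]) ** 2 for u in e for v in e)
               for e, w in zip(hyperedges, weights))

H = [(0, 1, 2), (1, 2, 3), (0, 3, 4)]
w = [1.0, 2.0, 1.0]

# For an indicator vector of S, each cut hyperedge contributes exactly w_e,
# so Q(indicator of S) equals the weight of the cut.
S = {0, 1}
x = [1.0 if v in S else 0.0 for v in range(5)]
print(Q(H, w, x))   # = total weight of hyperedges cut by S

# For ordinary edges (r = 2) this is the usual Laplacian form x^T L x.
E = [(0, 1), (1, 2)]
x2 = [0.0, 1.0, 3.0]
print(Q(E, [1.0, 1.0], x2))  # (0-1)^2 + (1-3)^2 = 5
```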

So it seems interesting to consider the following notion of sparsification: a hypergraph is a spectral sparsifier of with error if and have the same vertex set and

where, as before, the convention is that .

Soma and Yoshida studied this question and gave a construction with hyperedges. In the rest of this post we will discuss the construction by Nikhil Bansal, Ola Svensson and me, which uses hyperedges.

**2. Choosing Probabilities **

Given a hypergraph , we can construct a multigraph by taking each hyperedge of and then constructing, in , a clique between the vertices of . Thus, in , the edge is repeated as many times (possibly, zero times) as the number of hyperedges of that contain both and . Another way to think about it is that the Laplacian of is given by

where the inner sum is over the unordered pairs . This graph relates in several interesting ways with . For example, if is a subset of vertices and is a hyperedge that is cut by , then between and of the edges of derived from are cut by . If is not cut by , then none of the edges derived from is cut by , so we have

and, with a bit more work, it is possible to prove similar bounds for the Laplacian quadratic forms

This means that if we sparsify , say with the Spielman-Srivastava construction, we obtain information about , up to multiplicative error . Now suppose that, as we sample edges of to sparsify it in the Spielman-Srivastava way, we will also pick hyperedges of (for example we pick if at least one of its corresponding edges in is picked), and we weigh them by the inverse of the probability of being selected. Then we may hope that if the Spielman-Srivastava sparsification of is tuned to achieve error , the hypergraph that we obtain will have error at most . Indeed, this is roughly what happens, and we will be able to prove it by showing that the error in the hypergraph sparsification is dominated by the error in the “Gaussian version” of Spielman-Srivastava described in the previous post.
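The clique-expansion step and the cut comparison are easy to verify on a toy example (stdlib Python, my own sketch): a hyperedge of size r with a vertices on one side of the cut contributes a(r - a) clique edges to the cut of the expansion, a quantity that lies between r - 1 and r^2/4.

```python
from itertools import combinations

def clique_edges(e):
    """Clique expansion of a hyperedge: one graph edge per unordered pair."""
    return list(combinations(sorted(e), 2))

def graph_cut(edges, S):
    S = set(S)
    return sum(1 for u, v in edges if (u in S) != (v in S))

r = 4
e = (0, 1, 2, 3)
F = clique_edges(e)          # r*(r-1)/2 = 6 pairs

for S in [{0}, {0, 1}, {0, 1, 2}]:
    c = graph_cut(F, S)
    a = len(S)               # a vertices on one side: a*(r-a) pairs are cut
    print(S, c, a * (r - a))
    assert r - 1 <= c <= r * r // 4
```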

So we are going to assign to each hyperedge a probability

where the factor will be chosen later to get the construction to work and is the effective resistance of the edge of .

In fact, as before, it will be helpful to have probabilities that are non-positive powers of two, so we will choose to be a power of two such that

We have

Another fact that will become useful later (it will save us a factor of the order of in the number of hyperedges in the construction) is that

**3. A Discrete Random Process **

Our construction of a hypergraph sparsifier will be to select each independently with probability and, if selected, weigh it by . Our goal is to find an upper bound in probability, or even in expectation, on

Recalling that , we will study

Because if we can show that the above quantity is at most , then, for every ,

as desired.

If we define

then we are interested in the supremum in of

where is a random variable that is equal to with probability and is equal to with probability .

As before, we will do the construction in rounds. If the smallest probability is , then we will have rounds. We start with and then, at round , we take all hyperedges such that and, independently for each such hyperedge, we either delete it or we double its weight (with probability 1/2 in each case). The final hypergraph is distributed like . We have the processes

where the random variables are Rademacher, the weights are either 0 or , and is the set of edges that are “active” in round , that is, the set of hyperedges such that .

For each hypergraph , we will also consider its associated graph , obtained by replacing each hyperedge with a clique. The Laplacian of is

We have the following lemma

Lemma 1. For every outcome of the first rounds such that , there is probability at least over the randomness of the -th round that and that

This means that we can take and apply the above lemma inductively to argue that we have a high probability that , and so we get a hypergraph spectral sparsifier with error and hyperedges.

To prove the Lemma we will define the Gaussian process

where the are Gaussian. Notice the two differences between and : we replaced the Rademacher choices with Gaussian choices, and we replaced

with

Furthermore, we are doing a random choice for each pair within each hyperedge instead of just one random choice per hyperedge.

Fact 2. The random processes and are -dominated by .

*Proof:* For every we have

and

To complete the proof, we argue that

To verify the above inequality, assume , otherwise the argument will be symmetric, and call the maximizer for and the maximizer for . Then

It remains to estimate the expected sup

and the diameter

where, recall, .

With the usual change of variable we have

where, as before, we use the notation , and .

By matrix Chernoff bounds for Gaussian sums of matrices,

Recall that if is a hyperedge that is active at round , then and , and we also have

so that we have

The last term of (1) is the spectral norm of , which is at most 2 by the assumption of the Lemma.

Collecting all the pieces, we have proved that

We also need to bound the diameter of , that is

under the usual change of basis, for every of length 1 we want to bound the square root of

where is either zero or and we can continue our chain of inequalities with

where we used again the assumption of the Lemma .

To prove the last part of the lemma, we have to prove that with high probability

We see that

as noted before, each is either zero or , so

and the final claim follows from matrix Chernoff bounds.

Now we can put everything together. Applying our lemma inductively, we can say that with high probability

and

provided that we choose to be at least an absolute constant times .

In particular, we can choose to be an absolute constant times and have

which is the same as

So, with high probability, is a spectral sparsifier of with error , and it has hyperedges.

**4. Some Additional Thoughts **

The dependence on is definitely not optimal, particularly because when is order of we know from the work of Soma and Yoshida that we can do better. One place where we seem to lose is that, although we know

we are only able to show that

is -dominated by

rather than, as we could have hoped, -dominated. The difficulty is that the bound

is sometimes tight. In order to do better, it seems necessary to do something a bit differently from what we do here.

The effective resistance of an edge in a graph can be written as

with the convention that . So it seems reasonable that a good definition of effective resistance for a hyperedge in a hypergraph would be

One can argue that these “effective resistances” add up to , but perhaps they add up to ? If we sample according to those “effective resistances,” can we apply generic chaining directly to without having to rely on a Gaussian process on matrices, for which we have Chernoff bounds?
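As a sanity check on the graph case, here is a stdlib-Python computation (my own sketch) of the effective resistances of a small graph, using the standard identity, equivalent to the sup formula above via the matrix-tree theorem, that in an unweighted graph the effective resistance of an edge is the probability that a uniformly random spanning tree contains it; Foster's theorem then says the values sum to n - 1.

```python
from itertools import combinations

def is_spanning_tree(n, edges):
    """Union-find check that the given n-1 edges form a spanning tree."""
    if len(edges) != n - 1:
        return False
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False          # edge closes a cycle
        parent[ru] = rv
    return True

def effective_resistance(n, edges, e):
    """R_e = fraction of spanning trees that contain e (unweighted graphs)."""
    trees = [T for T in combinations(edges, n - 1) if is_spanning_tree(n, T)]
    return sum(1 for T in trees if e in T) / len(trees)

# 4-cycle plus one diagonal.
n, E = 4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
R = {e: effective_resistance(n, E, e) for e in E}
print(R)
print(sum(R.values()))   # effective resistances sum to n - 1 = 3
```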


In the previous post we talked about Gaussian and sub-Gaussian processes and generic chaining.

In this post we talk about the Spielman-Srivastava probabilistic construction of graph sparsifiers. Their analysis requires a bound on the largest eigenvalue of a certain random matrix, that can be derived from matrix Chernoff bounds.

We will then make our life harder and we will also derive an analysis of the Spielman-Srivastava construction by casting the largest eigenvalue of that random matrix as the sup of a sub-Gaussian process, and then we will apply the machinery from the previous post.

This will be more complicated than it needs to be, but the payoff will be that, as will be shown in the next post, this more complicated proof will also apply, with some changes, to the setting of hypergraphs.

**1. Graph Sparsifiers **

For a graph having vertex set , if is a subset of vertices, we denote by the number of edges that cross the cut , that is, that have one endpoint in and one endpoint outside . If is a weighted graph, is the sum of the weights of such edges.

A graph is a *cut sparsifier* of with error at most if and have the same vertex set and, for every , we have

that is, if all cuts in and are the same, up to a multiplicative error . (We use the convention that .) This definition was introduced by Benczur and Karger who showed that every graph has a cut sparsifier with at most edges, where is the number of vertices, and that such a sparsifier can be constructed by an efficient probabilistic algorithm.

This is a useful construction because if one wants to run on an algorithm that approximately solves a problem that depends on the cut structure of (for example min cut, min st-cut, max flow, or an algorithm for a clustering problem), then one may alternatively run such algorithm on , and an approximate solution for will also be an approximate solution for . The advantage is that one runs the algorithm on a graph that has fewer edges, and so one needs less time and less memory to run the algorithm.

The approach of Benczur and Karger is to assign to each edge of a probability , in a certain careful way that we will not describe. Then they generate by doing the following independently for every edge : with probability we put the edge in , and give it weight , and with probability we do not put the edge in . For every cut , we have that is a random variable with expectation , and Benczur and Karger are able to use Chernoff bounds and a union bound to show that there is a setting of those probabilities such that it is likely that all cuts will be preserved with multiplicative error at and such that . The union bound has to be done very carefully and, in particular, one has to use the fact that there can be few sparse cuts.
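A toy example (mine, not from the paper) of why the probabilities must be chosen carefully: uniform sampling destroys sparse cuts. In the “dumbbell” graph below, two triangles joined by a bridge, the minimum cut consists of the single bridge edge, and any scheme that samples the bridge with small probability produces a cut value of 0 or a large value, never the right one; sampling it with probability 1 fixes this.

```python
import random
from itertools import combinations

def cut(edges_w, S):
    S = set(S)
    return sum(w for (u, v), w in edges_w.items() if (u in S) != (v in S))

# Dumbbell: two triangles joined by a single bridge edge (0, 3).
E = {e: 1.0 for e in combinations(range(3), 2)}
E.update({(u + 3, v + 3): 1.0 for u, v in combinations(range(3), 2)})
E[(0, 3)] = 1.0

rng = random.Random(1)

def sample(p_of):
    """Keep edge e with probability p_of(e), reweighing kept edges by 1/p."""
    return {e: w / p_of(e) for e, w in E.items() if rng.random() < p_of(e)}

S = {0, 1, 2}                      # the min cut: just the bridge, value 1
uniform = lambda e: 0.5            # naive uniform sampling
careful = lambda e: 1.0 if e == (0, 3) else 0.5   # always keep the bridge

results = {}
for name, p in [("uniform", uniform), ("careful", careful)]:
    vals = [cut(sample(p), S) for _ in range(2000)]
    results[name] = (min(vals), max(vals))
    print(name, results[name])
```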

Spielman and Teng introduced the stronger definition of *spectral sparsifier*: according to their definition, a graph is a spectral sparsifier of with error at most if, for every vector we have

where is the Laplacian matrix of and, as before, we take the convention that . This is a stronger condition because, if is the indicator vector of a set , then , so the Benczur-Karger condition is equivalent to the special case of the Spielman-Teng condition in which we only quantify over Boolean vectors .

Spielman and Teng gave an efficient construction of spectral sparsifiers with edges.

Later, Spielman and Srivastava reduced the number of edges to , with a proof similar to that of Benczur and Karger: they assign a probability to each edge, and then sample each edge with probability , weighing it if selected. The Laplacian of the resulting weighted graph satisfies , and so one has to study the concentration of the random matrix around its average, which is doable using matrix Chernoff bounds.

More recently, Batson, Spielman and Srivastava gave a construction of spectral sparsifiers with edges. Their construction is deterministic, and it proceeds by choosing one edge at a time, in a process that is driven by a certain potential function. Allen-Zhu, Liao and Orecchia have presented an interpretation of Batson-Spielman-Srivastava as an online optimization game played using a Follow-the-Regularized-Leader strategy. As my sequence of posts on online optimization will continue after the current hiatus, I plan to present the results of Allen-Zhu, Liao and Orecchia.

**2. The Spielman-Srivastava Construction **

We will assume that the graph is connected, otherwise we can apply the construction below to each connected component.

In the Spielman-Srivastava construction, we assign a probability to each edge, and then we want to say that the event

holds with high probability, where is the random graph obtained by sampling each edge of independently with probability , and weighing it if selected, and is the Laplacian of .

Actually, Spielman and Srivastava sample from a slightly different distribution: they repeatedly sample from a distribution over all edges in which edge has probability proportional to , because this distribution is easier to analyze with matrix Chernoff bounds. We will instead proceed to analyze the distribution described in the above paragraph.

Matrix Chernoff bounds give upper bounds to the probability that a sum of independent random matrices deviates in operator norm from its average, and take the form of

where the are independent random matrices. If the matrices are Hermitian, the most general case is given by the *matrix Bernstein* bound

where is the “variance” of , and is an upper bound such that

holds with probability one.

It remains to reformulate (1) in a form like (2). We first note that

where is the Laplacian matrix of the edge . That is, if , then is the rank-1 matrix whose quadratic form is .

So we can write

where is a random variable that is equal to with probability and to with probability .

We also note that all the terms in (1) are invariant under shifts in , so we can rewrite the event (1) as

where is the set of vectors that are orthogonal to . If we apply the change of variable , which is bijective on , the previous event becomes

which is implied by (and actually equivalent to):

Looking at the matrix Bernstein bound, we want to choose the so that, say,

The variance term (7) is the spectral norm of

where we computed and we used the fact that if is a rank-1 real-valued symmetric matrix then .

So we have

If we set

then we see that we satisfy both (7) and (8).

The term is the *effective resistance* of the edge and it is usually denoted as . It is known that the effective resistances of the edges of a connected graph sum to $n-1$. Thus we have

as promised.
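Spelled out, if each probability is proportional to the effective resistance of its edge times (log n)/epsilon^2 (writing C for whatever absolute constant the matrix Bernstein bound demands), then Foster's theorem, which says that the effective resistances of a connected graph sum to n - 1, gives the promised edge count in expectation:

```latex
\mathbb{E}\,[\text{number of edges of } H] \;=\; \sum_{e \in E} p_e
\;=\; \frac{C \log n}{\epsilon^2} \sum_{e \in E} R_e
\;=\; \frac{C\,(n-1)\log n}{\epsilon^2}
\;=\; O\!\left(\frac{n \log n}{\epsilon^2}\right).
```

(Probabilities are capped at 1, which only decreases the expected count.)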

**3. Analysing the Construction as a Sub-Gaussian Process **

Now we would like to provide an analysis of the Spielman-Srivastava construction in terms of bounding the sup of a sub-Gaussian process via the Talagrand comparison inequality. Indeed we can think of our goal as showing that the following two random variables are both with high probability:

where the are the random variables defined in the previous section. Unfortunately, while random processes that involve weighted sums of Rademacher random variables are well suited to such an analysis, random processes that involve highly biased Boolean random variables do not work very well in this framework.

To avoid this problem, we will think of performing the Spielman-Srivastava sampling in phases, such that each phase involves unbiased Boolean random variables.

To avoid having to deal with too many constants, we will think of setting , with a goal of achieving sparsification with error . We will want to think of the process of sampling an edge as a sequence of unbiased Boolean choices, so it will be convenient to round up probabilities to non-positive powers of 2. So we will set edge probabilities such that for some integer and such that

If we let , we will think of the process of sampling as proceeding in rounds. If then, in each of the last rounds (that is, in rounds through ), we choose with probability 1/2 to delete the edge and with probability 1/2 to double its weight.

(Why do we do it in the last rounds and not in the first rounds? This issue confused me a lot at some point. Hold on to this question until later.)
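To see that the round-based process samples from the right distribution, note that k rounds of “delete with probability 1/2, else double the weight” leave an edge alive with probability 2^(-k) = p, and an edge that survives all rounds has weight exactly 2^k = 1/p, so the expectation is preserved. A quick stdlib-Python simulation (my own sketch):

```python
import random

rng = random.Random(7)

def rounds_sample(k, rng):
    """k rounds: delete the edge (weight 0) or double its weight, each w.p. 1/2."""
    w = 1.0
    for _ in range(k):
        if rng.random() < 0.5:
            return 0.0
        w *= 2.0
    return w

k = 3                     # p = 2**-k = 1/8
trials = 40000
vals = [rounds_sample(k, rng) for _ in range(trials)]
kept = sum(1 for v in vals if v > 0)
print(set(vals))          # weights are 0 or 1/p = 8
print(kept / trials)      # survival frequency ~ 1/8
print(sum(vals) / trials) # mean weight ~ 1: the expectation is preserved
```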

Let us call the graph obtained after round , so that and . We see that

Let us now understand the quadratic form in each of the above terms. If we let be the weight of edge in graph , we have

Regarding the weights, if we consider an edge such that , we have that the edge is left untouched in round if , that is, if . In that case, . If , then is equally likely to be or . In other words, where is a Rademacher random variable.

Putting everything together, we have

and now we are in good shape because Rademacher sums are well suited to be analyzed as sub-Gaussian processes. In particular, we will be able to prove the following lemma.

Lemma 1. There are bounds such that, for every outcome of the first steps that satisfies , we have

Applying the lemma inductively, we see that we have probability at least that

as desired, and it remains to observe that in every undirected graph every edge has effective resistance at least so that .

It will be convenient to do a change of variables and write

To lighten up the notation in our subsequent arguments, it will be convenient to give names to the matrices that we obtain after this change of basis, and call

In this notation, we have

and recall that we assumed

With this notation, the quantity in (9) that we want to bound becomes

where

is a centered random process.

Define the centered Gaussian process

where are independent standard normal random variables. Then we immediately see that both and are -dominated by , because

and

In order to deduce the lemma from the Talagrand comparison inequality it suffices to show that

and to have a bound on the diameter

To bound the average supremum of the Gaussian process we could use generic chaining, but fortunately there is a matrix Chernoff bound that says that if are real-valued symmetric matrices and are independent standard normal random variables then

Applied to our setting,

where

so indeed

Now we can return to the question about the order in which we process the edges. By starting from the lowest-probability edges, we are also starting from the edges for which is smallest. When we bound

and the sum is over the edges that are processed at round , it is convenient to be able to say that has a round-dependent upper bound. Indeed, if is processed at round , then is either zero or , so that , and the terms add up to when summed over rounds. If we had proceeded in the opposite direction, we would have only been able to bound as an absolute constant times , and we would have lost a factor of the order of the number of rounds in the analysis.

The diameter is bounded by analogous considerations. The square of the diameter

and we have

and

This gives the desired bound

Now the Talagrand comparison inequality gives us the lemma, and hence the analysis of the Spielman-Srivastava construction.

**4. A Final Comment **

There was a point that confused me for a while about this argument. Namely, we are not able to study

by showing that

is dominated by a Gaussian process, because involves biased Boolean random variables which yield poor bounds when we try to dominate them by a Gaussian distribution. Instead we write

and then we show how to dominate each term on the right-hand side by a Gaussian process. But then won’t the sum of those Gaussian processes dominate , which was not supposed to be possible?

But the point is that in the analysis of we throw away the low-probability case that, in the previous processes, we ended with . These low-probability events that we remove from consideration are enough to cut off the problematic tail of the discrete distribution that does not have a good Gaussian domination.

To get a better sense of what is happening, suppose that we are looking at the real-valued random variable

where each is an independent random variable that is equal to 1 with probability and equal to zero with probability . Suppose that is a negative power of two and that , let’s say .

We would like to say that there is a such that

Also, we have made a vow to only bound the tail of discrete random variables by showing that they are sub-Gaussian and then using the tail of the dominating Gaussian.

If we try to argue about the sub-Gaussianity of , we are in trouble, because there is probability that , so that . This is problematic because, in a Gaussian distribution, a deviation that occurs with probability can only be times the standard deviation, so the standard deviation has to be and a deviation that is achieved with probability is of the order at least which is much more than the that we were hoping for.

The problem is that a Gaussian distribution with standard deviation of the order of dominates our distribution in the regime we are interested in, but not in all regimes.
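To put numbers on this, the following stdlib-Python snippet (my own illustration) compares the actual tail of a {0, 1/p} random variable of mean 1, which reaches the value 1/p with probability p, against the tail of a Gaussian with the same mean and variance, which reaches that level only with probability about exp(-1/(2p)), vastly smaller:

```python
import math

# A {0, 1/p} random variable (mean 1, variance (1-p)/p) equals 1/p with
# probability p, but a Gaussian with the same mean and variance exceeds
# that level with probability about exp(-1/(2p)) -- far smaller for tiny p.
tails = {}
for p in [2 ** -6, 2 ** -10]:
    sigma = math.sqrt((1 - p) / p)
    deviation = 1 / p - 1                     # distance from the mean
    tails[p] = 0.5 * math.erfc(deviation / (sigma * math.sqrt(2)))
    print(p, tails[p])
```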

To overcome this problem, we write

where

and the random variables are constructed in the following way: , and is equally likely to be either 0 or . In this way, and .

For every choice of the , we can write

where the are independent Rademacher random variables. Over the randomness of the , the random variable has a sub-Gaussian distribution dominated by a Gaussian of variance

Now, without breaking our vow, we can inductively get a high-probability estimate that for each

while maintaining the invariant that, say, for each . When we sum up the error bounds, the whole sum is of the order of the last term, which is .

Welcome back to *in theory*, in which we again talk about math. I spent last Fall teaching two courses and getting settled, I mostly traveled in January and February, and I have spent the last two months on my sofa catching up on TV series. Hence I will reach back to last Spring, when I learned about Talagrand’s machinery of generic chaining and majorizing measures from Nikhil Bansal, in the context of our work with Ola Svensson on graph and hypergraph sparsification. Here I would like to record what I understood about this machinery, and in a follow-up post I plan to explain the application to hypergraph sparsification.

**1. A Concrete Setting**

Starting from a very concrete setting, suppose that we have a subset , we pick a random Gaussian vector from , and we are interested in the random variable

In theoretical computer science, for example, a random variable like (1) comes up often in the study of rounding algorithms for semidefinite programming, but this is a problem of much broader interest.

We will be interested both in bounds on the expectations of (1) and on its tail, but in this post we will mostly reason about its expectation.

A first observation is that each is Gaussian with mean zero and variance . If is finite, we can use a union bound to estimate the tail of as

and we can compute the upper bound

The above bound can be tight, but it is poor if the points of are densely clustered, and it is useless if is infinite.
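As a sanity check, here is a small Monte Carlo sketch (the dimension and set sizes are arbitrary choices) of the expectation of the sup: for the standard basis vectors the union-bound estimate of order sqrt(log N) is roughly tight, while for a densely clustered set of the same cardinality it wildly overshoots:

```python
import math, random

def expected_sup(T, trials=500, seed=0):
    """Monte Carlo estimate of E[ sup_{t in T} <g, t> ] for standard Gaussian g."""
    rng = random.Random(seed)
    n = len(T[0])
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0, 1) for _ in range(n)]
        total += max(sum(gi * ti for gi, ti in zip(g, t)) for t in T)
    return total / trials

n = 40
# Spread-out set: the 40 standard basis vectors.
basis = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
# Clustered set: 40 tiny perturbations of the same unit vector e_1.
eps = 1e-3
clustered = [[1.0] + [eps if j == i else 0.0 for j in range(1, n)]
             for i in range(n)]
print(expected_sup(basis))      # around 2.2, of the order of sqrt(2 ln 40) ≈ 2.7
print(expected_sup(clustered))  # near 0: the union bound is very loose here
```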

It is useful to note that, if we fix, arbitrarily, an element , then we have

because . The latter expression is nicer to work with because it makes it more explicit that what we are trying to compute is invariant under shifts of , and only depends on pairwise distances of the elements of , rather than their norm.

In the cases in which (2) gives a poor bound, a natural approach is to reason about an -net of , that is, a subset such that for every there is an element such that . Then we can say that

which can be a much tighter bound. Notice that we used (2) to bound

but it might actually be better to find an -net of , and so on. In general, a tighter analysis would be to choose a sequence of nested sets , where , , and we have that is an -net of , that is, for every element of there is an element such that . Then, by generalizing the above reasoning, we get

Finally, if the cardinality of the grows sufficiently fast, namely, if we have , it is possible to refine the estimate to

where is the closest element to in . This is done by not writing the expectation of the sup of a sum as a sum of expectations of sups (and then using (2)), but by bounding the tail of the sup of the sum directly.
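The nets used in this argument can be constructed greedily: a maximal eps-separated subset is automatically an eps-net. A minimal sketch on random points in the plane (the point set and the value of eps are arbitrary choices for illustration):

```python
import math, random

def greedy_eps_net(points, eps):
    """Greedily keep any point farther than eps from all kept points.
    The result is eps-separated and covers every input point within eps."""
    net = []
    for p in points:
        if all(math.dist(p, c) > eps for c in net):
            net.append(p)
    return net

rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(500)]
net = greedy_eps_net(pts, eps=0.2)
# Every input point is within eps of some net point.
assert all(min(math.dist(p, c) for c in net) <= 0.2 for p in pts)
print(len(net))  # a small net covering all 500 points
```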

At this point, we do not even need to assume that , that is finite, or that the sequence of sets is finite, and we have the following result.

Theorem 1 (Talagrand’s generic chaining inequality) Let be an arbitrary set, let be a countable sequence of finite subsets of such that and . Then

where is the element of closest to .

A short complete proof is in these notes by James Lee.

While the above Theorem has a very simple proof, the amazing thing, which is rather harder to prove, is that it is *tight*, in the sense that for every there is a sequence of sets such that the bound of the above theorem has a matching lower bound, up to an absolute constant. This is why it is called *generic chaining*. *Chaining* because the projection of on is estimated based on the “chain”

of projections of the intermediate steps of a path that goes from to passing through the . *Generic* because this upper bound technique works as well as any other possible upper bound, up to an absolute constant.

**2. An Abstract Setting**

Let now be a completely arbitrary set, and suppose that we have a distribution over functions and we want to upper bound

That is, we have a random optimization problem with a fixed feasible set , and we want to know the typical value of the optimum. For example, could be the set of cuts of a vertex set , and describe a distribution of random graphs such that is the number of edges cut in a random graph by the cut . Then the above problem is to estimate the average value of the max cut in the random graphs of the distribution. Or could be the unit sphere and describe a distribution of random Hermitian matrices such that is the quadratic form of a random matrix evaluated at . In this case, the above problem is to estimate the average value of the largest eigenvalue of such a random matrix.

We will call the collection of random variables a *random process*, where is a random variable distributed according to .

If every , and every finite linear combination , has a Gaussian distribution, then we say that is a *Gaussian process*, and if, in addition, for every then we say that it is a *centered Gaussian process*.

If , and we define for a random standard Gaussian , then is a centered Gaussian process and, in this case, upper bounding is precisely the problem we studied before.

If and for a random standard Gaussian , then, for every , we have

and, by analogy, if is a centered Gaussian process, we will define the following distance function on :

If is a centered Gaussian process then one can prove that the above distance function is a semi-metric on .

We will not need this fact, but if is a centered Gaussian process and is finite, then there is an embedding , for some , such that the process can be equivalently defined as picking and setting , so that is also an isometric embedding of the above distance function into .
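For the concrete process of the previous section, the distance defined this way is just the Euclidean distance, since the difference of the process at two points is a Gaussian whose variance is the squared Euclidean distance between them. A quick Monte Carlo check on two arbitrary (made-up) vectors:

```python
import math, random

rng = random.Random(0)
s = [1.0, 2.0, -1.0]
t = [0.0, 1.0, 3.0]
dist2 = sum((si - ti) ** 2 for si, ti in zip(s, t))  # ||s - t||^2 = 18
trials = 200_000
acc = 0.0
for _ in range(trials):
    g = [rng.gauss(0, 1) for _ in range(len(s))]
    Xs = sum(gi * si for gi, si in zip(g, s))  # X_s = <g, s>
    Xt = sum(gi * ti for gi, ti in zip(g, t))  # X_t = <g, t>
    acc += (Xs - Xt) ** 2
print(acc / trials, dist2)  # the two numbers agree up to sampling error
```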

The arguments of the previous section apply to centered Gaussian processes without change, and so we have the following.

Theorem 2 Let be an arbitrary set, and be a centered Gaussian process. Let be a countable sequence of finite subsets of such that and . Then

where is the distance function and is the element of closest to according to .

**3. Sub-Gaussian Random Processes**

This theory does not seem to apply to problems such as bounding the max cut of an unweighted graph, or bounding the largest eigenvalue of a random symmetric matrix with entries, because such problems have a finite sample space and so cannot be modeled as Gaussian processes.

Fortunately, there is a notion of a *sub-Gaussian* process, which applies to such problems and which reduces their analysis to the analysis of a related Gaussian process.

First, recall that a centered real-valued random variable is *sub-Gaussian* if there is a centered Gaussian random variable whose tail dominates the tail of , that is, if we have two constants and such that, for all :

An equivalent condition is that there is a such that

In that case, we can define a norm, called the norm as

which is, roughly, the standard deviation of a centered Gaussian that dominates .

Example 1 All bounded random variables are sub-Gaussian.

Example 2 If

where the are independent Rademacher random variables, that is, if each is equally likely to be +1 or -1, then is sub-Gaussian with , which is within a constant factor of its actual standard deviation.

Example 3 If

where the are independent and each has probability of being equal to and probability of being equal to (that is, each is a centered Bernoulli random variable), then , which is much more than the standard deviation of when is small.

Example 4 If

where the are independent Rademacher random variables, and the are arbitrary real scalars, then , which is within a constant factor of the standard deviation that we would get by replacing each with a standard Gaussian.
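Hoeffding’s inequality is what gives this last example its sub-Gaussian tail: the probability that the weighted Rademacher sum exceeds u in absolute value is at most 2 exp(-u^2 / (2 sum_i a_i^2)), a Gaussian-style bound with the matching variance. A quick empirical check with arbitrary (made-up) coefficients:

```python
import math, random

rng = random.Random(0)
a = [0.5, 1.0, 2.0, 0.25, 1.5]   # arbitrary real scalars
sigma2 = sum(x * x for x in a)    # variance of the Rademacher sum
u = 1.5 * math.sqrt(sigma2)       # look 1.5 standard deviations out
trials = 200_000
hits = sum(
    abs(sum(x * rng.choice((-1, 1)) for x in a)) >= u
    for _ in range(trials)
)
empirical = hits / trials
hoeffding = 2 * math.exp(-u * u / (2 * sigma2))  # Hoeffding's bound
print(empirical, hoeffding)  # the empirical tail lies below the bound
```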

Let now be a centered random process. We will say that a Gaussian process -dominates if, for every we have

That is, every random variable of the form is sub-Gaussian, and its tail is dominated by the tail of the Gaussian distribution

Theorem 3 (Talagrand’s comparison inequality) There is an absolute constant such that if is a centered random process that is -dominated by a centered Gaussian random process , then

Furthermore, for every ,

where is the diameter of with respect to the distance function .

The way to apply this theory is the following.

Suppose that we want to estimate, on average or with high probability, the optimum of an optimization problem with feasible set over the randomness of the choice of a random instance. We model this problem as a centered random process in which is the difference between the cost of solution in a random instance and the average cost of .

Then we think about a related random experiment, in which the random choices involved in constructing our instance are replaced by Gaussian choices (for example, instead of a random graph we may think of a complete graph with Gaussian weights on the edges chosen with expectation 1/2 and constant variance) and we let be the analogous process in this Gaussian model.

If we can argue that dominates , then it remains to estimate , which we can do either by the generic chaining theorem or by other methods.

**4. An Example**

We will now use this machinery to show that the largest eigenvalue of a random symmetric matrix with Rademacher entries is . This is certainly not the simplest way of proving such a result, but it will give a sense of how these techniques can be applied.

We let be the unit sphere.

Our Gaussian process will be to pick standard Gaussians , for each , define the matrix and let

for every .

Our “sub-Gaussian” random process is to pick Rademacher random variables , for each , define the matrix and let

for every .

We will argue that is -dominated by and that .

For the first claim, we see that for every , we can write as

So, as noted in one of our examples above, we can say that

and we see that

so that, indeed, is -dominated by .
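Before carrying out the chaining estimate, we can sanity-check the conclusion numerically: the largest eigenvalue of a random symmetric Rademacher matrix grows like the square root of the dimension (its spectrum in fact follows the semicircle law, whose edge is near twice the square root of the dimension). A rough Monte Carlo sketch using power iteration, where the matrix sizes and iteration counts are arbitrary choices:

```python
import math, random

def spectral_norm(M, iters=100, seed=0):
    """Estimate max |eigenvalue| of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    n = len(M)
    v = [rng.gauss(0, 1) for _ in range(n)]
    norm = 1.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return norm  # ~ ||M v|| for a unit v aligned with the top eigenspace

def rademacher_symmetric(n, rng):
    """Symmetric matrix with independent +-1 entries on and above the diagonal."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            M[i][j] = M[j][i] = rng.choice((-1.0, 1.0))
    return M

rng = random.Random(42)
for n in (40, 80, 160):
    lam = spectral_norm(rademacher_symmetric(n, rng), seed=1)
    print(n, lam / math.sqrt(n))  # the ratio stays bounded, around 2
```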

Now we need to apply generic chaining to . It is very helpful to note that the distance function defined on by the Gaussian process is dominated by Euclidean distance between the vectors and , because

where we used the inequality

We can conclude that an -net over the unit Euclidean sphere is also a -net for the metric space . For the unit Euclidean sphere there is an -net of size at most . To apply generic chaining, let be an arbitrary subset of of cardinality if , and an -net with otherwise. Applying the generic chaining inequality,


Since the start of the lockdown 51 days ago, it has been clear that the condition to “reopen” was to have in place a “test-trace-isolate” plan to find infected people as soon as possible after their contagion, trace their contacts, and isolate them. For this, one needs an infrastructure for large-scale testing with quick turnaround, a well-staffed agency to do manual tracing or an app to do it automatically, and facilities to isolate people who are infected and not in need of hospitalization.

None of this has been done. There hasn’t been a sufficient ramp-up of testing capacity; as far as I know no additional people have been hired and trained for manual contact-tracing; there is an app for digital contact-tracing, but the plan to adopt it appears to have been shelved; if people test positive and are well they are asked to stay home, potentially with their family members, who are free to leave as they please.

All our eggs are in the basket of social distancing. The humor site Kotiomkin posted “Basically, phase 2 will rely on everybody’s common sense. We are fucked”.

The problem is that social distancing requires a common-sense avoidance of close contacts with other people, and the government can give guidelines on how to achieve it, but eventually it has to rely on everybody’s sense of responsibility. Unfortunately, the national mood around regulations is to immediately look for loopholes.

For example the initial lockdown measures stipulated that one could leave home to go buy groceries. Then when the police would stop people tens of miles away from their home, people would say “I drove here to buy groceries”, “but we are thirty miles away from where you live”, “yes but here the groceries are better”. So a subsequent amendment stipulated that one could buy groceries only in the town of residence, making it hard for people living next to a town border, for whom the closest grocery store was across the town line. In fact, every time I have encountered a crazy Italian law or regulation, and asked around for the likely reason it was instituted, it was usually to close a loophole in a previous regulation, which in turn had been put in place to close a loophole in a third regulation, and basically it’s loophole-closing all the way down to Roman law.

I think that, from this point on, the story of the Italian covid-19 epidemic will not show the future of the rest of the Western world, but will evolve in its own timeline. Meanwhile, there are a couple of lessons that are still relevant, particularly in the comparison between Lombardy and NYC, which continue to track each other remarkably well.

One is that, at one point, it was decided to move older people with mild cases of covid-19 from hospital to nursing homes, to open up beds in hospitals. Since the personnel of nursing homes are not trained in the safety procedures for infectious diseases, and since nursing homes host older, frail people who are the highest-risk category for this illness, the result was a huge number of deaths in these nursing homes. Now nursing homes in New York are being asked to take covid-19 patients from hospitals.

The other is that an analysis of all-cause mortality shows a spike in March such that the difference between the typical March all-cause mortality and the March 2020 all-cause mortality is much bigger than the number of confirmed covid-19 deaths. Now the same phenomenon is being observed in New York City, with numbers similar to Lombardy’s. Notably, the baseline all-cause mortality rate in Lombardy is much higher than in New York City (because population growth has stalled and the demographic skews older), so while the absolute number of additional deaths is similar, the relative increase is a much more dramatic 6x in NYC versus roughly 4x in Lombardy.

The national discourse has been obsessed with “The Peak,” that is, the time when things reach their worst point, and start improving after that. For the last several days, all indicators, such as new cases, deaths, and ICU occupancy, have been improving. Apparently, then, “The Peak” is behind us. Virologists have been cautious to say that “peak” is the wrong mountain metaphor to use, and that we have rather reached a “plateau” in which things will change very slowly for a while.

Below is the number of confirmed covid-19 deaths in Italy updated with today’s data, showing that we reached the plateau a couple of weeks ago, meaning that the number of new cases started to plateau about a month ago, when the lockdown started.

The data from New York City continues to track the data from Lombardy, so NYC should be just a few days away from its own plateau, if the match continues.

Given all this, people have been wondering when and how we will get out of the lockdown, and reach what everybody has been calling the “Phase Two” of this emergency.

The lockdown is set to expire this coming Monday, and it is expected that tomorrow or Saturday the prime minister will announce new measures. (Perhaps, according to precedent, he will do so on Sunday night.) It is expected that the stay-at-home order will be extended to early May, or even mid-May, but that the definition of “essential activities” will be relaxed to allow some manufacturing to restart sooner.

Meanwhile, an infrastructure to isolate new cases and trace their contacts, which should have been frantically under construction over the last month, is still non-existent. Last week, the government appointed a committee of 70+ experts to “begin thinking about mapping out possibilities” for what such an infrastructure might be like.

To be honest, I am not too confident that the “Phase Two” will be organized with Taiwanese, or even Korean, efficiency, and my only hope is that the number of cases in Lombardy has been so under-reported that we may already be close to herd immunity.

This is probably not the case, but not by a wide margin. The Italian Institute of Statistics has released 2019 vs 2020 all-cause mortality data from a representative sample of Italian towns. Apparently, during the worst days of March, all-cause mortality roughly doubled nation-wide, while the reported deaths caused by covid-19 account for only about half of the excess deaths. This might mean that there have been 20,000 covid-19 deaths and maybe 2 million infected people out of 10 million in Lombardy. A study from Imperial College estimates, at the high end, that 6 million Italians have been infected, and since Lombardy’s data has consistently accounted for half the national data on all measures, it would mean 3 million infected people in Lombardy, or 30%, which is within a factor of two of what might suffice for herd immunity. In any case we will not know until there is a randomized serologic study, which is something else for which experts are almost ready to begin mapping out ways of thinking about how to explore plans for …

What will life be like in “Phase Two”? If the epidemic continues at a slow burn, will we have to continue to keep a one-meter distance from strangers? Will trains and planes run with only every third seat occupied? Will tickets cost three times as much? Will beaches be open during the summer? Will there be riots if Italians are not allowed to go to the beach in August? Apart from the last question, whose answer is obviously yes, everything is up in the air.

What about me, after 32 days of lockdown? I was already in need of a haircut at the end of February, and lately the hair situation had become untenable, so I used my beard trimmer to blindly cut my hair. Mistakes were made, but I would not even rate it among my top ten worst haircuts ever.


**Will lockdowns work in countries that are not big on rules-following?**

Apparently, yes. The number of confirmed cases is not a very reliable signal of what is happening, because it is tied as much to the testing capacity as to the actual diffusion of the infection, while the number of deaths is a more informative one (although read below about some issues with how this number is reported). Schools were closed 34 days ago in Lombardy, which went on lockdown 21 days ago, and all of Italy went on lockdown 19 days ago. For the last several days there has been a slowing down in the number of reported deaths, and daily numbers have been decreasing for the last two days. This is a graph updated today of the cumulative number of deaths:

(Data from Protezione Civile, chart by me)

**How does this compare to other places?**

This graph shows cumulative number of deaths in NYC and in Lombardy, which have a similar population, shifting the NYC data by 17 days:

(Data from Protezione Civile and NYT, chart by me)

In Lombardy, schools were closed on February 24 (along with universities, museums, cinemas and several other places) and the lockdown started on March 8; in the shifted timeline, the New York lockdown started on March 3, so one should hope for an earlier slowdown and that NYC will do a lot better than Lombardy.

**But how bad would things be without restrictive measures?**

Some smaller towns in Lombardy, where the virus might have circulated for several weeks before the lockdown, have had substantial increases in all-cause deaths. These go from 4x the usual number over the last three months to 7x the usual number during the first three weeks of March. These excess deaths are more than the number of reported covid19 deaths from such towns, and this should be taken into account when looking at reported mortality in Lombardy. Already, the excess deaths in these small towns account for about 0.7% of the population, suggesting that worst-case scenarios of 2 million deaths in the US and of half a million in the UK without mitigation/containment measures made sense.

**What’s it like after three weeks of lockdown?**

The mood in the country seems to be shifting: after all the singing from the balconies, and the baking of cakes, and the practicing of yoga and the posting on Instagram of all of the above, the mood is souring a bit. Unions and employers are at odds on when to reopen factories and many families are living on their savings and wondering how long they can manage to do so. People whose work relates to tourism and hospitality (which accounts for a very large fraction of the country’s GDP) are wondering not just when they will be able to reopen their business or return to their job, but if they will be able to do so.

Because of the way the Euro works, Italy cannot just decide to embark on a massive stimulus program. This is because each state’s budget deficit “creates” Euros, and so there is a common policy in the “Eurozone” that limits each state’s deficit spending. While this is being negotiated, the Italian government has committed to 25 billion Euros of extra spending, and there is already a big scramble of lobbying over who should get various parts of this stimulus program, and how.

**What will it be like after the lockdown is lifted?**

I am worried that there is no official plan for that.

I hope that there are lots of competent experts working in secret on a plan that takes the best of the contact-tracing and isolation strategies of Korea, Singapore and Taiwan, injects into it the strong tradition of privacy laws that (little known fact) Italy pioneered in Europe even before the GDPR, and creates a wonderfully functioning system.

Hey, once you are hoping, you may as well hope big!

**What are good sources for predictions of future scenarios?**

Memes! It has been mind-bending how yesterday’s satire becomes tomorrow’s news.

For example, this meme started circulating on March 14, in which Johnson says “because of the coronavirus, we should prepare to lose some of our loved ones,” to which the queen replies “I am sorry for you and your family.” Johnson tested positive two weeks later.

On March 15, the Pope visited a church in (a deserted) central Rome, where there is a crucifix that, according to tradition, survived unscathed a fire in 1519 and, in 1522, was taken around the city and stopped an epidemic of bubonic plague. The pope prayed for the end of the covid19 epidemic.

On March 24, Lercio, the Italian equivalent of The Onion, published an article titled “‘Maybe He didn’t hear me the first time’: Pope Francis asks God again to stop the epidemic”.

Sure enough, on March 27 the Pope prayed again for the end of the epidemic, alone in a deserted Saint Peter square.

(Photo credit: Yara Nardi and Vatican Press Office)

The images of the pope walking alone at dusk toward the stage on which he prayed were something no disaster movie had prepared us for. Of course people were able to see the humor in that as well.

**Edited to add:** On March 13, The Onion published an article titled Health Experts Worry Coronavirus Will Overwhelm America’s GoFundMe System. On March 26, The New York Times published an article titled GoFundMe Confronts Coronavirus Demand.

The graph below, which is courtesy of Carlo Lucibello, shows the number of deaths in Italy on a logarithmic scale, compared with data from China from 36 days before.

(Image credit: Carlo Lucibello)

At the start, Italian deaths rose like in China, at the same exponential rate. About twenty days after the lockdown of Wuhan, the Chinese data started deviating from the exponential rate and leveled off. In Italy, about ten days ago, there was a slowdown, which followed the institution of the “yellow zone” by about 15 days. The “yellow zone” measures closed schools, universities, museums, cinemas, and clubs, and restricted hours of bars and coffee shops, in Lombardy. Apparently, although these measures made a difference, they still allowed the spread of the virus to continue at an exponential rate.

On March 8, Lombardy was put on a stricter lockdown, with travel restrictions, and on March 10 the lockdown was extended to the rest of the country. So we may hope to see a stronger slowdown and maybe a leveling-off two or three weeks after these measures, that is, any day now. It may seem premature to ask this question, but what happens next?

Today the Italian government announced additional measures to facilitate “social distancing,” halting all “non-essential” manufacturing and other work activities, forbidding people from leaving the house to walk or jog (even alone), and further restricting the cases in which it is allowed to travel between different cities.

These measures, which apply nationwide, are meant to be in place for two weeks. They will be economically devastating (even more so than the already devastating nationwide lockdown of March 10), and they will be difficult to keep in place for longer than the expected two weeks.

When a nationwide “lockdown” was first instituted, the prime minister announced it by saying “let’s be distant today in order to more warmly hug each other tomorrow”. In general, the spirit of these measures has been to suffer for a short time and then return to normal.

This seems to be the national mood in general, and the government took today’s further restrictive measures somewhat reluctantly, and only because there was strong popular support for them.

Here I am worried that we are approaching this crisis the way many people attempt to lose weight: by going on a starvation diet, then losing some weight, then celebrating and finally gaining back more weight than they lost.

The point being that I worry about what will happen once the worst is over and these restrictive measures will be lifted. Until there is a vaccine or a cure, we will not be able to really go back to normal, and we will have to make some sustainable “lifestyle changes” to “maintain” what we got, just like people who maintain weight loss for a long time do so by making sustainable changes for the long term.

Concretely, we will need a very efficient system to monitor new cases and trace contacts, perhaps similar to Taiwan’s, and to follow the kind of stricter hygiene precautions in public places that have been common in East Asia since SARS. Let’s hope that we will have to worry about such problems soon.

How is social distancing working out for me? I thought that I was well prepared for it, but it is still not easy. I have started to talk to the furniture, and apparently this is perfectly normal, at least as long as the furniture does not talk back.

As I have been telling my dining table, it has been very dismaying to read news from the US, where there seemed to be a very dangerous complacency. I am relieved to see that this is changing, especially at the state level, which makes me much more hopeful.

I have also found media coverage to be disappointing. Apparently, many highly educated people, including people whose job involves understanding policy issues, have no idea how numbers work (source). This is a problem because a lot of issues concerning this epidemic have to do with numbers, which can be misleading if they are not reported in context.

For example, before the time when Trump decided that he had retroactively been concerned about a pandemic since January, conservative media emphasized the estimate of a 2% mortality rate, in a way that made it sound like, well, 98% of people survive, and 98% is approximately 100%, so what is the big deal. For context, the Space Shuttle only exploded 1.5% of the time, and this was deemed *too dangerous for astronauts*. This is the kind of intuitive reference that I would like to see more of.

Even now, there is a valid debate on whether measures that will cost the economy trillions of dollars are justified. After all, it would be absurd to spend trillions of dollars to save, say, 10,000 lives, it would be questionable to do so to save 100,000 lives, and it would be undoubtedly right to do so to save millions of lives and a collapse of the health care system (especially considering that a collapse of the health care system might create its own financial panic that would also cost trillions of dollars).

So which one is it? Would doing nothing cost 10,000 American lives? A million? How long will people have to “shelter at home”? And what is next? I can recommend two well-researched articles: this on plausible scenarios and this on what’s next.

Kristof’s article cites an essay by Stanford professor John Ioannidis who notes that it is within the realm of possibilities, given the available data, that the true mortality rate could be as low as 0.05%, that is, wait for it, lower than the mortality rate of the flu. Accordingly, in a plausible scenario, “If we had not known about a new virus out there, and had not checked individuals with PCR tests, the number of total deaths due to “influenza-like illness” would not seem unusual this year.”

Ioannidis’ essay was written without reference to data from Italy, which was probably not available in peer-reviewed form at the time of writing.

I would not want professor Ioannidis to tell me how to design graph algorithms, and I don’t mean to argue for the plausibility of the above scenario, but let me complement it with some data from Italy.

Lombardy is Italy’s richest and most developed region, and the second richest (in absolute and PPP GDP) administrative region in Europe after the Ile de France (source). It has a rather good health care system. In 2018, on average, 273 people died per day in Lombardy of all causes (source). Yesterday, 381 people died in Lombardy with coronavirus (source). This is spread out over a region with more than 10 million residents.

Some areas are harder-hit hotspots. Three days ago, a Bergamo newspaper reported that 330 people had died in the previous week of all causes in the city. In the same week of March in 2019, 23 people had died. That’s a 14x increase of mortality of all causes. **Edited to add (3/22/2020):** *the mayor of Bergamo told Reuters that 164 people died in Bergamo of all causes in the first two weeks of March 2020, versus 56 in the first two weeks of March 2019, a 3x increase instead of the 14x increase reported by Bergamo News.*

Bergamo’s hospital had 16 beds in its intensive care unit, in line with international standards (it is typical to have of the order of an ICU bed per 5000-10,000 people, and Bergamo has a population of 120,000). Right now there are 80 people in intensive care in Bergamo, a 5x increase in capacity made possible by bringing in a lot of ventilators and moving other sick people to other hospitals. Nonetheless, there have been reports of shortages of ICU beds, and of people needing to be intubated who could not be. There are also reports of people dying of pneumonia at home, without being tested.

Because of this surge in deaths, Bergamo’s funeral homes have not been able to keep up. It’s not that they have not been able to keep up with arranging funerals, because funerals are banned. They just do not have the capacity to perform the burials.

So coffins have been accumulating. A couple of days ago, a motorcade of army vehicles came to Bergamo to pick up 70 coffins and take them to other cities.

It should be noted that this is happening after 20 days of “social distancing” measures and after 13 days of “sheltering at home” in Lombardy.

My point being, if we had not known that a new virus was going around, the number of excess deaths in Bergamo would not have been hidden by the random noise in the number of deaths due to influenza-like illness.
