# Hasselmann, Manabe and Parisi win 2021 Physics Nobel Prize

Today the Italian academic community, along with lots of other people, was delighted to hear that Giorgio Parisi is one of the three recipients of the 2021 Nobel Prize for Physics.

Parisi has been a giant in the area of understanding “complex” and “disordered” systems. Perhaps, his most influential contribution has been his “replica method” for the analysis of the Sherrington-Kirkpatrick model. His ideas have led to several breakthroughs in statistical physics by Parisi and his collaborators, and they have also found applications in computer science: to tight analyses on a number of questions about combinatorial optimization on random graphs, to results on random constraint satisfaction problems (including the famous connection with random k-SAT analyzed by Mezard, Parisi and Zecchina) and random error correcting codes, and to understanding the solution landscape in optimization problems arising from machine learning. Furthermore these ideas have also led to the development and analysis of algorithms.

The news was particularly well received at Bocconi, where most of the faculty of the future CS department has done work that involved the replica method. (Not to be left out, even I have recently used replica methods.)

Mezard and Montanari have written a book-length treatment on the interplay between ideas from statistical physics, algorithms, optimization, information theory and coding theory that arise from this tradition. Readers of in theory looking for a shorter exposition aimed at theoretical computer scientists will enjoy these notes posted by Boaz Barak, or this even shorter post by Boaz.

In this post, I will try to give a sense to the reader of what the replica method for the Sherrington-Kirkpatrick model looks like when applied to the average-case analysis of optimization problems, stripped of all the physics. Of course, without the physics, nothing makes any sense, and the interested reader should look at Boaz’s posts (and to references that he provides) for an introduction to the context. I did not have time to check too carefully what I wrote, so be aware that several details could be wrong.

What is the typical value of the max cut in a ${G_{n,\frac 12}}$ random graph with ${n}$ vertices?

Working out an upper bound using union bounds and Chernoff bound, and a lower bound by thinking about a greedy algorithm, we can quickly convince ourselves that the answer is ${\frac {n^2}{4} + \Theta(n^{1.5})}$. Great, but what is the constant in front of the ${n^{1.5}}$? This question is answered by the Parisi formula, though this fact was not rigorously established by Parisi. (Guerra proved that the formula gives an upper bound, Talagrand proved that it gives a tight bound.)

Some manipulations can reduce the question above to the following question: suppose that I pick a random ${n\times n}$ symmetric matrix ${M}$, say with zero diagonal, and such that (up to the symmetry requirement) the entries are mutually independent and each entry is equally likely to be ${+1}$ or ${-1}$, or perhaps each entry is distributed according to a standard normal distribution (the two versions can be proved to be equivalent), what is the typical value of

$\displaystyle \max _{x \in \{+1,1\}^n } \ \ \frac 1{n^{1.5}} x^T M x$

up to ${o_n(1)}$ additive terms,?

As a first step, we could replace the maximum with a “soft-max,” and note that, for every choice of ${\beta>0}$, we have

$\displaystyle \max _{x \in \{+1,1\}^n } \ \ x^T M x \leq \frac 1 \beta \log \sum_{x \in \{+1,1\}^n } e^{\beta x^T Mx}$

The above upper bound gets tighter and tighter for larger ${\beta}$, so if we were able to estimate

$\displaystyle \mathop{\mathbb E} \log \sum_{x \in \{+1,1\}^n } e^{\beta x^T Mx}$

for every ${\beta}$ (where the expectation is over the randomness of ${M}$) then we would be in good shape.

We could definitely use convexity and write

$\displaystyle \mathop{\mathbb E} \max _{x \in \{+1,1\}^n } \ \ x^T M x \leq \frac 1 \beta \mathop{\mathbb E} \log \sum_{x \in \{+1,1\}^n } e^{\beta x^T Mx} \leq \frac 1 \beta \log \mathop{\mathbb E} \sum_{x \in \{+1,1\}^n } e^{\beta x^T Mx}$

and then use linearity of expectation and independence of the entries of ${M}$ to get to

$\displaystyle \leq \frac 1 \beta \log \sum_{x \in \{+1,1\}^n } \prod_{1\leq i < j\leq n} \mathop{\mathbb E} e^{2\beta M_{i,j} x_i x_j }$

Now things simplify quite a bit, because for all ${i the expression ${M_{i,j} x_i x_j}$, in the Rademacher setting, is equally likely to be ${+1}$ or ${-1}$, so that, for ${\beta = o(1)}$, we have

$\displaystyle \mathop{\mathbb E} e^{2\beta M_{i,j} x_i x_j } = cosh (2\beta) \leq 1 + O(\beta^2) \leq e^{O(\beta^2)}$

and

$\displaystyle \sum_{x \in \{+1,1\}^n } \prod_{1\leq i < j\leq n} \mathop{\mathbb E} e^{2\beta M_{i,j} x_i x_j } \leq 2^n \cdot e^{O(\beta^2 n^2)}$

so that

$\displaystyle \frac 1 \beta \log \sum_{x \in \{+1,1\}^n } \prod_{1\leq i < j\leq n} \mathop{\mathbb E} e^{2\beta M_{i,j} x_i x_j } \leq \frac {O(n)}{\beta} + O(\beta n^2)$

which, choosing ${\beta = 1/\sqrt n}$, gives an ${O(n^{1.5})}$ upper bound which is in the right ballpark. Note that this is exactly the same calculations coming out of a Chernoff bound and union bound. If we optimize the choice of ${\beta}$ we unfortunately do not get the right constant in front of ${n^{1.5}}$.

So, if we call

$\displaystyle F := \sum_{x \in \{+1,1\}^n } e^{\beta x^T Mx}$

we see that we lose too much if we do the step

$\displaystyle \mathop{\mathbb E} \log F \leq \log \mathop{\mathbb E} F$

But what else can we do to get rid of the logarithm and to reduce to an expression in which we take expectations of products of independent quantities (if we are not able to exploit the assumption that ${M}$ has mutually independent entries, we will not be able to make progress)?

The idea is that if ${k>0}$ is a small enough quantity (something much smaller than ${1/\log F}$), then ${F^k}$ is close to 1 and we have the approximation

$\displaystyle \log F^k \approx F^k-1$

and we obviously have

$\displaystyle \log F^k = k \log F$

so we can use the approximation

$\displaystyle \log F \approx \frac 1k (F^k - 1)$

and

$\displaystyle \mathop{\mathbb E} \log F \approx \frac 1k (\mathop{\mathbb E} F^k - 1)$

Let’s forget for a moment that we want ${k}$ to be a very small parameter. If ${k}$ was an integer, we would have

$\displaystyle \mathop{\mathbb E} F^k = \mathop{\mathbb E} \left( \sum_{x \in \{+1,1\}^n } e^{\beta x^T Mx} \right)^k = \sum_{x^{(1)},\ldots x^{(k)} \in \{+1,-1\}^n} \mathop{\mathbb E} e^{\beta \cdot ( x^{(1) T} M x^{(1)} + \cdots + x^{(k)T} M x^{(k)}) }$

$\displaystyle = \sum_{x^{(1)},\ldots x^{(k)} \in \{+1,-1\}^n} \ \ \prod_{i< j} \ \mathop{\mathbb E} e^{2\beta M_{i,j} \cdot ( x^{(1)}_i x^{(1)}_j + \cdots + x^{(k)}_i x^{(k)}_j )}$

Note that the above expression involves choices of ${k}$-tuples of feasible solutions of our maximization problem. These are the “replicas” in “replica method.”

The above expression does not look too bad, and note how we were fully able to use the independence assumption and “simplify” the expression. Unfortunately, it is actually still very bad. In this case it is preferable to assume the ${M_{i,j}}$ to be Gaussian, write the expectation as an integral, do a change of variable and some tricks so that we reduce to computing the maximum of a certain function, let’s call it ${G(z)}$, where the input ${z}$ is a ${k \times k}$ matrix, and then we have to guess what is an input of maximum value for this function. If we are lucky, the maximum is equivalent by a ${z}$ in which all entries are identical, the replica symmetric solution. In the Sherrington-Kirkpatrick model we don’t have such luck, and the next guess is that the optimal ${z}$ is a block-diagonal matrix, or a replica symmetry-breaking solution. For large ${k}$, and large number of blocks, we can approximate the choice of such matrices by writing down a system of differential equations, the Parisi equations, and we are going to assume that such equations do indeed describe an optimal ${z}$ and so a solution to the integral, and so they give as a computation of ${(\mathop{\mathbb E} F^k - 1)/k}$.

After all this, we get an expression for ${(\mathop{\mathbb E} F^k - 1)/k}$ for every sufficiently large integer ${k}$, as a function of ${k}$ up to lower order errors. What next? Remember how we wanted ${k}$ to be a tiny real number and not a sufficiently large integer? Well, we take the expression, we forget about the error terms, and we set ${k=0\ldots}$

# ما همه ایرانی

Many people are angry and heartbroken at the consequences (on themselves, their loved ones, their friends and coworkers) of the executive order that has banned refugees, as well as legal immigrants and green card holders, from certain countries from entering the US for the next few months. A common question is, what can we do? A few possibilities:

• Donate to the ACLU. They have a long history of fighting for civil rights and, on Saturday, they immediately sprung into action and where able to get a stay on the ban, whose implications are still not clear.
As you are contemplating donations, consider also supporting Planned Parenthood and the Southern Poverty Law Center. This is unrelated to the current refugees/immigrant crisis, but it is relevant to what will probably be future crises instigated by the current administration. (The SPLC does a great work in tracking and documenting hate groups.)
• Sign this petition, which has received significant media coverage
• Do what you can to make sure our professional societies produce a response. Both the outgoing and the incoming presidents of the AMS have signed the above petition, and I understand that the appropriate committee of the AMS  is considering making a statement. I don’t know if the ACM is planning a similar action, and if you have access to the ACM leadership, please lean on them to do so.
(Edited to add: ACM put out a statement Monday morning, and so this the AMS.)
• Call your representatives in congress, especially if you live in a state with Republican senators or in a district with a Republican representative. Don’t email: call and ask to speak with the staffer who is responsible for immigration matters.
• Reach out to colleagues, students and staff who are affected by the executive order. If you have channels to do so, pressure campus leadership to cover their legal expenses, which could be substantial.

# The vocabulary is political

Remember when the (GW) Bush administration decided that torture should be called “enhanced interrogation techniques,” and then the New York Times, like all other American news sources, started to call torture “enhanced interrogation techniques,” and people wrote angry letters to the public editor, and the public editor eventually wrote that if the White House decides that torture should not be called torture then calling torture torture is a political issues on which reporters should not take sides?

And Paul Krugman wrote a column in which he said that if the Bush administration declared that the Earth is flat, then the New York Times would have an article headlined “Roundness of Earth disputed,” in which it would make sure to have quotes both from round-Earther and flat-Earther?

Part of the reason for all this, as explained in this very good article in The Atlantic, is that political reporters (as opposed to political opinion writers) need access, that is, they need White House officials to talk to them, whether on the record or off the record. The ability to withhold access gives administration officials great leverage.

And so Myron Ebell, who works in a think-tank financed by the coal industry, and according to whom climate change is a vast conspiracy perpetrated by left-wing scientists, some of which scientists he has personally attacked, and who is being considered to lead the EPA, is now a climate contrarian, according to the New York Times.

Maybe soon enough, outside the opinion page, the New York Times will start calling racism “race realism.”

# Ancient wisdom

[I sneeze several times and then the following conversation happens]

J.Z.: In China, we say that if you sneeze once, it means that someone is thinking of you. If you sneeze twice, it means someone is cursing you.

Me: and what does it mean when I sneeze three times or more?

J.Z.: it means you have a cold.

# Old and new allegiances

The Italian prime minister is in the United States for the UN general assembly meeting, and he was in the San Francisco bay area on Sunday and Monday. (No, this is not the one who paid money to an underage prostitute and had she released from custody by falsely claiming she was the niece of Mubarak, it’s the new one.)

Those two days were busy, and he met Italian start-up founders in the area, went to a dinner at Stanford hosted by John Hennessy, presided the inauguration of a new Italian-language school, went to Twitter, Google, Facebook and Yahoo, and he met the local Italian research community.

For the last event, a few colleagues and I were asked to give a short presentation. Being not sure what to say to a prime minister, I asked a colleague who is the department chair at an Italian computer science department for some data on state funding of university research in computer science, and if there was a way to turn this data into a recommendation, and his response could be summarized as “we cannot be saved, there is no hope.” This might have made a good theme for a presentation, but instead I talked about the importance of fundamental research, and of working on ideas for which the technology is not ready yet, so that when the technology is ready the ideas are mature. Politicians are good at feigning attention when their mind is elsewhere, and he feigned it well.

Yesterday I was “interviewed” as part of the process to naturalize as an American citizen. Part of the interview is a test of the knowledge that an American is supposed to have. I liked to think that the officer would bring up a map of the world and ask me to point to France, then I would point to Brazil, and he would say, “great, now you are ready to be an American!” (Instead he asked how many US senators there are, when was the constitution written, and things like that.) The vibe was very different from any other interaction I have had with the American immigration system before; now it’s not any more “who are you, why are you stealing our jobs, and how do we know you are not a terrorist,” but it’s all “yay, you are going to be one us.”

# The Facebook study, sampling, and the principle of delayed decision

A few weeks ago, the Proceedings of the National Academy of Science published an article on a study conducted by a group of Cornell researchers at Facebook. They picked about 600,000 users and then, for a week, a subset of them saw fewer “negative” posts (up to 90% were filtered) than they would otherwise see, a subset saw fewer “positive” posts (same), and a control group got a random subset.

After the week, the users in the “negative” group posted fewer, and more negative, posts, and those in the “positive” group posted more, and more positive, posts.

Posts were classified according to an algorithm called LIWC2007.

The study run contrary to a conventional wisdom that people find it depressing to see on Facebook good things happening to their friends.

The paper has caused considerable controversy for being a study with human subjects conducted without explicit consent. Every university, including of course Cornell, requires experiments involving people to be approved by a special committee, and participants must sign informed consent forms. Facebook maintains that the study is consistent with its terms of service. The highly respected privacy organization EPIC has filed a complaint with the FTC. (And they have been concerned with Facebook’s term of service for a long time.)

Here I would like to explore a different angle: almost everybody thinks that observational studies about human behavior can be done without informed consent. This means that if the Cornell scientists had run an analysis on old Facebook data, with no manipulation of the feed generation algorithm, there would not have been such a concern.

At the same time, the number of posts that are fit for the feed of a typical user vastly exceed what can fit in one screen, and so there are algorithms that pick a rather small subset of posts that are evaluated to be of higher relevance, according to some scoring function. Now suppose that, if N posts fit on the screen, the algorithm picks the 2N highest scoring posts, and then randomly picks half of them. This seems rather reasonable because the scoring function is going to be an approximation of relevance anyway.

The United States has roughly 130 million Facebook subscriber. Suppose that the typical user looks, in a week, at 200 posts, which seems reasonable (in our case, those would be a random subset of roughly 400 posts). According to the PNAS study, roughly 50% of the posts are positive and 25% are negative, so of the initial 400, roughly 200 are positive and 100 are negative. Let’s look at the 100,000 users for which the random sampling picked the fewest positive posts: we would be expecting roughly 3 standard deviations below the mean, so about 80 positive posts instead of the expected 100; the 100,000 users with the fewest negative posts would get about 35 instead of the expected 50.

This is much less variance than in the PNAS study, where they would have got, respectively, only 10 positive and only 5 negative, but it may have been enough to pick up a signal.

Apart from the calculations, which I probably got wrong anyway, what we have is that in the PNAS study they picked a subset of people and then they varied the distribution of posts, while in the second case you pick random posts for everybody and then you select the users with the most variance.

If you could arrange distributions so that the distributions of posts seen by each users are the same, would it really be correct to view one study as experimental and one as observational? If the PNAS study had filtered 20% instead of 90% of the positive/negative posts, would it have been ethical? Does it matter what is the intention when designing the randomized algorithm that selects posts? If Facebook were to introduce randomness in the scoring algorithm with the goal of later running observational studies would it be ethical? Would they need to let people opt out? I genuinely don’t know the answer to these questions, but I haven’t seen them discussed elsewhere.

# On Berlusconi’s legacy

Here is the call for applications, from the official Italian web site of the Ministry for Education and Research, for a postdoctoral fellowship on a project titled “‘Dalla pecora al pecorino’ tracciabilità e rintracciabilità di filiera nel settore lattiero caseario toscano”, which roughly translates to “From sheep to pecorino, traceability in the Tuscan dairy industry.”

The announcement has an English translation, and something got lost in translation, having to do to the fact that in Italy we say “sheep style” instead of “doggy style” (don’t ask).

Update 2/16/2012: the page has been updated, below is a screenshot before the update (click to expand)

# Is the British government hiring Italian political consultant?

David Willetts, the British minister for higher education, has recently announced that the government was “inviting proposals for a new type of university with a focus on science and technology and on postgraduates.” The NYT article on the matter may be biased, but it looks like this announcement could have come from the Italian ministry of university and research, and I mean it as an offense.

So, how much will the government invest in this new university? “There will be no additional government funding,” Mr. Willetss says, and all the funding will have to come from the private sector. And what is the government’s vision and plan for this new university? Mr. Willetts says that “We are not intending to issue any guidelines. We want people to come to us with ideas.”

So the idea of the minister for higher education is that the private sector comes up with all the funding and all the planning for a new university. (Imagine the home secretary stating the goal of increasing the police force, but all the new police force would be paid for by the private sector, which, after all, has an interest in reducing street crime, and that it is not the intention of the home office to dictate how this private police force should operate.) This is exactly what an Italian minister would talk about at a press conference, only to be forgotten the following week.

The reaction from the academia, however, is different. In Italy, you would see people throw their hands in the air and say “madonna mia, in mano a chi siamo,” while the British are masters of understatement.

“We at Oxford feel that keeping the U.K. a world leader in science and research is a very important objective,” Ian Walmsley, Oxford’s chief research officer, “and we’re pleased that the government agrees with that.”

Stephen Caddick, for the University College at London says the proposal is “not uninteresting”.

# Lies, Damn Lies, and Spaceflight Safety

From an interview with Ed Mango, head of NASA’s commercial crew program, in which he discusses safety requirements for commercial entities who want to subcontract flights to the ISS from NASA.

Chaikin: And the probability of “loss of crew” has to be better than 1 in 1000?

Mango: Yes and no. What we’ve done is we’ve separated those into what you need for ascent and what you need for entry. For ascent it’s 1 in 500, and independently for entry it’s 1 in 500. We don’t want industry … to [interpret the 1-in-1,000 requirement] to say, “We’ve got a great ascent; we don’t need as much descent protection.” In reality we’ve got to protect the life of the crew all the time.

Now [the probability for] the mission itself is 1 in 270. That is an overall number. That’s loss of crew for the entire mission profile, including ascent, on-orbit, and entry. The thing that drives the 1 in 270 is really micrometeorites and orbital debris … whatever things that are in space that you can collide with. So that’s what drops that number down, because you’ve got to look at the 210 days, the fact that your heat shield or something might be exposed to whatever that debris is for that period of time. NASA looks at Loss of Vehicle the same as Loss of Crew. If the vehicle is damaged and it may not be detected prior to de-orbit, then you have loss of crew.

What does “yes” mean in the “yes and no” answer? Also, with a 1/500 probability of accident at takeoff and an independent 1/500 probability of accident at landing, we are already at a 1/250.2 probability of accident, so how do we get to 1/270 after adding accidents in mid-flight?

In not entirely unrelated news, a member of the board of Florida’s 3rd district took, and failed, the Florida Comprehensive Assessment Test (FCAT), a standardized test, as documented in two posts on the Washington Post blog.

Relevant quote:

“I won’t beat around the bush. The math section had 60 questions. I knew the answers to none of them, but managed to guess ten out of the 60 correctly. On the reading test, I got 62% . In our system, that’s a ‘D,’ and would get me a mandatory assignment to a double block of reading instruction.

“It seems to me something is seriously wrong. I have a bachelor of science degree, two masters degrees, and 15 credit hours toward a doctorate. I help oversee an organization with 22,000 employees and a $3 billion operations and capital budget, and am able to make sense of complex data related to those responsibilities…. Here is a sample of the math portion of the 10th grade FCAT, the most advanced one. Sample question: “An electrician charges a$45 fee to make a house call plus an hourly rate for labor. If the electrician works at one house for 3 hours and charges \$145.50 for the job, what is the electrician’s hourly rate?” You can use a calculator.

# Sidney Coleman on Teaching and Princeton

From a 1977 interview with the late theoretical physicist Sidney Coleman.

Coleman: Teaching is unpleasant work. No question about it. It has its rewards. One feels happy about having a job well done. Washing the dishes, waxing the floors (things I also do on a regular basis since I’m a bachelor) have their rewards. I am pleased when I have done a good job waxing the floor and I’ve taken an enormous pile of dirty dishes and reduced them to sparkling clean ones. On the other hand, if I didn’t have to, I would never engage in waxing the floors, although I’m good at it. I’m also good at teaching; I’m considered very good at teaching, both by myself and others. And I’m also terrifically good at washing dishes, in fact. On the other hand, I certainly would never make a bunch of dirty dishes just for the joy of washing them and I would not teach a course just for the joy of teaching a course. (…) if someone were to suddenly say to me, look you can sit in this office and talk and do physics with the same people, everything would be the same except you would never have to teach a course and never have to see a graduate student, and we’ll halve your salary — I’d leap at the offer.

Sopka: So I guess really you would be happier with the format of an institute of theoretical physics? Rather than a teaching institution like a university?

Coleman: Well no. That makes it too abstract. Because that means, would you like to have a position at, say, the Institute for Advanced Studies? And then all sorts of other things would enter the picture. Like you’d have to live in Princeton which is truly an awful experience. (…) It’s a terrible place. Dullest place in the world. No I wouldn’t say that, but certainly the dullest place at which decent science or decent scholarship is done in the world today. The only advantage to Princeton is that it’s close to Princeton Junction.

The whole interview is worth reading. (Via not even wrong)