Bocconi Hired Poorly Qualified Computer Scientist

Today I received an interesting email from our compliance office that is working on the accreditation of our PhD program in Statistics and Computer Science.

One of the requirements for accreditation is to have a certain number of affiliated faculty. To count as an affiliated faculty, however, one must pass certain minimal thresholds of research productivity, the same ones that are necessary to be promoted to Associate Professor, as quantified according to Italy’s well-intentioned but questionably run initiative to conduct research evaluations using quantifiable parameters.

(For context, every Italian professor maintains a list of publications in a site run by the ministry. Although the site is linked to various bibliographic databases, one has to input each publication manually into a local site at one’s own university, then the ministry site fetches the data from the local site. The data in the ministry site is used for these research evaluations. At one point, a secretary and I spent long hours entering my publications from the past ten years, to apply for an Italian grant.)

Be that as it may, the compliance office noted that I did not qualify to be an affiliated faculty (or, for that matter, an Associate Professor) based on my 2016-2020 publication record. That would be seven papers in SoDA and two in FOCS: surely Italian Associate Professors are held to high standards! It turns out, however, that one of the criteria counts only journal publications.

Well, how about the paper in J. ACM and the two papers in SIAM J. on Computing published between 2016 and 2020? That would (barely) be enough, but one SICOMP paper has the same title as a SoDA paper (being, in fact, the same paper), and so the ministry site had rejected it. Luckily, the Bocconi administration was able to remove the SoDA paper from the ministry site, I added the SICOMP version again, and now I finally, if barely, qualify to be an Associate Professor and an affiliated faculty member of the PhD program.

This sounds like the beginning of a long and unproductive relationship between me and the Italian system of research evaluation.

P.S. Some colleagues at other Italian universities to whom I told this story argued that the Bocconi administration did not correctly apply the government rules, and that one should count conference proceedings indexed by Scopus; other colleagues said that indeed the government decree n. 589 of August 8, 2018, in article 2, comma 1, part a, only refers to journals. This of course only reinforces my impression that the whole set of evaluation criteria is a dumpster fire that is way too far gone.

What is next?

Greetings from the future! The progression of covid-19 in Italy is running about eight days ahead of France and Spain and about 16 days ahead of the United States. Here in Lombardy, which is about ten days ahead of NYC, we have been “sheltering at home” for 13 days already.

How is social distancing working out for me? I thought that I was well prepared for it, but it is still not easy. I have started to talk to the furniture, and apparently this is perfectly normal, at least as long as the furniture does not talk back.

As I have been telling my dining table, it has been very dismaying to read news from the US, where there seemed to be a very dangerous complacency. I am relieved to see that this is changing, especially at the state level, which makes me much more hopeful.

I have also found media coverage to be disappointing. Apparently, many highly educated people, including people whose job involves understanding policy issues, have no idea how numbers work (source). This is a problem because a lot of issues concerning this epidemic have to do with numbers, which can be misleading if they are not reported in context.

For example, before the time when Trump decided that he had retroactively been concerned about a pandemic since January, conservative media emphasized the estimate of a 2% mortality rate, in a way that made it sound like, well, 98% of people survive, and 98% is approximately 100%, so what is the big deal. For context, the Space Shuttle only exploded on 1.5% of its flights, and this was deemed too dangerous for astronauts. This is the kind of intuitive reference that I would like to see more of.

Even now, there is a valid debate on whether measures that will cost the economy trillions of dollars are justified. After all, it would be absurd to spend trillions of dollars to save, say, 10,000 lives, it would be questionable to do so to save 100,000 lives, and it would be undoubtedly right to do so to save millions of lives and avert a collapse of the health care system (especially considering that a collapse of the health care system might create its own financial panic that would also cost trillions of dollars).

So which one is it? Would doing nothing cost 10,000 American lives? A million? How long will people have to “shelter at home”? And what is next? I can recommend two well-researched articles: this on plausible scenarios and this on what’s next.

Kristof’s article cites an essay by Stanford professor John Ioannidis who notes that it is within the realm of possibilities, given the available data, that the true mortality rate could be as low as 0.05%, that is, wait for it, lower than the mortality rate of the flu. Accordingly, in a plausible scenario, “If we had not known about a new virus out there, and had not checked individuals with PCR tests, the number of total deaths due to “influenza-like illness” would not seem unusual this year.”

Ioannidis’ essay was written without reference to data from Italy, which was probably not available in peer-reviewed form at the time of writing.

I would not want professor Ioannidis to tell me how to design graph algorithms, and I don’t mean to argue for the plausibility of the above scenario, but let me complement it with some data from Italy.

Lombardy is Italy’s richest and most developed region, and the second richest (in absolute and PPP GDP) administrative region in Europe after the Ile de France (source). It has a rather good health care system. In 2018, on average, 273 people died per day in Lombardy of all causes (source). Yesterday, 381 people died in Lombardy with coronavirus (source). This is spread out over a region with more than 10 million residents.

Some areas are harder-hit hotspots. Three days ago, a Bergamo newspaper reported that 330 people had died in the previous week of all causes in the city. In the same week of March in 2019, 23 people had died. That’s a 14x increase in mortality from all causes. Edited to add (3/22/2020): the mayor of Bergamo told Reuters that 164 people died in Bergamo of all causes in the first two weeks of March 2020, versus 56 in the first two weeks of March 2019, a 3x increase instead of the 14x increase reported by Bergamo News.

Bergamo’s hospital had 16 beds in its intensive care unit, in line with international standards (it is typical to have on the order of one ICU bed per 5,000-10,000 people, and Bergamo has a population of 120,000). Right now there are 80 people in intensive care in Bergamo, a 5x increase in capacity made possible by bringing in a lot of ventilators and moving other sick people to other hospitals. Nonetheless, there have been reports of shortages of ICU beds, and of people needing to be intubated who could not be. There are also reports of people dying of pneumonia at home, without being tested.

Because of this surge in deaths, Bergamo’s funeral homes have not been able to keep up. It’s not that they have not been able to keep up with arranging funerals, because funerals are banned. They just do not have the capacity to perform the burials.

So coffins have been accumulating. A couple of days ago, a motorcade of army vehicles came to Bergamo to pick up 70 coffins and take them to other cities.

It should be noted that this is happening after 20 days of “social distancing” measures and after 13 days of “sheltering at home” in Lombardy.

My point being, if we had not known that a new virus was going around, the number of excess deaths in Bergamo would not have been hidden by the random noise in the number of deaths due to influenza-like illness.

Lies, damn lies, and covid-19

In the past two weeks, in Italy, we have been drowning in information about the novel coronavirus infection, but the statistics that have been circulating were lacking proper context and interpretation. Is covid-19 just a stronger form of the flu or is it a threat to the world economy? Yes.

Now that the first community transmissions are happening in my adopted home in the San Francisco Bay Area, I would like to relay to my American readers what I learned from the Italian experience.


Lies, Damn Lies, and Herbert London

I am grading the final projects of my class, I am trying to clear the backlog of publishing all the class notes, I am way behind on my STOC reviews, and in two days I am taking off for a complicated two-week trip involving planes, trains, and a rented automobile, as well as an ambitious plan of doing no work whatsoever from December 20 to December 31.

So, today I was browsing Facebook, and when I saw a post containing an incredibly blatant arithmetic mistake (which none of the several comments seemed to notice) I spent the rest of the morning looking up where it came from.

The goal of the post was to make the wrong claim that people have been paying more than enough money into Social Security (through payroll taxes) to support the current level of benefits. Indeed, since the beginning, Social Security has been paying individuals more than they put in, and now that population and salaries have stopped growing, Social Security is also paying out more to retired people than it collects from working people, so that the “trust fund” (whether one believes it is a real thing or an accounting fiction) will run out in the 2030s unless some change is made.

This is a complicated matter, but the post included a sentence to the effect that $4,500 a year, with an interest of 1% per year “compounded monthly”, would add up to $1.3 million after 40 years. This is not even in the right order of magnitude (it adds up to about $220k), and that should be obvious without doing the calculation. Who would write such a thing, and why?
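
For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation, using the post’s own figures ($4,500 a year, i.e. $375 a month, at 1% annual interest compounded monthly, for 40 years):

```python
# $375 deposited every month for 40 years, earning 1% per year compounded monthly.
monthly_deposit = 375.0
monthly_rate = 0.01 / 12   # 1% annual interest, compounded monthly
months = 40 * 12

# Standard future value of an annuity.
future_value = monthly_deposit * ((1 + monthly_rate) ** months - 1) / monthly_rate
print(f"${future_value:,.0f}")   # about $221,000 -- nowhere near $1.3 million
```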

My first stop was a July 2012 post on snopes, which commented on a very similar viral email. Snopes points out various mistakes (including the rate of social security payroll taxes), but the calculation in the snopes email, while based on wrong assumptions, has correct arithmetic: it says that $4,500 a year, with a 5% interest, become about $890k after 49 years.

So how did the viral email with the wrong assumptions and correct arithmetic morph into the Facebook post with the same wrong assumptions but also the wrong arithmetic?

I don’t know, but here is an August 2012 post on, you can’t make this stuff up, Accuracy in Media, which Wikipedia describes as a “media watchdog.”

The post is attributed to Herbert London, who has a PhD from Columbia, is a member of the Council on Foreign Relations, and used to be the president of a conservative think tank. Currently, he has an affiliation with King’s College in New York. London’s post has the sentence I saw in the Facebook post:

(…) an employer’s contribution of $375 per month at a modest one percent rate compounded over a 40 year work experience the total would be $1.3 million.

The rest of the post is almost identical to the July 2012 message reported by Snopes.

Where did Dr. London get his numbers? Maybe he compounded this hypothetical saving at 1% per month? No, because that would give more than $4 million. One does get about $1.3 million, though, if one saves $375 a month for thirty years with a return of 1% per month.
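
Here is the same guesswork in code (a sketch, of course: I am only guessing at what calculation could produce the $1.3 million figure):

```python
def future_value(monthly_deposit, monthly_rate, months):
    """Future value of a fixed monthly deposit at a fixed monthly interest rate."""
    return monthly_deposit * ((1 + monthly_rate) ** months - 1) / monthly_rate

# $375 a month at 1% *per month* for 40 years: more than $4 million.
print(f"${future_value(375, 0.01, 40 * 12):,.0f}")
# $375 a month at 1% per month for 30 years: about $1.3 million.
print(f"${future_value(375, 0.01, 30 * 12):,.0f}")
```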

Perhaps a more interesting question is why this “fake math” is coming back after five years. In 2012, Paul Ryan put forward a plan to “privatize” Social Security, and such a plan is now being revived. The only way to sell such a plan is to convince people that if they saved in a private account the amount of payroll taxes that “goes into” Social Security, they would get better benefits. This may be factually wrong, but that’s hardly the point.

An Alternative to the Seddighin-Hajiaghayi Ranking Methodology

[Update 10/24/14: there was a bug in the code I wrote yesterday night, apologies to the colleagues at Rutgers!]

[Update 10/24/14: a reaction to the authoritative study of MIT and the University of Maryland. Also, coincidentally, today Scott Adams comes down against reputation-based rankings]

Saeed Seddighin and MohammadTaghi Hajiaghayi have proposed a ranking methodology for theory groups based on the following desiderata: (1) the ranking should be objective, and based only on quantitative information and (2) the ranking should be transparent, and the methodology openly revealed.

Inspired by their work, I propose an alternative methodology that meets both criteria, but has some additional advantages, including an easier implementation. Based on the same Brown University dataset, I count, for each theory group, the total number of letters in the name of each faculty member.
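
For concreteness, here is a minimal sketch of an implementation; the `groups` dictionary below is a hypothetical stand-in for the parsed Brown dataset, mapping each school to the names of its theory faculty:

```python
# Rank theory groups by the total number of letters in their faculty members' names.
groups = {
    "Example University": ["Alice Smith", "Bob Jones"],
    "Sample Institute of Technology": ["Carol Rivera"],
}

def letter_count(name):
    # Count letters only, ignoring spaces, hyphens, apostrophes, and so on.
    return sum(ch.isalpha() for ch in name)

scores = {school: sum(letter_count(name) for name in names)
          for school, names in groups.items()}

for rank, (school, score) in enumerate(sorted(scores.items(), key=lambda x: -x[1]), start=1):
    print(f"{rank} ( {score} ) {school}")
```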

Here are the results (apologies for the poor formatting):

1 ( 201 ) Massachusetts Institute of Technology
2 ( 179 ) Georgia Institute of Technology
3 ( 146 ) Rutgers – State University of New Jersey – New Brunswick
4 ( 142 ) University of Illinois at Urbana-Champaign
5 ( 141 ) Princeton University
6 ( 139 ) Duke University
7 ( 128 ) Carnegie Mellon University
8 ( 126 ) University of Texas – Austin
9 ( 115 ) University of Maryland – College Park
10 ( 114 ) Texas A&M University
11 ( 111 ) Northwestern University
12 ( 110 ) Stanford University
13 ( 108 ) Columbia University
14 ( 106 ) University of Wisconsin – Madison
15 ( 105 ) University of Massachusetts – Amherst
16 ( 105 ) University of California – San Diego
17 ( 98 ) University of California – Irvine
18 ( 94 ) New York University
19 ( 94 ) State University of New York – Stony Brook
20 ( 93 ) University of Chicago
21 ( 91 ) Harvard University
22 ( 91 ) Cornell University
23 ( 87 ) University of Southern California
24 ( 87 ) University of Michigan
25 ( 85 ) University of Pennsylvania
26 ( 84 ) University of California – Los Angeles
27 ( 81 ) University of California – Berkeley
28 ( 78 ) Dartmouth College
29 ( 76 ) Purdue University
30 ( 71 ) California Institute of Technology
31 ( 67 ) Ohio State University
32 ( 63 ) Brown University
33 ( 61 ) Yale University
34 ( 54 ) University of Rochester
35 ( 53 ) University of California – Santa Barbara
36 ( 53 ) Johns Hopkins University
37 ( 52 ) University of Minnesota – Twin Cities
38 ( 49 ) Virginia Polytechnic Institute and State University
39 ( 48 ) North Carolina State University
40 ( 47 ) University of Florida
41 ( 45 ) Rensselaer Polytechnic Institute
42 ( 44 ) University of Washington
43 ( 44 ) University of California – Davis
44 ( 44 ) Pennsylvania State University
45 ( 40 ) University of Colorado Boulder
46 ( 38 ) University of Utah
47 ( 36 ) University of North Carolina – Chapel Hill
48 ( 33 ) Boston University
49 ( 31 ) University of Arizona
50 ( 30 ) Rice University
51 ( 14 ) University of Virginia
52 ( 12 ) Arizona State University
53 ( 12 ) University of Pittsburgh

I should acknowledge a couple of limitations of this methodology: (1) the Brown dataset is not current, but I believe that the results would not be substantially different even with current data, (2) it might be reasonable to only count the letters in the last name, or to weigh the letters in the last name by 1 and the letters in the first name by 1/2. If there is sufficient interest, I will post rankings according to these other methodologies.

Lies, Damn Lies, and Predictions

Oh man, not another election! Why do we have to choose our leaders? Isn’t that what we have the Supreme Court for?
— Homer Simpson

Nate Silver is now putting Barack Obama’s chance of reelection at around 85%, and he has been on the receiving end of considerable criticism from supporters of Mitt Romney. Some have criticized his statistical analysis by pointing out that he has a soft voice and he is not fat (wait, what? read for yourself – presumably the point is that Silver is gay and that gay people cannot be trusted with such manly pursuits as statistics), but the main point seems to be: if Romney wins the election then Silver and his models are completely discredited. (E.g. here.) This is like someone saying that a die has approximately an 83% probability of not turning up a 2, and others saying, if I roll a die and it turns up a 2, this whole “probability” thing that you speak of is discredited.

But still, when someone offers predictions in terms of probability, rather than simply stating that a certain outcome is more likely, how can we evaluate the quality of such predictions?

In the following let us assume that we have a sequence of binary events, and that each event i has a probability p_i of occurring as a 1 and 1-p_i of occurring as 0. A predictor gives out predicted probabilities q_i, and then events E_i happen. Now what? How would we score the predictions? Equivalently, how would we fairly compensate the predictor?

A simple way to “score” the prediction is to say that for each event we have a “penalty” of |E_i - q_i|, or a score of 1 - |E_i - q_i|. For example, the prediction that the correct event happens with 100% probability gets a score of 1, while the prediction that the correct event happens with 85% probability gets a score of .85.

Unfortunately this scoring system is not “truthful,” that is, it does not encourage the predictor to tell us the true probabilities. For example suppose that a predictor has computed the probability of an event as 85% and is very confident in the accuracy of the model. Then, if he publishes the accurate prediction he is going to get a score of .85 with probability .85 and a score .15 with probability .15. So he is worse off than if he had published the prediction of the event happening with probability 100%, in which case the expected score is .85. In general, the scheme makes it always advantageous to round the probability to 0% or 100%.
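
To make the incentive explicit, here is a small sketch that computes the expected score of reporting a probability q when the true probability of the event is p, under the scoring rule above:

```python
def expected_score(p, q):
    # The event is 1 with probability p; the published prediction is q.
    # The score is 1 - |E - q|: that is, q if the event happens, 1 - q if it does not.
    return p * q + (1 - p) * (1 - q)

p = 0.85
print(f"{expected_score(p, 0.85):.3f}")   # 0.745: expected score of the truthful report
print(f"{expected_score(p, 1.00):.3f}")   # 0.850: rounding up to 100% does better
# The expected score is linear in q, so it is always maximized at q = 0 or q = 1.
```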

Is there a truthful scoring system? I am not sure what the answer is.

If one is scoring multiple predictions of independent events, one can look at all the cases in which the prediction was, say, in the range of 80% to 90%, and check whether the event indeed happened, say, a fraction between 75% and 95% of the time, and so on.
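
A sketch of this kind of check (the bucket boundaries and the tolerance are arbitrary choices, which is exactly the issue raised below):

```python
def calibration_check(predictions, outcomes, lo=0.80, hi=0.90, tolerance=0.05):
    """Among the events predicted with probability in [lo, hi], did the empirical
    frequency of 1s fall within `tolerance` of that range?"""
    bucket = [e for q, e in zip(predictions, outcomes) if lo <= q <= hi]
    if not bucket:
        return None
    frequency = sum(bucket) / len(bucket)
    return lo - tolerance <= frequency <= hi + tolerance

# Ten predictions in the 80-90% bucket, of which eight came true: 8/10 is in [0.75, 0.95].
print(calibration_check([0.85] * 10, [1] * 8 + [0] * 2))   # True
```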

One disadvantage of this approach is that it seems to require a discretization of the probabilities, which seems like an arbitrary choice and one that could affect the final score quite substantially. Is there a more elegant way to score multiple independent events without resorting to discretization? Can it be proved to be truthful?

Another observation is that such an approach is still not entirely truthful if it is applied to events that happen sequentially. Indeed, suppose that we have a series of, say, 10 events for which we predicted a 60% probability of a 1, and the event 1 happened 7 out of 10 times. Now we have to make a prediction of a new event, for which our model predicts a 10% probability. We may then want to publish a 60% prediction, because this will help even out the “bucket” of 60% predictions.

I don’t think that there is any way around the previous problem, though it seems clear that it would affect only a small fraction of the predictions. (The complexity theorists among the readers may remember similar ideas being used in a paper of Feigenbaum and Fortnow.)

Surely the task of scoring predictions must have been studied in countless papers, and the answers to the above questions must be well known, although I am not sure what the right keywords are to search for such work. In computer science, there are a lot of interesting results about using expert advice, but they are all concerned with how you score your own way of picking which expert to trust, rather than with scoring the experts themselves. (This means that the predictions of the experts are not affected by the scoring system, unlike the setting discussed in this post.)

Please contribute ideas and references in the comments.

The New York Times on False Positives and False Negatives

From this New York Times article:

Researchers found the home test accurate 99.98 percent of the time for people who do not have the virus. By comparison, they found it to be accurate 92 percent of the time in detecting people who do. […]

So, while only about one person in 5,000 would get a false negative test, about one person in 12 could get a false positive.
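
A quick check of what the quoted accuracies translate into (a minimal sketch, with the 99.98% and 92% figures taken from the article):

```python
specificity = 0.9998   # accuracy among people who do NOT have the virus
sensitivity = 0.92     # accuracy among people who DO have the virus

false_positive_rate = 1 - specificity   # errors among people without the virus
false_negative_rate = 1 - sensitivity   # errors among people with the virus

print(f"false positives: about 1 in {1 / false_positive_rate:.0f}")   # 1 in 5000
print(f"false negatives: about 1 in {1 / false_negative_rate:.1f}")   # 1 in 12.5
```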

Lies, Damn Lies, and Spaceflight Safety

From an interview with Ed Mango, head of NASA’s commercial crew program, in which he discusses safety requirements for commercial entities who want to subcontract flights to the ISS from NASA.

Chaikin: And the probability of “loss of crew” has to be better than 1 in 1000?

Mango: Yes and no. What we’ve done is we’ve separated those into what you need for ascent and what you need for entry. For ascent it’s 1 in 500, and independently for entry it’s 1 in 500. We don’t want industry … to [interpret the 1-in-1,000 requirement] to say, “We’ve got a great ascent; we don’t need as much descent protection.” In reality we’ve got to protect the life of the crew all the time.

Now [the probability for] the mission itself is 1 in 270. That is an overall number. That’s loss of crew for the entire mission profile, including ascent, on-orbit, and entry. The thing that drives the 1 in 270 is really micrometeorites and orbital debris … whatever things that are in space that you can collide with. So that’s what drops that number down, because you’ve got to look at the 210 days, the fact that your heat shield or something might be exposed to whatever that debris is for that period of time. NASA looks at Loss of Vehicle the same as Loss of Crew. If the vehicle is damaged and it may not be detected prior to de-orbit, then you have loss of crew.

What does “yes” mean in the “yes and no” answer? Also, with a 1/500 probability of accident at takeoff and an independent 1/500 probability of accident at landing, we are already at a 1/250.2 probability of accident, so how do we get to 1/270 after adding accidents in mid-flight?
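
Spelling out the first part of that arithmetic (a small sketch using the figures from the interview):

```python
p_ascent = 1 / 500   # quoted loss-of-crew probability on ascent
p_entry = 1 / 500    # quoted, independent, loss-of-crew probability on entry

# Probability of losing the crew on ascent or on entry, before counting any on-orbit risk.
p_either = 1 - (1 - p_ascent) * (1 - p_entry)
print(f"1 in {1 / p_either:.2f}")   # 1 in 250.25, already worse than the overall 1-in-270 figure
```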

In not entirely unrelated news, a school board member from Florida’s 3rd district took, and failed, the Florida Comprehensive Assessment Test (FCAT), a standardized test, as documented in two posts on the Washington Post blog.

Relevant quote:

“I won’t beat around the bush. The math section had 60 questions. I knew the answers to none of them, but managed to guess ten out of the 60 correctly. On the reading test, I got 62%. In our system, that’s a ‘D,’ and would get me a mandatory assignment to a double block of reading instruction.

“It seems to me something is seriously wrong. I have a bachelor of science degree, two masters degrees, and 15 credit hours toward a doctorate. I help oversee an organization with 22,000 employees and a $3 billion operations and capital budget, and am able to make sense of complex data related to those responsibilities….

Here is a sample of the math portion of the 10th grade FCAT, the most advanced one. Sample question: “An electrician charges a $45 fee to make a house call plus an hourly rate for labor. If the electrician works at one house for 3 hours and charges $145.50 for the job, what is the electrician’s hourly rate?” You can use a calculator.
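
For the record, the intended calculation seems to be a one-liner:

```python
total_charge = 145.50    # total bill for the job
house_call_fee = 45.00   # flat fee for the house call
hours = 3

hourly_rate = (total_charge - house_call_fee) / hours
print(f"${hourly_rate:.2f} per hour")   # $33.50, no calculator required
```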

Lies, Damn Lies, and the Royal Society Report

A few days ago, the Royal Society released a report on “Global Scientific Collaboration in the 21st Century.” Usually, when a document has “in the 21st Century” in the title, it can only go downhill from there. (I once had to review a paper that started with “As we enter the second decade of the 21st Century…” and, hard as it is to believe, it did go further downhill from there.) But the Royal Society is one of the most hallowed of scientific institutions, so one might have still hoped for the best.

The report was widely quoted in the press as predicting that China would overtake the United States in scientific output by 2013.

Indeed, in Section 1.6 (pages 42-43), the report uses data provided by Elsevier to estimate the number of scientific papers produced in various countries. We’ll skip the objection that the number of papers is a worthless measure of scientific output and go to figure 1.6 in the report, reproduced below.

The figure plots the percentage of scientific papers coming out of various countries, and then proceeds to do a linear extrapolation of the percentages to create a projection for the future.

While such an approach shows China overtaking the US in 2013, it also shows, more ominously, China publishing 110% of all scientific papers by 2100. (The report concedes that linear extrapolation might not make a lot of sense, yet the picture is there.)
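
To see how a linear extrapolation of a share ends up above 100%, here is a toy example with made-up numbers (not the report’s Elsevier data): a country whose share of world papers grows by one percentage point a year.

```python
# Toy data (made up for illustration): share of world papers, in percent.
years = [2004, 2006, 2008]
share = [16.0, 18.0, 20.0]

# Fit a straight line through the observed trend and extrapolate it forward.
slope = (share[-1] - share[0]) / (years[-1] - years[0])   # one percentage point per year

for year in (2013, 2050, 2100):
    projected = share[-1] + slope * (year - years[-1])
    print(year, f"{projected:.0f}%")   # by 2100 the projected "share" is 112% of all papers
```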