Lies, Damn Lies, and the Royal Society Report

A few days ago, the Royal Society released a report on “Global Scientific Collaboration in the 21st Century.” Usually, when a document has “in the 21st Century” in the title, it can only go downhill from there. (I once had to review a paper that started with “As we enter the second decade of the 21st Century…” and, hard as it is to believe, it did go further downhill from there.) But the Royal Society is one of the most hallowed of scientific institutions, so one might have still hoped for the best.

The report was widely quoted in the press as predicting that China would overtake the United States in scientific output by 2013.

Indeed, in Section 1.6 (pages 42-43), the report uses data provided by Elsevier to estimate the number of scientific papers produced in various countries. We’ll skip the objection that the number of papers is a worthless measure of scientific output and go to figure 1.6 in the report, reproduced below.

The figure plots the percentage of scientific papers coming out of various countries, and then proceeds to do a linear extrapolation of the percentages to create a projection for the future.

While such an approach shows China overtaking the US in 2013, it also shows, more ominously, China publishing 110% of all scientific papers by 2100. (The report concedes that linear extrapolation might not make a lot of sense, yet the picture is there.)
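To see why a straight line through share data misbehaves, here is a minimal Python sketch. The shares below are made up for illustration (they are not the Elsevier figures), and the function names are hypothetical; the only point is that a line through two data points eventually leaves the 0 to 100% range.

```python
def extrapolate(p0, p1, year):
    """Value at `year` on the straight line through points p0 and p1.

    Each point is a (year, share_in_percent) pair.
    """
    (x0, y0), (x1, y1) = p0, p1
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (year - x0)


def crossing_year(a0, a1, b0, b1):
    """Year at which the line through a0, a1 meets the line through b0, b1."""
    slope_a = (a1[1] - a0[1]) / (a1[0] - a0[0])
    slope_b = (b1[1] - b0[1]) / (b1[0] - b0[0])
    if slope_a == slope_b:
        raise ValueError("parallel lines never cross")
    # Solve y_a0 + slope_a * (t - x_a0) == y_b0 + slope_b * (t - x_b0) for t.
    return (b0[1] - a0[1] + slope_a * a0[0] - slope_b * b0[0]) / (slope_a - slope_b)


# Hypothetical shares (percent of world output), NOT taken from the report.
DECLINING = ((2000, 30.0), (2010, 25.0))
RISING = ((2000, 5.0), (2010, 15.0))

print(crossing_year(*DECLINING, *RISING))  # about 2016.7
print(extrapolate(*RISING, 2100))          # 105.0, more than all papers
print(extrapolate(*DECLINING, 2100))       # -20.0, a negative share
```

With these invented numbers the crossing date looks plausible, while the same two lines give nonsense a few decades later.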


17 thoughts on “Lies, Damn Lies, and the Royal Society Report”

  1. After the former president of the same Royal Society wrote (in a book commissioned for their 350th anniversary) that AKS can factor large primes... and break crypto, I knew nothing good was coming from there :)

  2. If indeed “the number of papers is a worthless measure of scientific output”, why do you use this measure in the US?
    Also, China may need longer than 2013 to overtake the US, but it has already overtaken the UK and Germany.

  3. By Goodhart’s law, virtually any surrogate measure of “scientific output” will become useless once the US, Chinese, and other governments start to care about it. Number of papers/citations is no exception.

  4. I think it might be worth quoting here exactly what the report says. (I had nothing to do with it, by the way, and it seems that neither did any other mathematicians.)

    “In terms of publications, the landscape is set to change even more dramatically if current trends continue, as can be seen in Figure 1.6. China has already overtaken the UK as the second leading producer of research publications, but some time before 2020 it is expected to surpass the USA. Projections vary, but a simple linear interpretation [sic] of Elsevier’s publishing data suggests that this could take place as early as 2013. Of course, in practice, this will not follow a linear progression (we do not expect that the USA will decrease their share of global publications to nothing in the next 50 years), but the potential for China to match US output in terms of sheer numbers in the near to medium term is clear.”

    I don’t see anything to disagree with there, though perhaps they should have been aware that the media would pick up on the 2013 projection and leave out all the qualifications that surround it.

  5. Some time ago, something similar happened in Switzerland, in a much more serious context.

    In one of its anti-immigration campaigns, the Swiss right-wing party (the SVP) used the fact that the Muslim share of the population had doubled every 10 years over the previous 20 years. From this, they extrapolated that it would keep doubling every 10 years. Interestingly, the graph (which was published as an ad in all major newspapers) stopped in 2050, indicating that 72% of the population would be Muslim by then. That the same projection gives 144% of the population in 2060 was not shown…

    A very bad scan of the ad (for those who read German, though the plot is easy to read) can be found at http://www.popstar.ch/wordpressb/wp-content/uploads/2009/12/img2.jpg
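The arithmetic behind that last remark can be sketched in a few lines of Python. The 72% in 2050 and the 10-year doubling period are the figures quoted in the comment above; the function names are hypothetical, and this is only an illustration of the extrapolation, not a demographic model.

```python
import math


def doubling_projection(share_at_ref, ref_year, year, doubling_period=10.0):
    """Share (in percent) at `year` if it doubles every `doubling_period` years."""
    return share_at_ref * 2 ** ((year - ref_year) / doubling_period)


def year_share_reaches(target, share_at_ref, ref_year, doubling_period=10.0):
    """Year at which the doubling curve reaches `target` percent."""
    return ref_year + doubling_period * math.log2(target / share_at_ref)


print(doubling_projection(72.0, 2050, 2040))   # 36.0
print(doubling_projection(72.0, 2050, 2060))   # 144.0
print(year_share_reaches(100.0, 72.0, 2050))   # between 2054 and 2055
```

So the curve in the ad passes 100% of the population less than five years after the point where the graph was cut off.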

  6. First, I would like to suggest that the negative future output of the US is possible since a retraction really should be counted as a negative paper.

    Secondly, here is a Nature paper that (I believe, though I cannot double check because it is behind a paywall now) makes a similar mistake: they run a linear regression on log-log normalized data.

    http://www.nature.com/nature/journal/v411/n6840/full/411907a0.html

    (Nature later published a “comment” and a rejoinder).

    Finally, I would argue that there is something objectionable in the text “a simple linear interpretation [sic] of Elsevier’s publishing data suggests…”, because such a linear extrapolation does not and cannot suggest anything about the data: it is silly to run in the first place. Running such an extrapolation might suggest two things: that the author was too lazy to do something more reasonable or, more positively, that the author just thought it was funny.

  7. This is a bit off-topic: is there a specific reason for the ‘jump’ in the period around 2003-05 (say, a change in what Elsevier counts)?

  8. I agree that a linear extrapolation is silly, but over a shortish period of time it could conceivably give results that are not too ridiculous. I think what bothers me more is that they take the instantaneous gradients, which happen to have become a bit steeper at the very last minute for both the US and China, rather than averaging the gradient over a longer time scale, which would then lead to a crossing point significantly further in the future (which in turn would make the linear extrapolation that much less reliable). I suppose another point one could make is that the same technique applied in say 2004 would have predicted a crossing point in something like 2008.

  9. According to my high-tech study consisting of taking a ruler to the screen, I think that they take the line connecting the most recent data point with the 1996 data point, and then extend it to the right.

    There are more sensible ways of fitting a line to data (e.g. minimizing the sum of square distances) than taking the line passing through the two furthest points, and this is actually a good explanation of why extrapolating from percentages is “wrong” even for short periods: more accurate fittings would actually not guarantee that the projected percentages add up to 100% (their two-point fit does guarantee that percentages add up to 100%), which is one of the basic properties that one would like from a forecast about percentages.
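The parenthetical claim about the two-point fit can be checked numerically. The sketch below is a minimal Python illustration under stated assumptions: the three sets of shares are invented, 2008 is assumed as the last data year (the comment only names 1996), and the function names are hypothetical.

```python
import math


def two_point_projection(share_start, share_end, year_start, year_end, year):
    """Extend the line through the first and last data points to `year`."""
    slope = (share_end - share_start) / (year_end - year_start)
    return share_start + slope * (year - year_start)


# Hypothetical shares for three regions; each list sums to 100%.
SHARES_1996 = [40.0, 10.0, 50.0]
SHARES_2008 = [30.0, 25.0, 45.0]  # 2008 as the last data year is an assumption


def projected_total(year):
    """Sum of all projected shares at `year`."""
    return sum(
        two_point_projection(s0, s1, 1996, 2008, year)
        for s0, s1 in zip(SHARES_1996, SHARES_2008)
    )


for year in (2013, 2050, 2100):
    print(year, round(projected_total(year), 6))

# The total stays at 100 even though a single share can leave the 0 to 100 range.
print(two_point_projection(10.0, 25.0, 1996, 2008, 2100))  # 140.0
```

The total is preserved because every projected share is a weighted combination of the two observed shares with the same weights, and both observed sets sum to 100. It does not keep individual shares between 0 and 100.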

  10. I do not see any serious methodological problems with this chart. But of course a projection is only a projection; it is not very reliable.

    In theory and in fact, the Y scale ranges from 0 to 100. It is a percentage, not your salary. The projection is made within this range, so the negative figure and the 144% you mentioned will never ever happen. If no papers come out of the US from 2050 onward, the line will simply stay flat at 0.
