A few days ago, Gina Kolata reported in the New York Times on the paradox of studies on sexual behavior consistently reporting (heterosexual) men having more sexual partners than women, with a recent US study reporting men having a median number of 7 partners and women a median number of 4. Contrary to what’s stated in the paper, this is not mathematically impossible (key word: median). It is however quite implausible, requiring a relatively small number of women to account for a large fraction of all men’s partners.
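To see that medians of 7 and 4 are possible, here is a minimal sketch of a toy population. The group sizes (100 men, 75 "low" women, 25 "high" women) and the pairing rule are made up for illustration and are not taken from the study. Every partnership is a (man, woman) pair, so the totals on the two sides must agree, yet the medians differ.

```python
from collections import Counter
from statistics import mean, median

N_MEN, N_LOW, N_HIGH = 100, 75, 25  # hypothetical group sizes


def build_partnerships():
    """Return a set of (man, woman) pairs for the toy population."""
    pairs = set()
    for m in range(N_MEN):
        # Each man has 3 partners among the "low" women ...
        for j in range(3):
            pairs.add((m, ("low", (3 * m + j) % N_LOW)))
        # ... and 4 partners among the "high" women.
        for j in range(4):
            pairs.add((m, ("high", (4 * m + j) % N_HIGH)))
    return pairs


pairs = build_partnerships()
men_counts = list(Counter(m for m, _ in pairs).values())
women_counts = list(Counter(w for _, w in pairs).values())

print("men:   median", median(men_counts), "mean", mean(men_counts))
print("women: median", median(women_counts), "mean", mean(women_counts))

# Share of all partnerships that involve one of the 25 "high" women.
high_share = sum(1 for _, w in pairs if w[0] == "high") / len(pairs)
print("share of partnerships involving the top 25% of women:", round(high_share, 2))
```

In this toy population the means are equal, as they must be when there are as many men as women, while a quarter of the women account for more than half of the men's partners. That is the "possible but implausible" scenario described above.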
An answer to this paradox can be found in Truth and consequences: using the bogus pipeline to examine sex differences in self-reported sexuality, by Michele Alexander and Terry Fisher, Journal of Sex Research 40(1), February 2003.
In their study, the men and women in the sample were each divided into three groups and asked to fill out a survey on sexual behavior. People in one group filled out the survey alone in a room with an open door, with a researcher sitting outside, after being told that the study was not anonymous; people in a second group filled out the survey in a room with a closed door and with an explicit assurance of anonymity; people in a third group filled out the survey while attached to what they believed to be a working “lie detector.”
In the first group, women reported on average 2.6 partners, men 3.7. In the second group, it was women 3.4 and men 4.2. In the third group, it was women 4.4 and men 4.0.
(The study looks at several other quantities, and some of them have even wider variance in the three settings.)
So, not surprisingly given the sexual double standards in our culture, men and women lie about their sexual behavior (men overstate, women understate), and do less so in an anonymous setting or when the lie is likely to be discovered.
Here is the reporting of the first group put to music:
[Update 8/18/07: so many people must have emailed her about the median versus average issue in the article that Gina Kolata wrote a clarification. Strangely, she does not explain, for the rest of the readers, what the difference is and why it is possible, if unlikely, to have very different medians for men and women. The claim in the clarification, by the way, is still wrong: those 9.4% of women with 15 or more partners could be accounting for all the missing sex.]
In CS70, the Berkeley freshman/sophomore class on discrete mathematics and probability for computer scientists, we conclude the section on probability with a class on how to lie with statistics. The idea is not to teach the students how to lie, but rather how not to be lied to. The lecture focuses on the correlation versus causation fallacy and on Simpson’s paradox.
My favorite way of explaining the correlation versus causation fallacy is to note that there is a high correlation between being sick and having visited a health care professional in the recent past. Hence we should prevent people from seeing doctors in order to make people healthier. Some HMOs in the US are already following this approach.
Today, a post in a New York Times science blog tells the story of a gross misuse of statistics in a Dutch trial that has now become a high-profile case. In the Dutch case two other, and common, fallacies have come up. One is, roughly speaking, neglecting to take a union bound. This is the fallacy of saying ‘I just saw the license plate California 3TDA614, what are the chances of that!’ The other is the computation of probabilities by making unwarranted independence assumptions.
Feynman has written eloquently about both, but I don’t have the references at hand. In particular, when he wrote on his Space Shuttle investigation committee work, he remarked that official documents had given exceedingly low probabilities of a major accident (of the order of one millionth per flight or less), even though past events have shown this probability to be more of the order of 1%. The low number was obtained by summing the probabilities of various scenarios, and the probability of each scenario was obtained by multiplying estimates for the probabilities that the various things that had to go wrong for that scenario to occur would indeed go wrong.
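The effect of an unwarranted independence assumption can be sketched with made-up numbers (they are not from the Shuttle reports). Suppose two components each fail with probability 1%, but part of that risk comes from a shared cause that takes out both at once. The function names and the 0.5% common-cause probability below are assumptions chosen for illustration.

```python
def joint_failure_independent(p_a, p_b):
    """Probability that both fail if the failures really are independent."""
    return p_a * p_b


def joint_failure_common_cause(p_cause, p_a_alone, p_b_alone):
    """Both fail if the shared cause occurs; otherwise they fail independently."""
    return p_cause + (1 - p_cause) * p_a_alone * p_b_alone


p_marginal = 0.01   # each component fails 1% of the time (hypothetical)
p_cause = 0.005     # hypothetical shared cause that breaks both components

# Choose the "alone" failure rate so that each component's overall
# failure probability is still exactly p_marginal.
p_alone = (p_marginal - p_cause) / (1 - p_cause)

naive = joint_failure_independent(p_marginal, p_marginal)
actual = joint_failure_common_cause(p_cause, p_alone, p_alone)

print(f"multiplying the marginals: {naive:.6f}")
print(f"with the common cause:     {actual:.6f}")
print(f"underestimate factor:      {actual / naive:.1f}")
```

Each component looks equally reliable in both models, but multiplying the two marginal probabilities understates the chance of a double failure by a factor of about fifty.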
Christos Papadimitriou has the most delightful story on this fallacy. He mentioned in a lecture the Faloutsos-Faloutsos-Faloutsos paper on power law distributions in the Internet graph. One student remarked, wow, what are the chances of all the authors of a paper being called Faloutsos!
Some time ago, the New York Times reported on census data that shows that only a minority of American women are married and living with their husband. Thomas Sowell writes in National Review to complain about the way the Times misleads with statistics. He repeats points made earlier, in the same magazine, by Jennifer Morse. (Namely, that the claim depends on the definition of “woman” and of “living with.”)
But this is part of a pattern, Mr. Sowell writes, because,
Innumerable sources have quoted a statistic that half of all marriages end in divorce — another conclusion based on creative manipulation of words, rather than on hard facts.
The statistic is partly based on the fact that, in recent years, there have been about half as many divorces as marriages in any given year. It is of course not quite correct to project that half of the marriages are going to end in divorce: if the number of people getting married increases with time then, all other things being equal, the ratio of divorces to marriages in a given year underestimates the true fraction of marriages ending in divorce. Conversely, if the number of marriages goes down with time, one has an overestimate. I would suppose, however, that demographers take such trends into account in their models.
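The direction of the bias can be checked with a minimal sketch under strong simplifying assumptions of my own: the number of marriages changes by a fixed percentage every year, a fixed fraction of marriages end in divorce, and every divorce happens exactly the same number of years after the wedding. The function name, the 10-year delay and the growth rates are illustrative choices, not demographic data.

```python
def same_year_ratio(p_divorce, growth, delay_years, year=50, base=1_000_000.0):
    """Divorces in a given year divided by marriages begun in that same year.

    p_divorce   -- true fraction of marriages that end in divorce
    growth      -- yearly growth rate of the number of marriages (may be negative)
    delay_years -- years between wedding and divorce (fixed, for simplicity)
    """
    def marriages(t):
        return base * (1 + growth) ** t

    # Divorces this year come from the marriages begun delay_years ago.
    divorces = p_divorce * marriages(year - delay_years)
    return divorces / marriages(year)


for g in (0.02, 0.0, -0.02):
    ratio = same_year_ratio(p_divorce=0.5, growth=g, delay_years=10)
    print(f"growth {g:+.0%}: same-year ratio {ratio:.3f} (true fraction 0.500)")
```

With growing marriage numbers the same-year ratio comes out below the true fraction, with shrinking numbers above it, and with constant numbers the two coincide.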
Sowell’s objection is, of course, considerably more creative:
The fact that there may be half as many divorces in a given year as there are marriages in that year does not mean that half of all marriages end in divorce. It is completely misleading to compare all the divorces in one year — from marriages begun years and even decades earlier — with the number of marriages begun in that one year.
The Lancet has recently published a study of the number of deaths in Iraq caused by the invasion. From the abstract:
Data from 1,849 households that contained 12,801 individuals in 47 clusters was gathered. 1,474 births and 629 deaths were reported during the observation period. Pre-invasion mortality rates were 5.5 per 1000 people per year (95% CI 4.3–7.1), compared with 13.3 per 1000 people per year (10.9–16.1) in the 40 months post-invasion. We estimate that as of July, 2006, there have been 654,965 (392,979–942,636) excess Iraqi deaths as a consequence of the war, which corresponds to 2.5% of the population in the study area. Of post-invasion deaths, 601,027 (426,369–793,663) were due to violence, the most common cause being gunfire.
Author Mark Goldblatt notes in an article in National Review that, in these calculations, one should also consider the fact that, before the invasion, Iraq was subject to sanctions that were lifted after the American occupation. By some estimates, the sanctions were causing about 150,000 deaths a year. This means that, since 2003, about 450,000 deaths might have been avoided because of the end of the sanctions.
Considering that one would have expected 450,000 fewer deaths, and one gets instead 650,000 more, the conclusion would be that the extra deaths caused by the occupation would be in excess of a million. Of course, simply adding the two numbers is problematic for a few reasons: some of the effects of the sanctions (for example, on the health care system) may be similar to the effects of the occupation, and hence would be having similar (rather than nearly disjoint) effects. Most importantly, it has been alleged that the estimates on deaths caused by the sanctions were overstated.
Anyway, what is Goldblatt’s approach? He subtracts one number from the other! You know how this works: I owe you $20, now lend me another $30 and I will give you the $10 difference tomorrow. If I may suggest an improvement to his methodology, he should also subtract the number of deaths that occurred in Switzerland over the same period of time. I am sure he would get an even more accurate estimate.
Update: see also Tim Lambert at scienceblogs.
