Lies, damn lies, and a Dutch trial

In CS70, the Berkeley freshman/sophomore class on discrete mathematics and probability for computer scientists, we conclude the section on probability with a class on how to lie with statistics. The idea is not to teach the students how to lie, but rather how not to be lied to. The lecture focuses on the correlation versus causation fallacy and on Simpson’s paradox.

My favorite way of explaining the correlation versus causation fallacy is to note that there is a high correlation between being sick and having visited a health care professional in the recent past. Hence we should prevent people from seeing doctors in order to make people healthier. Some HMOs in the US are already following this approach.

Today, a post in a New York Times science blog tells the story of a gross misuse of statistics in a Dutch trial that has now become a high-profile case. In the Dutch case two other common fallacies have come up. One is, roughly speaking, neglecting to take a union bound. This is the fallacy of saying ‘I just saw the license plate California 3TDA614, what are the chances of that!’ The other is the computation of probabilities by making unwarranted independence assumptions.
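To make the union bound point concrete, here is a minimal sketch in Python. It assumes a plate format of one digit, three letters, and three digits (as in 3TDA614) and assumes every plate is equally likely; the function names are mine, chosen for illustration. Any one plate named in advance is wildly unlikely, but once you sum over all the plates you would have found equally remarkable, the bound on seeing one of them reaches 1.

```python
from fractions import Fraction

# Assumed plate format: one digit, three letters, three digits (e.g. 3TDA614),
# with all plates equally likely. Both are simplifying assumptions.
PLATE_COUNT = 10 * 26**3 * 10**3


def prob_specific_plate():
    """Probability of seeing one particular plate that was named in advance."""
    return Fraction(1, PLATE_COUNT)


def union_bound(p_single, n_events):
    """Upper bound on the probability that at least one of n_events occurs,
    when each has probability p_single (no independence needed)."""
    return min(Fraction(1), p_single * n_events)


def prob_at_least_one(p_single, n_events):
    """Exact probability of at least one occurrence, assuming independence."""
    return 1 - (1 - p_single) ** n_events


if __name__ == "__main__":
    p = prob_specific_plate()
    print("one plate named in advance:", float(p))
    # Summing over every plate that would have looked just as 'surprising':
    print("bound over all plates:", float(union_bound(p, PLATE_COUNT)))
    # A one-in-a-thousand coincidence, given a thousand chances to happen:
    rare = Fraction(1, 1000)
    print("at least one in 1000 tries:", float(prob_at_least_one(rare, 1000)))
```

The same arithmetic applies whenever a 'suspicious' pattern is picked out after the fact from a large pool of candidates: the relevant probability is that of some candidate showing the pattern, not of this particular one.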

Feynman has written eloquently about both, but I don’t have the references at hand. In particular, when he wrote on his Space Shuttle investigation committee work, he remarked that official documents had given exceedingly low probabilities of a major accident (of the order of one millionth per flight or less), even though past events had shown this probability to be more of the order of 1%. The low number was obtained by summing the probabilities of various scenarios, and the probability of each scenario was obtained by multiplying estimates for the probabilities that the various things that had to go wrong for that scenario to occur would indeed go wrong. Multiplying in this way is justified only if those individual failures are independent.
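Here is a small sketch of how much damage an unwarranted independence assumption can do. The model and every number in it are hypothetical, made up for illustration, and are not taken from the Shuttle reports: a scenario needs three components to fail, and on rare 'bad' days a shared cause makes each of them very likely to fail. The naive estimate multiplies the marginal failure probabilities; the correct one conditions on the shared cause first.

```python
def marginal_failure(p_bad_day, p_fail_bad, p_fail_good):
    """Overall failure probability of a single component."""
    return p_bad_day * p_fail_bad + (1 - p_bad_day) * p_fail_good


def naive_joint(p_bad_day, p_fail_bad, p_fail_good, k):
    """Joint failure of k components, wrongly assuming independence."""
    return marginal_failure(p_bad_day, p_fail_bad, p_fail_good) ** k


def true_joint(p_bad_day, p_fail_bad, p_fail_good, k):
    """Joint failure of k components that are independent only
    once we know whether it is a bad day (the shared cause)."""
    return (p_bad_day * p_fail_bad ** k
            + (1 - p_bad_day) * p_fail_good ** k)


if __name__ == "__main__":
    # Hypothetical parameters: 1% of days are bad; a component fails with
    # probability 0.9 on a bad day and 0.001 otherwise; 3 must fail together.
    params = dict(p_bad_day=0.01, p_fail_bad=0.9, p_fail_good=0.001, k=3)
    print("naive product:", naive_joint(**params))
    print("with shared cause:", true_joint(**params))
```

With these made-up parameters the naive product comes out just under one in a million, while the estimate that accounts for the shared cause is a bit above 0.7%, a gap of several thousand times.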

Christos Papadimitriou has the most delightful story on this fallacy. He mentioned in a lecture the Faloutsos-Faloutsos-Faloutsos paper on power law distributions in the Internet graph. One student remarked, wow, what are the chances of all the authors of a paper being called Faloutsos!


4 thoughts on “Lies, damn lies, and a Dutch trial”

  1. Just today I bought a birthday card saying ‘Birthdays are good for you. Statistics show that those who have many of them live longer’. (Now what are the chances you’d blog about this exactly on the same day I bought this card!)

  2. Not to be an idiot, but could you expand on why the license plate line is a fallacy? I don’t see the violation of a union bound there.

  3. I believe the point is that you always see some combination of letters and numbers when you look at a license plate. It just happens to be California 3TDA614. But it could equally be Florida 1BDG512 if you looked at some other license plate.

  4. To make the license plate explanation complete one really ought to consider the Kolmogorov complexity of the plates. One might more appropriately consider it a surprise to see plate CA 0ABC123 (assuming it was not a vanity plate)? In contrast to this plate, both of the numbers mentioned have high Kolmogorov complexity which should not be a surprise since almost all plates do. However, low Kolmogorov complexity plates are rare and so should be a surprise.
