not backward!

Oh man, not another election! Why do we have to choose our leaders? Isn’t that what we have the Supreme Court for?
– Homer Simpson

Nate Silver is now putting Barak Obama’s chance of reelection at around 85%, and he has been on the receiving end of considerable criticism from supporters of Mitt Romney. Some have criticized his statistical analysis by pointing out that he has a soft voice and he is not fat (wait, what? read for yourself – presumably the point is that Silver is gay and that gay people cannot be trusted with such manly pursuits as statistics), but the main point seems to be: if Romney wins the election then Silver and his models are completely discredited. (E.g. here.) This is like someone saying that a die has approximately a 83% probability of not turning a 2, and others saying, if I roll a die and it turns a 2, this whole “probability” thing that you speak of is discredited.

But still, when someone offers predictions in terms of probability, rather than simply stating that a certain outcome is more likely, how can we evaluate the quality of such predictions?

In the following let us assume that we have a sequence of binary events, and that each event $i$ has a probability $p_i$ of occurring as a $1$ and $1-p_i$ of occurring as $0$. A predictor gives out predicted probabilities $q_i$, and then events $E_i$ happen. Now what? How would we score the predictions? Equivalently, how would we fairly compensate the predictor?

A simple way to “score” the prediction is to say that for each event we have a “penalty” that is $|E_i - p_i|$, or a score that is $1- |E_i - p_i|$. For example, the prediction that the correct event happens with 100% probability gets a score of 1, but the prediction that the correct event happens with 85% probability gets a score of .85.

Unfortunately this scoring system is not “truthful,” that is, it does not encourage the predictor to tell us the true probabilities. For example suppose that a predictor has computed the probability of an event as 85% and is very confident in the accuracy of the model. Then, if he publishes the accurate prediction he is going to get a score of .85 with probability .85 and a score .15 with probability .15. So he is worse off than if he had published the prediction of the event happening with probability 100%, in which case the expected score is .85. In general, the scheme makes it always advantageous to round the probability to 0% or 100%.

Is there a truthful scoring system? I am not sure what the answer is.

If one is scoring multiple predictions of independent events, one can look at all the cases in which the prediction was, say, in the range of 80% to 90%, and see if indeed the event happened, say, a fraction between 75% and 95% of the times, and so on.

One disadvantage of this approach is that it seems to require a discretization of the probabilities, which seems like an arbitrary choice and one that could affect the final score quite substantially. Is there a more elegant way to score multiple independent events without resorting to discretization? Can it be proved to be truthful?

Another observation is that such an approach is still not entirely truthful if it is applied to events that happen sequentially. Indeed, suppose that we have a series of, say, 10 events for which we predicted a 60% probability of a 1, and the event 1 happened 7 out of 10 times. Now we have to make a prediction of a new event, for which our model predicts a 10% probability. We may then want to publish a 60% prediction, because this will help even out the “bucket” of 60% predictions.

I don’t think that there is any way around the previous problem, though it seems clear that it would affect only a small fraction of the predictions. (The complexity theorists among the readers may remember similar ideas being used in a paper of Feigenbaum and Fortnow.)

Surely the task of scoring predictions must have been studied in countless papers, and the answers to the above questions must be well known, although I am not sure what are the right keywords to use to search for such work. In computer science, there are a lot of interesting results about using expert advice, but they are all concerned with how you score your own way of picking which expert to trust rather than the experts themselves. (This means that the predictions of the experts are not affected by the scoring system, unlike the setting discussed in this post.)

Last Fall, three Stanford classes were “offered online” for free: Andrew Ng’s machine learning class, Sebastian Thrun’s AI class, and Jennifer Widom’s data base class. There had been interest and experiments in online free education for a long time, with the MITx initiative being a particularly significant one, but there were a few innovations in last year’s Stanford classes, and they probably contributed to their runaway success and six-digit enrollment.

One difference was that they did not post videos of the in-class lectures. There was, in fact, no in-class lecture. Instead, they taped short videos, rehearsed and edited, with the content of a standard 90-minute class broken down in 4 ten-minutes video or so. This is about the difference between taping a play and making a movie. Then the videos came with some forms of “interactivity” (quizzes that had to be answered to continue), and they were released at the rate in which the class progressed, so that there was a community of students watching the videos at the same time and able to answer each other’s questions in forums. Finally, the videos were used in the Stanford offerings of the classes: the students were instructed to watch the videos by themselves, and during the lecture time they would solve problems, or have discussions or have guest lectures and so on. (In K-12 education, this is called the “flipped classroom” model, in which students take lectures at home and solve homeworks in class, instead of the traditional other way around.)

In the past few months, there has been a lot of thinking, and a lot of acting, about the success of this experiment. Sebastian Thrun started a company called udacity to offer online courses “branded” by the company itself, and Daphne Koller and Andrew Ng started a company called coursera to provide a platform for universities to put their courses online, and, meanwhile, Harvard and Berkeley joined MIT to create edX.

At a time when the growth of higher education costs in the United States appear unsustainable, particularly in second-tier universities, and when the demand for high-quality higher education is exploding in the developing world, these projects have attracted a lot of interest.

While the discussion has been focused on the “summer blockbusters” of higher education, and what they should be like, who is going to produce them, how to make money from them, and so on, I would like to start a discussion on the “art house” side of things.

In universities all over the world, tens of thousands of my colleagues, after they have “served” their departments teaching a large undergraduate classes and maybe a required graduate class, get to have fun teaching a research-oriented graduate class. Their hard-earned insights into problems about which they are the world’s leading expert, be it a particular organ of the fruit fly or a certain corner of the Langlands program, are distilled into a series of lectures featuring content that cannot be found anywhere else. All for the benefit of half a dozen or a dozen students.

If these research-oriented, hyper-specialized courses were available online, those courses might have an audience of 20 or 30 students, instead of 100,000+, but their aggregate effect on their research communities would be, I believe, very significant.

One could also imagine such courses being co-taught by people at different universities. For example, imagine James Lee and Assaf Naor co-teaching a course on metric embeddings and approximation algorithms: they would devise a lesson plan together, each would produce half of the videos, and then at both NYU and UW the students would watch the videos and meet in class for discussions and working on problems; meanwhile study groups would probably pop up in many theory groups, of students watching the videos and working on the problem sets together.

So someone should put a research-oriented graduate course online, and see what happens. This is all to say that I plan to teach my class on graph partitioning, expander graphs, and random walks online in Winter 2013. Wish me luck!

I would like to thank all those that contributed to the Turing Centennial series: all those who wrote posts, for sure; but also all those who spread the word about this, on blogs, twitter, facebook, and in real life; those who came to read them; and those who wrote lots of thoughtful comments. In a community where discussions over conference organizational issues or over the importance of matrix multiplication algorithms can become very acrimonious, I am impressed that we could have such a pleasant and troll-free conversation. This goes to show that in theory has not only the smartest and most handsome readers of the whole internet, as was well known, but also the nicest ones!

We will definitely do this again in 2054, to mark the centennial of Turing’s death.

A few days ago, I was very saddened to hear of the death of Sally Ride. A Stanford Alumna, Sally Ride became to first American woman to travel in space, she served on both the investigative committees after the two Shuttle disasters, and dedicated the past decade to the goal of getting young kids, and girls in particular, interested in science and technology. She cofounded, and directed, a non-profit foundation to further these goals, and wrote several books. After her death, it was revealed that she had been in a 25-year relationship with another woman, who was also the coauthor of her books and a partner in her foundation.

I think it is significant that a person that certainly had a lot of courage, determination, willingness to defy stereotypes, and to be an inspiration for people like her, felt that she could not be publicly out during her life. (In interviews about their books, Ride and her partner Tam O’Shaughnessy referred to each other as “friends”.)

Let’s hope that in 2054 it’s not just computer science professors in the West that are confortable being out, but also astronauts, movie stars, professional athletes, and so on.

[Leaving the best for last, here is Ashwin Nayak's post. Unlike the other posts in this series, Ashwin does not just talk about events, but he also gives us a view of his inner life at several critical times. What can I say to introduce such a beautiful essay? I got this: congratulations Ashwin! -- L.T.]

(Some names have been changed to protect privacy. Some events have been presented out of chronological order, to maintain continuity in the narrative. The unnamed friends in Waterloo are Kimia, Andrew, Anna-Marie, and Carl. I would like to thank them, Joe, Luca, and especially Harry for their feedback on a draft of this blog post. Harry offered meticulous comments, setting aside a myriad commitments. Most of all, I would like to thank my sisters and my parents for graciously agreeing to being included in this story.

For those not in theoretical computer science, FOCS is one of the flagship conferences on this subject. Luca is a professor of computer science at Stanford University, and Irit at Weizmann Institute of Science.

A prelude: I was born into a middle-class family from the South-West coast of India. I am the youngest of three siblings, and grew up in cities all over the country. My father served as an officer in the Indian army, and my mother taught in middle school until she switched to maintaining the household full-time. I went to IIT Kanpur for my undergraduate studies when I was 17. At 21, I moved half-way across the world to Berkeley, CA, for graduate studies. In 2002, after a few years of post-doctoral work in the US, I moved to Waterloo, ON, to take up a university faculty position.)

We were walking through art galleries in San Francisco when Luca brought up the Turing centenary events that were taking place around the world. None of the events celebrating his work referred to Turing’s homosexuality. Luca wondered whether the celebrations would be complete without revisiting this aspect of his life. As a response, he was thinking of having a series of guest blog posts by contemporary gay and lesbian computer scientists about their experiences as gay professionals. How would they compare with those in Turing’s times?

I wonder how much of my attention was on the art in the next few galleries. Would I write a post? What would I write? For me, sexuality is so deeply personal a matter that I’ve talked about it only with a handful of people. Why would I write about it publicly? Something Luca had said stuck in my mind: “The post could even be anonymous. That would be a statement in itself.” It took me back to my first relationship: I dated Mark for over three years and no one other than his friends knew. Times when I was on the verge of telling a friend about my relationships flashed by. I remembered the time I discussed with my immediate family why I would not get married (at least not the way they imagined). Times when students recognized me at events for gays and lesbians resurfaced, as did conversations with friends and colleagues grappling with openness. I would write a post, I told Luca.

That night, I got little sleep. Memories that I thought had slipped into oblivion loomed large. Read the rest of this entry »

[Rosario Gennaro is a cryptographer, and he has been at IBM for more than 15 years. (He must have started as a teen-ager.) On Monday, he will start his new job as a professor at the City College of New York and the Director of the Center for Algorithms and Interactive Scientific Software. In the middle of his move and of an internet-free vacation, Rosario found the time to write a guest post that goes in a quite different direction from the others. -- L.T.]

“David Hilbert … I suppose his name doesn’t mean much, if anything, to you? No, no? Well, there you are, you see? It’s the way of the world! People, never seem to hear about the really great mathematicians!”

The recent celebrations for Alan Turing’s centenary made me revisit the BBC movie of “Breaking the Code” that amazing Broadway play, with a wonderful Derek Jacobi playing Alan Turing. You can see the most astonishing bit of this play here:

a 6-minute tour de force monologue explaining in lay terms Godel's Theorem and Turing's discovery of undecidable problems.

But that's not what I decided to talk about. There is no question that Alan Turing's sexual orientation has played a huge role in the popularization of his figure and his work. "Breaking the Code" would not have been written if not for the unique personal story that accompanied Turing's exceptional contribution to Mathematics and Computer Science. Nor would NPR have run a story last month on the centenary. Neither Godel nor Hilbert (both mentioned in the above monologue) got such treatment.

While I wish that being gay were a sufficient condition for being a celebrated mathematician in the news (reserve space for my profile in the next issue of the New Yorker please), I wonder if being queer in some form is necessary. What can we do, as a community to make sure people know, not only Turing, but also Hilbert, and Godel, and Gauss. How can we make the Mathematics relevant, rather than the person. Can we get liberal arts majors, for example, to have a deep appreciations of the *ideas* and the *concepts* of Mathematics and Computer Science, even if they will never understand the proofs and the techniques? As I embark on an academic career after 16 years of research in a corporate lab, these questions have been occupying my mind. Others are wondering too …

Theoretical Computer Science, in my opinion, presents many opportunities on this front. Decidability, computational hardness, (pseudo)randomness … those are all concepts around which a philosophy class could be built. After all, as the fictional Turing says in the play, it's about telling right from wrong. I would love to develop such a class for liberal arts majors, and maybe the readers of "in theory" can help me by pointing me to similar classes that are already being taught somewhere. Yes, I am that lazy.

To finish off, being an opera queen (as any self-respecting homosexual should be) I have a not-so-secret wish to see "Breaking the Code" adapted into an opera. I think John Adams, whose work on physicist J. Robert Oppenheimer and the atomic bomb was mesmerizing, would be my top choice for a composer:

[Martin Farach-Colton is a professor at Rutgers, in the gayest computer science department in the country. He is well known for his work on algorithms and data structures. In the Fall of 1998, I was a post-doc at DIMACS and I lived in New York; since we had the same commute, I would sometimes get a ride from Martin. I was still quite new to the US, and I remember thinking it strange that Martin was the only person driving normally, while everybody else was going so slowly. Martin is the dean of out theoreticians, and he has written a very interesting post. I wish he hadn't given up so easily on the theme of sexism vs. homophobia. -- L.T.]

When Luca asked me to write a guest blog post on “Putting the Gay Back in the Turing Centennial”, I was happy to say yes. But I had a problem. If I were to write about being gay in the theory community, what could I write about? I’ve always been quite comfortable being openly gay in the theory community, and that doesn’t make for a very interesting story, does it?

But first, some context: I grew up in South Carolina, in an Argentine family. Both my family and my surroundings were deeply homophobic. When I moved away from home to go to medical school, I found myself in yet another very homophobic environment. Nonetheless, in 1986, I decided it was time to meet Mr. Right, and the first step was to come out to all my friends and family. Within 6 months I was living with Andrew. We’ll be celebrating 26 years together in a few months, as well as 9 years of marriage. Our twins are 12.

I wasn’t fully out at medical school, but when I started my PhD in Computer Science, I threw open the closet doors and was totally out from Day One. It would be years before I met another openly gay or lesbian computer scientist, and even more years before I knew of another LGBT theoretician. Yet I have found that being gay was no big deal within the theory community. Practically no one seems to care, and that’s the best kind of acceptance there is.

Remarkably, I felt this kind of open atmosphere at the very first FOCS I attended back in 1989. The world has changed a lot for gay people in the last 23 years, but the theory community changed earlier. Sure, people have said some homophobic things to me, but these were almost all minor incidents, and I’m also sure that those people would now be mortified by what they said 20 years ago. More often than not, when gay issues come up with my theory colleagues, they are mostly interested in topics like a technical analysis of how the fight for marriage equality is going. (I’ve been involved in this fight both here in the US — where there’s still plenty of work to be done — and in Argentina, which now has the most progressive LGBT laws in the world.)

What can explain the culture of the theory community? I turned to some of the women of my academic generation to see what it’s been like for them. After all, it seems that homophobia and sexism go hand in hand. Right off the bat, one of them torpedoed my premise. She pointed out that there have been plenty of gay men who are acknowledged as great geniuses. There is no stereotype to overcome with respect to being gay and being good at math. Indeed, in addition to Turing, Hardy was famously gay, as were Komogorov and his partner, the topologist Pavel Alexandrov. I’m not placing myself in such exalted company but merely pointing out that perhaps I had it easier than women in the field because I had fewer stereotypes to overcome.

I found general consensus that, although the theory community is not free of prejudice and stereotype, it’s a comfortable place for a lot of people. Perhaps it’s not just theory. My own department had, at its high-water mark, four openly gay faculty, two of whom were recruited as a couple. I also found Google very gay-friendly when I worked there in the early ’00s.

So, really, I feel like I have nothing substantive to say on the subject. And maybe the best news. To paraphrase Tolstoy, happiness is dull.

Two more posts are coming soon. Meanwhile, here is a wonderful interview with Robert MacPherson, which is part of an interview series by the Simons Foundation. Although the interview does not mention Turing, it does mention Kolmogorov.

(via Not Even Wrong)

From this New York Times article:

Researchers found the home test accurate 99.98 percent of the time for people who do not have the virus. By comparison, they found it to be accurate 92 percent of the time in detecting people who do. [...]

So, while only about one person in 5,000 would get a false negative test, about one person in 12 could get a false positive.

It is late Spring in 2000, and I am to have lunch in New York with Ran Canetti and Ronitt Rubinfeld. Ronitt is already there, and Ran arrives a bit late and asks what we are talking about. “I told Ronitt that I am gay” I say. “Oh…” says Ran “Congratulations!