A few weeks ago, the Proceedings of the National Academy of Science published an article on a study conducted by a group of Cornell researchers at Facebook. They picked about 600,000 users and then, for a week, a subset of them saw fewer “negative” posts (up to 90% were filtered) than they would otherwise see, a subset saw fewer “positive” posts (same), and a control group got a random subset.

After the week, the users in the “negative” group posted fewer, and more negative, posts, and those in the “positive” group posted more, and more positive, posts.

Posts were classified according to an algorithm called LIWC2007.

The study run contrary to a conventional wisdom that people find it depressing to see on Facebook good things happening to their friends.

The paper has caused considerable controversy for being a study with human subjects conducted without explicit consent. Every university, including of course Cornell, requires experiments involving people to be approved by a special committee, and participants must sign informed consent forms. Facebook maintains that the study is consistent with its terms of service. The highly respected privacy organization EPIC has filed a complaint with the FTC. (And they have been concerned with Facebook’s term of service for a long time.)

Here I would like to explore a different angle: almost everybody thinks that *observational* studies about human behavior can be done without informed consent. This means that if the Cornell scientists had run an analysis on old Facebook data, with no manipulation of the feed generation algorithm, there would not have been such a concern.

At the same time, the number of posts that are fit for the feed of a typical user vastly exceed what can fit in one screen, and so there are algorithms that pick a rather small subset of posts that are evaluated to be of higher relevance, according to some scoring function. Now suppose that, if N posts fit on the screen, the algorithm picks the 2N highest scoring posts, and then randomly picks half of them. This seems rather reasonable because the scoring function is going to be an approximation of relevance anyway.

The United States has roughly 130 million Facebook subscriber. Suppose that the typical user looks, in a week, at 200 posts, which seems reasonable (in our case, those would be a random subset of roughly 400 posts). According to the PNAS study, roughly 50% of the posts are positive and 25% are negative, so of the initial 400, roughly 200 are positive and 100 are negative. Let’s look at the 100,000 users for which the random sampling picked the fewest positive posts: we would be expecting roughly 3 standard deviations below the mean, so about 80 positive posts instead of the expected 100; the 100,000 users with the fewest negative posts would get about 35 instead of the expected 50.

This is much less variance than in the PNAS study, where they would have got, respectively, only 10 positive and only 5 negative, but it may have been enough to pick up a signal.

Apart from the calculations, which I probably got wrong anyway, what we have is that in the PNAS study they picked a subset of people and then they varied the distribution of posts, while in the second case you pick random posts for everybody and then you select the users with the most variance.

If you could arrange distributions so that *the distributions of posts seen by each users are the same*, would it really be correct to view one study as experimental and one as observational? If the PNAS study had filtered 20% instead of 90% of the positive/negative posts, would it have been ethical? Does it matter what is the *intention* when designing the randomized algorithm that selects posts? If Facebook were to introduce randomness in the scoring algorithm with the goal of later running observational studies would it be ethical? Would they need to let people opt out? I genuinely don’t know the answer to these questions, but I haven’t seen them discussed elsewhere.