The Facebook study, sampling, and the principle of delayed decision

A few weeks ago, the Proceedings of the National Academy of Sciences published an article on a study conducted by a group of Cornell researchers at Facebook. They picked about 600,000 users and then, for a week, a subset of them saw fewer “negative” posts (up to 90% were filtered) than they would otherwise see, a subset saw fewer “positive” posts (same), and a control group got a random subset.

After the week, the users in the “negative” group posted fewer, and more negative, posts, and those in the “positive” group posted more, and more positive, posts.

Posts were classified as positive or negative using a text-analysis tool called LIWC2007 (Linguistic Inquiry and Word Count).

The study ran contrary to the conventional wisdom that people find it depressing to see, on Facebook, good things happening to their friends.

The paper has caused considerable controversy for being a study on human subjects conducted without explicit consent. Every university, including of course Cornell, requires experiments involving people to be approved by a special committee (an institutional review board), and participants must sign informed consent forms. Facebook maintains that the study is consistent with its terms of service. The highly respected privacy organization EPIC has filed a complaint with the FTC. (And EPIC has been concerned with Facebook’s terms of service for a long time.)

Here I would like to explore a different angle: almost everybody thinks that observational studies about human behavior can be done without informed consent. This means that if the Cornell scientists had run an analysis on old Facebook data, with no manipulation of the feed generation algorithm, there would not have been such a concern.

At the same time, the number of posts eligible for the feed of a typical user vastly exceeds what can fit on one screen, so there are algorithms that pick a rather small subset of posts judged to be of higher relevance, according to some scoring function. Now suppose that, if N posts fit on the screen, the algorithm picks the 2N highest-scoring posts and then randomly keeps half of them. This seems rather reasonable, because the scoring function is only an approximation of relevance anyway.
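
In code, the selection rule I have in mind would look roughly like this (the names and the scoring function are placeholders; it is just a sketch of the idea, not anything Facebook actually does):

```python
import random

def select_feed(candidate_posts, score, n):
    """Pick the 2n highest-scoring candidates, then show a uniformly
    random half of them.  `score` stands in for whatever relevance
    function the feed already uses."""
    top = sorted(candidate_posts, key=score, reverse=True)[:2 * n]
    return random.sample(top, n)
```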

The United States has roughly 130 million Facebook subscribers. Suppose that the typical user looks, in a week, at 200 posts, which seems reasonable (in our case, those would be a random subset of roughly 400 posts). According to the PNAS study, roughly 50% of the posts are positive and 25% are negative, so of the initial 400, roughly 200 are positive and 100 are negative. Let’s look at the 100,000 users for whom the random sampling picked the fewest positive posts: 100,000 out of 130 million is roughly the bottom 0.08% of the distribution, that is, about 3 standard deviations below the mean, so those users would see about 80 positive posts instead of the expected 100; the 100,000 users with the fewest negative posts would get about 35 instead of the expected 50.
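
For readers who want to check the arithmetic, here is a small Monte Carlo sketch of the sampling process above (the parameters are the ones assumed in this paragraph; the exact tail values depend on how one models the sampling, so treat it as a sanity check rather than a precise computation):

```python
import random
import statistics

TRIALS = 100_000     # simulated users (far fewer than 130M, but enough to see the tails)
CANDIDATES = 400     # posts eligible for a user's feed in a week
SHOWN = 200          # the random half that actually gets displayed
POSITIVE = 200       # ~50% of candidates are positive (PNAS figures)
NEGATIVE = 100       # ~25% of candidates are negative

def one_user():
    """Show a uniformly random half of the candidates and count how many
    positive and negative posts this user ends up seeing."""
    shown = random.sample(range(CANDIDATES), SHOWN)
    pos = sum(1 for i in shown if i < POSITIVE)
    neg = sum(1 for i in shown if POSITIVE <= i < POSITIVE + NEGATIVE)
    return pos, neg

results = [one_user() for _ in range(TRIALS)]
for label, counts in (("positive", sorted(p for p, _ in results)),
                      ("negative", sorted(n for _, n in results))):
    print("%s posts seen: mean %.1f, std %.1f, bottom 0.1%% of users: %d"
          % (label, statistics.mean(counts), statistics.pstdev(counts),
             counts[TRIALS // 1000]))
```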

This is a much smaller deviation than in the PNAS study, where the corresponding users would have gotten, respectively, only about 10 positive and 5 negative posts, but it may have been enough to pick up a signal.

Apart from the calculations, which I probably got wrong anyway, the difference is that in the PNAS study they picked a subset of people and then varied the distribution of posts, while in the second case you pick random posts for everybody and then select the users whose random samples deviated the most from the mean.

If you could arrange things so that the distribution of posts seen by each user is the same, would it really be correct to view one study as experimental and the other as observational? If the PNAS study had filtered 20% instead of 90% of the positive/negative posts, would it have been ethical? Does it matter what the intention was when designing the randomized algorithm that selects posts? If Facebook were to introduce randomness in the scoring algorithm with the goal of later running observational studies, would that be ethical? Would they need to let people opt out? I genuinely don’t know the answers to these questions, but I haven’t seen them discussed elsewhere.

3 thoughts on “The Facebook study, sampling, and the principle of delayed decision”

  1. Hi Luca,

    The scenario you describe, where an “observational” study examines the natural variability of a uniformly random news feed algorithm, is an excellent trick, and studies that identify and use such innate randomness are commonly called “natural experiments” [1].

    Matthew Salganik at Princeton recently advanced an interesting argument that natural experiments are where the science of “big data” holds the most promise: recording all the data and being able to look back at natural experiments that have occurred [2].

    Engaging with your more difficult questions of ethics, your example of a uniformly random news feed algorithm highlights to me an aspect of the discussion that I think is often overlooked: that design interventions in designed environments are very different from explicit interventions against a neutral baseline. There is no such thing as an objective ranking algorithm; cf. Scott Aaronson’s recent eigenmorality blog post [3].

    So: my feeling is that experimentation in domains that lack an objectively neutral baseline is a tricky ethical matter. I’d argue there’s more allowance for experimentation in such settings, as per the Common Rule’s definition of “minimal risk” to mean “that the probability and magnitude of harm or discomfort anticipated in the research are not greater in and of themselves than those ordinarily encountered in daily life…” [4], and the variability of designs establishes an expectation of variability in “daily life”. Or at least that’s where I stand in my current thinking on the matter.

    [1] http://en.wikipedia.org/wiki/Natural_experiment

    [2] https://msalganik.wordpress.com/tag/natural-experiments/

    [3] http://www.scottaaronson.com/blog/?p=1820

    [4] http://www.thefacultylounge.org/2014/06/how-an-irb-could-have-legitimately-approved-the-facebook-experimentand-why-that-may-be-a-good-thing.html

    Best,
    Johan

  2. Pingback: big data drama saga: facebook (mis)steps in deep datamining doo-doo | Turing Machine

  3. Pingback: Friday links | Meta Rabbit
