The hard science of Sociology

On Monday, the first talk I managed to attend was by Terry Sejnovski, of the Salk Institute, and I only got there towards the end of the talk. Starting from the question of how the brain manages to “do” vision, he talked about fascinating experiments that, among other things, test the rationality of conscious versus unconscious decisions (apparently, in complex decision-making involving lots of variables, “gut feelings” are better than conscious reasoning) and measure the surprisingly good “computational” ability of monkeys.

In the afternoon, Valiant delivered his talk on “accidental algorithms” to a standing-room-only packed audience. His main results are (1) a $\oplus P$-completeness result for $\oplus$-2SAT, even when restricted to planar instances, and (2) a polynomial time algorithm for $mod_7$-2SAT on planar instances. $\oplus P$ is the complexity class of problems that reduce to the question of whether a given non-deterministic Turing machine, on a given input, has an odd number of accepting computations. By the Valiant-Vazirani result, a $\oplus P$-complete problem is also NP-hard under randomized reductions, and, by Toda’s theorem, is actually hard for all the levels of the polynomial hierarchy. The $\oplus$-2SAT problem is to determine whether a given 2SAT instance has an odd number of satisfying assignments. Apparently, this is the first natural $\oplus P$-complete problem for which there is a polynomial time algorithm that checks if the number of solutions is zero or non-zero. The $mod_7$-2SAT problem is, given a 2SAT formula, to determine if the number of assignment is or is not a multiple of 7. Valiant’s algorithm is based on his earlier work on matchgate computations.

Afterwards, most people moved to Uri Feige’s talk in the next room on finding witnesses of unsatisfiability for random 3CNF formulas. Previously, it was known how to find, with high probability, witnesses of unstatisfiability for a given random 3CNF formula with $n$ variables and about $n^{1.5}$ clauses. This new work by Feige, Kim and Ofek shows that witnesses of unsatisfiability exist for most random 3CNF formulas with $n$ variables and about $n^{1.4}$ clauses. (They do not give an efficient algorithm to find such witnesses.)

Nicholas Harvey gave the last talk of the day on his work on reducing the matching problem to matrix multiplication. Previous work had shown that matching can be solved in the same time as matrix multiplication, but the previous reduction was quite complex for general graphs. (And simpler for bipartite graphs.) Harvey gives a considerably more transparent new algorithm that reduces matching in general graphs to matrix multiplication. His talk was very clear and enjoyable, and I am grateful to Moses Charikar for recommending it. (Because of the word “matroid” in the title, I was going to skip to the talk.)

On Tuesday, Jon Kleinberg gave a spectacular invited talk on the kind of radically new results on sociology that one can extract from new datasets (resulting from people using large online retailers, joining online communities such as livejournal and myspace, and so on) and from algorithmic thinking. I did not take notes, and I could not find the slides online, so I can only offer a few snippets that I remember. In one of Jon’s most celebrated results, he studied an idealized model of “social networks” made of a mix of “local” connections and of random “long-distance” connections, and he showed that, in the resulting network, people would be able to find short paths among themselves only if the long-distance connections have a probability that is proportional the inverse of the square of the distance between two people. In the model, the local connections are laid out according to a grid, but the same result holds in a more general model where the local connections define a metric space, and the random connections are such that the probability that $i$ and $j$ are connected is proportional to the inverse of the number of people that are closer to $i$ than $j$ is. A study made on livejournal data shows that the “friends” that people make online are distributed in a way that is in spooky agreement with the above distribution.

Why do people make friends with precisely the only distribution that makes it easy to find short paths between themselves? The answer is unclear, but the outstanding thing is that a “law of human nature” seems to have been discovered, which could have never been formulated without thinking in terms of algorithms. Presumably, any explanation of this phenomenon will have to involve an algorithmic model of how people make connections online.

Another interesting question is the effect of “peer pressure” or community involvment. We are more likely to do something if our friends also do it (forgetting for a moment the question of what is the cause and what is the effect, that is whether we make friends with like-minded people, or whether we are influenced by what others do), but what exactly is the dependency between how many friends do something and how likely we are to do the same? Is there a “diminishing returns” curve, the way we are impressed by the first few friends who buy an ipod, and hence more and more likely to buy one ourselves, and then less and less impressed? Or is it a “critical mass” curve, the way skype is useless unless a lot of friends of ours are using it? There has been amusing work studying this question by looking at communities on livejournal and conferences on dblp. One can see how likely is a person to join a livejournal community as a function of the number of friends who are already members of the community, or see how likely is a person to publish for the first time in a conference as a function of former coauthors who have already published in the conference. The two curves are reasonably similar, and both are of “diminishing returns” type.

So one is more likely to, say, start publishing in a conference the more people one knows who are part of the community of that conference. Now, we can ask, given that one knows the same number of people in the community, does it make it more or less likely to start publishing there if the people know each other (say, have published together in the past), or if they do not know each other? To see the same question in other settings, if we hear a rumor, we are more likely to believe it if we hear it from more people. Furthermore, all other things being equal, we are more likely to believe it if we hear it from unrelated people, because it gives some kind of independent confirmation. If we are deciding whether to go to a party, we are also more likely to go the more people we know who are also going. In this case, however, all other things being equal, we are more likely to go if the people that we know who are going also know each other.

In the case of conferences, the former scenario is right. One is more likely to publish in a conference if the former coauthors who are part of the community are unrelated. Similarly, less “densely clustered” communities are the ones that are more likely to grow. There is a cause-and-effect problem here, in the sense that a community where everybody knows everybody else might be a mature and declining one, and being unappealing for such a reason; another explanation is that densely clustered community appear cliquish to outsiders, and are unappealing for such a reason.

There was more and more, in the talk. I have often heard that when Biology will become a “hard science” with a strong mathematical foundation, the mathematics will look familiar to those working in electrical engineering and in algorithms, and that theoretical computer science will be a major contributor to this future theoretical biology. What I did not know is that Sociology itself can become a hard science, and that theoretical computer science can play a major role in this transformation.

In theory, it was a good song

After the business meeting, we moved to the adjacent room for the concert of Lady X and the Positive Eigenvalues. We were not supposed to bring alcohol in the conference rooms. There was of course beer at the business meeting, but only because Satish had smuggled it in. But there is a hotel security guy in the hallway, and he should not see us walk out of one conference room and then into another with our beer. And so, following directions given by Satish, a group of distinguished theoreticians, plus me, with their unfinished drinks, takes a back door and tries to get to the next room via the kitchens. This is a worthy start for the rest of the night. (The detour is ultimately unsuccessful, because the back door of the other conference room is locked, so we have to wait for the security guy to leave.)

Lady X is the lead singer, and the band, the Positive Eigenvalues, includes Christos Papadimitriou, Mike Jordan, guest star Anna Karlin, and a number of other Berkeleites (for some reason, all Italians).

They play a number of covers of rock songs that everybody but me recognizes. The crowd of theoreticians sings along, gets warmer and warmer, and starts dancing. The program committee chair surveys the room standing on an eponymous.

Then they play Smells like teen spirit, that even I recognize, a theory pogo breaks out, and those pogoing next to James Lee fly off in all directions. I make my way to the opposite side of the stage.

They also have one original song, titled In Theory. It starts

Picture a carpenter of sorts
he has the flu, he has the blues
he burns his oil every night
all night frustrated and confused

and worst of all he can’t complain
in theory his job is plain

At the end, the crowd keeps asking for encores, and, by the time the concert ends, it is almost midnight.

(Pictures courtesy of Madhur Tulsiani.)

Off to a good start

FOCS 2006 is off to a good start with Salil’s talk on Sunday morning on realizing statistical Zero-Knowledge arguments for all of NP assuming one-way functions. A zero knowledge proof systems has two important requirements: the proof system must be sound, that is, a cheating prover cannot convince a verifier that a wrong statement is true, and it must be zero knowledge, meaning that a cheating verifier cannot obtain any information by interacting with the prover. Each requirement can come in a statistical version (in which the cheating party can be arbitrarily powerful) or a computational version (in which the requirement holds only for computationally bounded cheating parties).

If both are statistical, we have a Statistical Zero Knowledge proof system, which cannot be achieved for NP-complete problems unless NP is contained in coAM. (Which is about as unlikely as having NP=coNP.) If soundness is statistical but zero knowledge is computational, we have Computational Zero Knowledge proof systems, which exist for all of NP assuming one-way functions.

What if we want computational soundness and statistical zero knowledge? It had been known that one could get such systems for all of NP assuming one-way permutations. Now, with this work of Nguyen, Ong and Vadhan, we finally know that one-way functions suffice for this version as well.

Before lunch, Richard Karp talks about the theme of theoretical computer science as a lens for the sciences. This has been an angle pursued for increased theory funding at NSF, and it is a theme that will recur in all the invited talks. Karp focuses on the case of computational biology. Christos Papadimitriou chairs the session. He says that when he interviewed at MIT, Dertouzos asked him who was his role model, and Christos answered “Dick Karp,” an answer that Dertouzos seemed to like. When Karp starts speaking, he says “over time things have switched; now Christos is my role model, and this is why I am wearing black today.”

The most enjoyable talk of the day is given by Swastik Kopparty, on joint work with Eli Ben-Sasson and Jaykumar Radhakrishnan. They consider the question of the optimality of the Sudan and Guruswami-Sudan list-decoding algorithms for Reed-Solomon codes. Optimality, that is, in terms of how many errors (or, equivalently, how low agreement) the algorithm can tolerate. The combinatorial question is at which point the size of the list that the algorithm has to compute becomes super-polynomial, precluding any possibility of a polynomial-time algorithm. They vastly improve previous such bounds. In the 20 minutes alloted for the talk, Swastik manages, with contagious enthusiasm, to explain the problem, to fully prove a weaker version of the result, which already slightly improves previous bounds, and to give a good idea of how the general argument looks like.

Xi Chen gives the talk for his paper with Xiaotie Deng on Nash equilibria that won the best paper award. They show that, even for two players, the problem of computing a Nash equilibrium is PPAD-complete, where PPAD includes all search problems in NP where the existence of a solution is guaranteed by a certain “fixed-point argument.” In the longer time alloted for the paper award winner, Xi Chen explains the intuition that makes the two-player case seem easier than the case of three or more players, he describes previous results on the complexity of the cases of three or more players, and gives some idea of the new reduction. The talk is really excellent.

After dinner, the business meeting moves along unusually rapidly.

Sanjeev Arora thankfully does without the silly statistics on how many papers were accepted by number of authors and by length of the title, and simply offers the numbers of submissions, acceptance, and attendance. There are only about 220 registered attendants. (Plus a few unregistered ones as well, ehm, ehm.) This is about half the attendance at SODA. Isn’t there something wrong going on, suggests Sanjeev? If (now it’s me speaking, not Sanjeev) STOC and FOCS have, as a purpose, to be the place where the theory community comes together and we learn abotu results outside our area, but then the community does not come, haven’t we lost our purpose?

FOCS 2007 will be in Providence, in a hotel that is currently under renovation, and Alistair Sinclair will be the program committee chair. The bid to hold FOCS 2008 in Philadelphia wins by acclamation against no opponent, despite bidder Sanjeev Khanna pleading not to accept his bid.

Umesh suggests that we move to an online publication of the proceedings and forgo the paper version. After some discussion, we vote for a dizzying array of possibilities. “Would you like to stop publishing proceedings, distribute a CD at the conference, but not do the online publication suggested by Umesh, while continuing the online publication in the IEEE digital library” was one of the options put to vote.

So you are going to FOCS

FOCS in Berkeley starts in a few days. Here is some unsolicited advice.

The weather is going to be mild and sunny, but you are not going to wear shorts and sandals. And if you plan to go see the Golden Gate bridge, you want to carry some warm clothes with you.

There is no hotel in downtown Berkeley that is large enough to host FOCS, so the conference will be at a hotel by the bay which is somewhat far from the town center. There are places to walk to from the hotel, and there is public transportation, but it is still worth renting a car, because it will make dinner plans simpler.

Before driving to San Francisco, you can check traffic information on You can even see the traffic on the highway in real time via webcams. You can also go to San Francisco via the bart, a subway system. The public transportation system in San Francisco is called muni.

The San Francisco Bay Area is known for its food and its coffee shops; have fun trying a different place each night.

Check out to see what’s going on in the city during the weekend.

Among other events, this weekend will be the third weekend of Open Studios, an opportunity to visit the studios of local artists, which are not otherwise open to the public. And let’s not forget the unique opportunity to meet Crispin Glover in person!

San Francisco does not have the museum scene of New York or of a European capital, but it has a number of interesting destinations. Unique to San Francisco is the recently opened Asian Art Museum, the largest such in the West. The De Young museum in Golden Gate park (which is nowhere close to the Golden Gate bridge) reopened last year after a long renovation. The Palace of the Legion of Honor hosts a fine art museum which is all right, but its main attraction is the fantastic view of the bay that you get from the garden. The SF MOMA is hosted in a very beautiful contemporary building.

One of the best parties in town is Popscene, feauturing Brit-pop and “indie” music, but it’s on Thursdays. When it comes to San Francisco values, one of my favorite parties is Trannyshack, which is on Tuesdays. At other places, a drag show is often meant to be a man in a dress lip-synching to a Cher song. Nothing like this goes on at Trannyshack, a San Francisco institution that was the subject of a documentary. There, drag is an excuse for perfomances that can be highly conceptual, purely silly, outright offensive, or just bizarre depending on the performer. One night I was there the theme was punk rock, and there was this 6+ foot tall performer dressed like a punk girl singing a song I did not recognize, while the audience was having fun pogoing. At one point the performer tried to stage dive, but everybody just jumped out of his way (he was big), and he landed on the floor face first. This Tuesday night the theme will be vampires.

For more nightlife events, see or here.

The limitations of linear and semidefinite programming

The fact that Linear Programming (LP) can be solved in polynomial time (and, also, efficiently in practice) and that it has such a rich geometric theory and such remarkable expressive power makes LP a powerful unifying concept in the theory of algorithms. It “explains” the existence of polynomial time algorithms for problems such as Shortest Paths and Min Cut, and if one thinks of the combinatorial algorithms for such problems as algorithms for the corresponding linear programs, one gains additional insights.

When looking for algorithms for a new combinatorial problem, a possible approach is to express the problem as a 0/1 integer program, then relax it to a linear program, by letting variables range between 0 and 1, and then hope for the best. “The best” being the lucky event that the value of the optimum of the relaxation is the same as that of the combinatorial program, or at least a close approximation. If one finds that, instead, the relaxation has optimal fractional solutions of cost very different from the combinatorial optimum, one may try to add further inequalities that are valid for 0/1 solutions and that rule out the problematic fractional solutions.

Many “P=NP” papers follow this approach, usually by presenting a polynomial-size linear programming relaxation of TSP and then “proving” that the optimum of the relaxation is the same as the combinatorial optimum. One can find recent examples here and here.

Similar results were “proved” in a notorious series of paper by Ted Swart in the mid 1980s. After counterexamples were found, he would revise the paper adding more inequalities that would rule out the counterexample.

Finally, Mihalis Yannakakis took matters into his own hands and proved that all “symmetric” relaxations of TSP of sub-exponential size have counterexamples on which the optimum of the relaxation is different from the combinatorial optimum. (All of the LPs suggested by Swart where “symmetric” according to Yannakakis’s definition.)

This is actually one of the few known lower bounds that actually applies to a “model of computation” capturing a general (and otherwise promising) class of algorithms.

(I first read about this story in Christos Papadimitriou’s complexity book, but I found the above references in Gerhard J Woeginger’s P versus NP page.)

In the theory of approximation algorithms, we have plenty of problems that are believed to be intractable but that are not known to be NP-hard, such as approximating Vertex Cover within a factor of 1.9, or approximating Sparsest Cut within a factor of 10. LP and Semidefinite Programming (SDP) approaches are more or less the only promising tools we have to prove that such problems are tractable and, while we wait for NP-hardness result (for now, we only have “Unique-Games-hardness”), it is good to see whether certain candidate LP and SDP algorithms have any chance, or if they admit counterexamples showing large gaps between the optimum of the relaxation and the combinatorial optimum.

The problem with this approach is the nearly endless variety of relaxations that one can consider: what happens when we add triangle inequalities? and pentagonal inequalities? and so on. As in the case of Yannakakis’s result, it would be great to have a result that says “all SDP relaxations of Vertex Cover of type X fail to achieve an approximation ratio smaller than 2,” where “type X” is a general class of sub-exponential size SDP relaxations that include the type of inequalities that people use “in practice.”

Lovasz and Schrijver describe a method, denoted LS+, that starts from an LP relaxation of a problem (actually it can start from any convex relaxation), and then turns it into tighter and tighter SDP relaxations, by adding auxiliary variables and linear and semidefinite constraints. A weaker version of the method, denoted LS, only adds auxiliary variables and linear constraints.

A nice thing about the method is that, after you apply it to your initial relaxation, thus getting a tighter relaxation, you can then apply it again to the tighter one, thus getting an even better relaxation, and so on. Starting from an LP relaxation with n variables and poly(n) constraints, k applications of the method yield a relaxation solvable in n^{O(k)} time, which is polynomial for all fixed k and sub-exponential for k=o(n/log n). Lovasz and Schrijver prove that, after k applications (or “rounds”) the resulting relaxation enforces all inequalities over k-tuples of variables that are valid for 0/1 solutions. (In particular, one gets the combinatorial optimum after n rounds.) Typical approaches in the design of approximation algorithms are SDP with local inequalities (triangle inequalities etc.), and this is all captured after a few rounds of LS+.

It would be great to show that no constant (ideally, no sublinear) number of rounds of LS+ starting from the basic LP relaxation gives a 2-\epsilon approximation for vertex cover. Arora, and others Bollobas and Lovasz considered related questions in a FOCS 2002 paper that has inspired a considerable amount of later work. (See the introduction of the journal version.) Unfortunately the question remains open evern for two rounds of LS+. After one round, one gets an SDP relaxation equivalent to (number of vertices minus) the Lovasz Theta function, and Goemans and Kleinberg prove that such SDP does not achieve approximation better than 2. Beyond that, it is pretty much terra incognita. Charikar proves that a relaxation with triangle inequalities (which is incomparable with two rounds of LS+ and is weaker than three rounds) does not achieve approximation better than 2. Also, a sublinear number of rounds of LS+ does not achieve approximation better than 7/6. For LS, which, I remind you, generates linear programming relaxations rather than semidefinite programming ones, we know that no sublinear number of rounds leads to an approximation better than 2.

I will illustrate the main idea in the LS and LS+ method using the example of Vertex Cover. In the linear programming formulation, we have variables x_i, one for each vertex i, and the constraints that x_i + x_j \geq 1 for each edge (i,j) and that 0\leq x_i \leq 1. We would like to add constraints that are only satisfied by 0/1 solutions, and the constraint x_i^2 = x_i would work beautifully except that it is not linear (nor convex). Instead, Lovasz and Schrijver add new variables y_{i,j}, one for each pair of vertices, with the idea that we would like to have y_{i,j}=x_i * x_j; then they add the requirement y_{i,i} = x_i and various inequalities that make the y_{i,j} be “sort of like” x_i*x_j. In particular, we would like to require that if x_k \neq 0, then x_i = y_{i,k}/x_k. This is again a non-linear constraint, but, at least, we can check whether, for each fixed k, y_{i,k}/x_k is a fractional vertex cover: we just need to check the inequalities y_{i,k} + y_{j,k} \geq x_k for each edge (i,j)

Similarly, if x_k \neq 1, we expect to have x_i = (x_i-y_{i,k})/(1-x_k). We cannot check it directly, but we can check if
x_i - y_{i,k} + x_j - y_{j,k} \geq 1-x_k
hold for each edge (i,j). Finally, we obviously want y_{i,j} = y_{j,i}. This describe the LS method. For LS+, we add the requirement that the symmetric matrix (y_{i,j}) be positive semidefinite. Being positive semidefinite means that there are vectors b_1,\ldots,b_n such that
y_{i,j} = \langle b_i,b_j \rangle
where \langle \cdot , \cdot \rangle denotes inner product. If y_{i,j} = x_i *x_j then (y_{i,j}) is clearly positive semidefinite.

Lies, damn lies, and National Review

The Lancet has recently published a study of the number of deaths in Iraq caused by the invasion. From the abstract:

Data from 1,849 households that contained 12,801 individuals in 47 clusters was gathered. 1,474 births and 629 deaths were reported during the observation period. Pre-invasion mortality rates were 5.5 per 1000 people per year (95\% CI 4.3–7.1), compared with 13.3 per 1000 people per year (10.9–16.1) in the 40 months post-invasion. We estimate that as of July, 2006, there have been 654,965 (392,979–942,636) excess Iraqi deaths as a consequence of the war, which corresponds to 2.5\% of the population in the study area. Of post-invasion deaths, 601,027 (426,369–793,663) were due to violence, the most common cause being gunfire.

Author Mark Goldblatt notes in an article on National Review that, in these calculations, one should also consider the fact that, before the invasion, Iraq was subject to sanctions that were lifted after the American occupation. By some estimates, the sanctions were causing about 150,000 deaths a year. This means that, since 2003, about 450,000 deaths might have been avoided because of the end of the sanctions.

Considering that one would have expected 450,000 fewer deaths, and one gets instead 650,000 more, the conclusion would be that the extra deaths caused by the occupation would be in excess of a million. Of course, simply adding the two numbers is problematic for a few reasons: some of the effects of the sanctions (for example, on the health care system) may be similar to the effects of the occupation, and hence would be having similar (rather than nearly disjoint) effects. Most importantly, it has been alleged that the estimates on deaths caused by the sanctions were overstated.

Anyways, what is Goldblatt approach? He subtracts one number from the other! You know how this works: I owe you \$20 dollars, now lend me another \$30 and I will give you the \$10 difference tomorrow. If I may suggest an improvement to his methodology, he should also subtract the number of deaths that occurred in Switzerland over the same period of time. I am sure he would get even more accurate estimate.

Update: see also Tim Lambert at scienceblogs.

You too could be the subject of a film festival

San Francisco is the city of the many film festivals, but this month the programmers of the Castro Theater are having too much fun. This weekend, there was a first annual canine film festival.

At the end of the month, there will be the the first ever Crispin Glover film festival in the world, and, just in time for FOCS, you can catch Crispin Glover in person on October 20, 21 and 22.

And who is Crispin Glover, you may ask? Older readers may remember him from such movies as Back to the Future, where he played George McFly, the nerdy father of Michael J. Fox’s character.

Alan Turing tribute

The November issue of the Notices of the AMS is a tribute to our hero Alan Turing. The timing is odd because such things are typically done when there is a significant anniversary, which does not seem to be the case here. (70 years since the Turing machines paper?)

Unlike the case of the tributes to Grothendieck, not much is said about Turing’s life, except in Barry Cooper’s article.

Paper Dolls

Being back in San Francisco for a few days, I had a chance to catch Paper Doll, a documentary profiling a group of Filipino immigrants in Israel, working mostly as caregivers for the elderly.

The Philippines are, for some reason, great exporters of caregivers. Italy has a large such immigrant community, and so do many countries in East Asia. In Taiwan, for example, the exploitation of Filipino maids and caregivers made possible by immigration laws is a cause célèbre of leftist groups. The same legal problems arise in Israel, where a work visa is immediately voided if one is fired, resulting in illegal status and the possibility of deportation. Indeed, the same is true for software engineering on H1B visas in the US, but the difference in class, education, and type of employment (not to mention the possibility of permanent residency) does not quite create the same situation.

The main angle of the movie, however, is that the Filipinos profiled in the documentary are all transgender, and they have formed a group, called Paper Dolls, that performs drag shows at community events.

They are met with acceptance and prejudice in a way that is not always predictable. Their clients, including religious ones, are accepting (even though those working in ultra-orthodox neighborhoods are uncomfortable there). The relation between one of them and the elderly man that she cares for, in particular, is very touching. Their attempt to play their act at a big-name gay club in Tel-Aviv, however, ends in a disaster of cultural insensivity.

Eventually, the group disbands, partly because of the vagaries of the Israeli immigration laws, some of them going back to the Philippines, and some of them moving to London.

The movie does not quite have a point, and its own sensibility oscillates between exploitation and sympathy. If its point was to express this conflict, then it succeeds quite well.

The primes are random except when they are not

The next issue of the Bulletin of the AMS will have an article by Jaikumar Radhakrishnan and Madhu Sudan on Dinur’s proof of the PCP theorem. The article introduces the notion of PCP and its applications, as well as Irit’s proof.

There will also be an article on the work of Goldston-Pintz-Yildirim on small gaps between primes. Their main result is that for every eps>0 there are infinitely many pairs of consecutive primes whose difference is less than eps times the average distance between consecutive primes of that magnitude. That is, we can find infinitely many pairs of primes p,p’, p’ > p such that p’-p < eps log p. This is a step in the direction of proving that there are infinitely many pairs p,p’ such that p’-p =2, the twin primes conjecture.

This result is fascinating in various ways. One is the way it was discovered: Goldston and Yildirim had been working on gaps between primes since 1999, and announced the above result in 2003. But then a gap was found in the proof, which seemed to doom the whole approach. They kept working on it for two more years, however, until they made their breakthrough. You can read about it in Daniel Goldston’s amusing artcile my 30 minutes of fame.

Technically, their results improve on previous “sieve” techniques whose spirit is to show that the “primes are ‘pseudo-random’ except in the obvious ways in which they aren’t,” as Green and Tao more or less put it, a statement that is made precise in the Hardy-Littlewood conjecture. The idea is to start by thinking of the following probabilistic “model” for prime numbers: a number N is “prime” with probability 1/ln N, and to observe that certain properties of the primes (for example, the prime number theorem) hold in this model. Faced with a conjecture, one can check if it is true in the model, and then see if it is true for all sets of integers that have some “pseudorandomness” property and finally verify that the primes have such pseudorandomness property, usually via Fourier analysis. (This would be the “circle method” in number theory.)

The model, of course, has some major shortcomings. For example it predicts infinitely many even numbers to be primes, and infinitely many pairs of primes that differ only by one. We may correct the model by, more realistically, saying that if N>2 is even, then it is never prime, and if it is odd then it is prime with probability 2/ln N. This is already better, and it predicts some true asymptotic properties of the primes, but it still has problems. For example, it predicts inifinitely many triples of primes p,p+2,p+4, while 3,5,7 is the only such triple. (Why?) An even better model is that N>6 is prime with probability 3/ln N if N mod 6 is either 1 or 5, and N is never prime otherwise.

We see we can define a hierarchy of more and more accurate models: in general, for a fixed W, we can look at the distribution that picks a 1/ln N fraction of numbers from 1 to N and that only picks numbers N such that N and W share no common factor. As we pick W to be a product of more and more of the first few primes, we get a more and more accurate model. (This would be the issue of “major arcs” versus “minor arcs” in the circle method.)

Now look at a conjecture about primes, for example the twin primes conjecture, and see what these models predict. They all predict const*N/(ln N)2 twin primes between 1 and N, with the constant depending on W; for larger W, the constant tends to a limit ctwin. It is conjectured that there are in fact about ctwinN/(ln N)2 twin primes between 1 and N.

More or less, the Hardy-Littlewood conjecture implies that the prediction given by this model is always right for questions that involve “linear equations over primes.”

Apparently (I haven’t read the papers) Goldston-Pintz-Yildirim approach the pseudorandomness of primes by looking at “higher moments.” Green and Tao have been working on a different program based on studying the “Gowers uniformity” of the primes. (Technically, they study the Gowers uniformity of the Mobius function rather than of the characteristic function of the primes, but it’s close enough in spirit.)

In the paper Quadratic Uniformity of the Mobius Function they show that the primes have small dimension-3 Gowers uniformity, and, together with their earlier result on Gowers uniformity and polynomials, this is used in a new paper to prove that the predictions of the Hardy-Littlewood conjecture hold for every question about linear equations over primes of “bounded complexity.” For example, if you write certain pairs of linear equations over four prime unknowns, then the fraction of solutions will be as predicted by the limit of the random model. This includes the case of length-4 arithmetic progressions over primes, where you look at solutions p1,p2,p3,p4 to the equations p4-p3=p3-p2 and p3-p2=p2-p1.

These results and conjectures are part of an even bigger set of results whose spirit is that “multiplicative structure is pseudorandom with respect to addition,” that can be seen in various results that have applications to combinatorial constructions. This comes up most directly in sum-product results in finite fields and over the integers, used to construct extractors, in Weil’s result on character sums, which is used to construct $\epsilon$-biased generators, in certain expander constructions related to Ramanujan graphs, and so on.

It is hard to believe that, until recently, analytic number theory was considered an unfashionable area, what with its quantitative results and its lack of connections to algebraic geometry. For example, I understand that the Berkeley math department, which is quite large, has no analytic number theorist.