# What good is O(log n)-approximation?

I have spent a good part of the last two years catching up on recent work on PCP, hardness of approximation, and approximation algorithms. Lately, this has been a spectacularly successful area of theory, from ARV and its many spin-offs, to the fantastically deep work around the unique games conjecture, to Irit Dinur’s new approach to PCP, and so on. There is, however, a nagging question about this line of work. Given that, “in practice,” one wants (and, often, achieves) approximations within a factor of 1% or 2%, where is all the excitement about having approximation log n versus (log n)1/2, or even 2 versus 1.5? Isn’t this an entirely self-referential community that has nothing to offer to the practice of computing?

In his (beautifully written and highly recommended) book, Vijay Vazirani considers this question and gives the objective benchmark response.

In my own words, the point of the objective benchmark response is to observe that the service that theory can give to other communities is the development of ideas and techniques that are so counterintuitive and/or sophisticated that they would never occur to someone just working on a particular problem. Such ideas and techniques are the result of working in an idealized model (simplified problems, looking only for polynomial time algorithms, regardless of efficiency, focusing on worst-case approximation ratio) that, however far from reality, is objectively quantitative. If a new algorithm has a better approximation ratio than previous ones, then there must be something new in the algorithm, and, over time, whatever it is that is new will be understood, simplified, and distilled to the point of becoming a standard technique that can be used in practice. I assume that there are good stories to go with this argument. (Primal-dual algorithms? Graph-partitioning followed by dynamic programming?) Incidentally, this argument suggests that one should be more interested in an algorithm that barely matches known bounds but that works in a radically new way, than in an algorithm that shaves a loglog n factor by a new combination of known methods plus some case analysis.

I think, however, than one can give a slightly different response. Isn’t it true that in practice we want 2% approximation? Exactly!, I would say. But if we want to prove that our efficient algorithm achieves such approximation, we can’t, because PCP results forbid it. Plus, for specific algorithms, we know how to construct inputs on which the algorithm performs very badly. And there is now an intriguing convergence between the kind of inputs constructed as worst cases of specific algorithms (such as linear and semidefinite programs) and the kind of inputs constructed as gadgets for PCP constructions. So we have an increasingly clear idea of what inputs are hard. This can be an excellent starting point for reasoning about algorithms. Once we know that graphs with property P are very bad for known algorithms and come out of PCP constructions, we can think of algorithms that work on anti-P graphs, and the insights developed in designing algorithms for general graphs, in finding hard instances, and constructing PCPs, might help us understand how algorithms work on anti-P graphs. Those anti-P graphs, in turn, might include most of the inputs that come up in practice. (More later on this point.)

There is at least one story to go with this argument, and it’s Sanjeev Arora’s lack of success in proving a PCP hardness for TSP in the plane, that led him to try to understand what a PCP gadget for TSP should look like, that led him to conclude that no such gadget could be embedded in the plane, because of a property of geometric instances that was the starting point of his approximation scheme! (See Arora’s paper.)

Lately, several researches have been working simultaneously on inapproximability and on approximation algorithms, and I hope that we will see more positive consequences of the understanding gained from lower bounds.

Another standard objection to our way of studying algorithms is the emphasis on worst case, while “in practice” one is happy enough with algorithms that are good on “most” inputs. Here I have never been satisfied with the pessimism response that we don’t know what distribution of inputs will be typical in applications, so it is good (and more reliable) to focus on worst-case performance. The recent work on smoothed analysis is certainly a good indication that the theory community can say interesting and rigorous things about random inputs.

There is, however, a different line of work that I would like to see. For example, there is now an understanding that “typical” networks have a power-law degree distribution and there are random models for them which are considered adequate and predictive. Wouldn’t it be great to have approximation algorithms for routing and other problems analysed on those distributions? Especially if the analysis had the following modular structure: (1) we understand how worst-case instances look like for important network problem X; (2) by reasoning about networks that are very far from such worst-cases, we develop an algorithm for X that has outstanding performance on graphs with property Y; (3) graphs sampled from model Z of power-law graphs have property Y with very high probability.

Update 8/1/06: David Johnson points out that there is a body of recent work showing discrepancies between the topology of real networks and the topologies generated by popular models of power-law graphs. For example, the graphs in the probabilistic model have smaller vertex-connectivity than real networks (which are designed to be fault-tolerant). Here is one early paper about this.

# The Pros and Cons of Awards

The institution of the Best Paper award at STOC and FOCS is relatively recent (2002?) and it came with some controversy. Indeed, most people acknowledged that such an award (as opposed to the best student paper award, which has nearly universal support) has a series of shortcomings, such has the element of randomness in recognizing one (or a few) of the top papers in each year, and the difficulty of recognizing excellence without the benefit of hindsight. But, more fundamentally, our community likes to see itself as engaged in a group effort to advance knowledge, and it is often from unglamorous and unfashionable investigations that the next great advance comes from. A best paper award, like any award, however, inevitably rewards the individual over the group and the fashionable over the unfashionable.

All this is true, the proponents of the award (successfully) argued, but awards are useful as a medium of communication between us and other research communities. They tell people in department-wide or university-wide (or NSF-wide) committees “listen to what the theoretician guy is saying; the awards we have given him mean he is not a random guy off the street,” and they tell people in ad-hoc committees “give tenure to this theory candidate, the awards we have given her mean the theory community thinks highly of her,” and so on.

Oded Goldreich explains these points rather eloquently.

This month, the Notices of the AMS (the American Mathematical Society), has a discussion on a proposal to name some mathematicians Fellows of the AMS, the way other learned societies do, including the ACM. The piece includes an article in favor and an article against the proposal.

The same arguments I outlined above for/against best paper awards are given in these articles. There is, however, a noticeable addition. In arguing against the proposal, David Eisenbud says, almost in so many words, that many of those who are in favor of the proposal must be intellectually dishonest: surely they can see the many downsides of the proposal, and so their support must be motivated by their expectation of becoming fellows. This is really a new one, I don’t think I ever heard such an argument when discussing awards in TCS.

It should be noted, however, that the AMS proposal involves the making of a lot of fellows. About a thousand people will be shooed in to get things started (to give you a sense, I would be eligible to be a founding fellow) and then more than a hundred fellows would be added each year. To give some proportion, ACM has about 80,000 members, and it named 34 fellows in 2005. AMS has about 30,000 members.

# Average-case complexity

What do we do with an NP-complete (optimization) problem of practical relevance? We strongly believe that there is no polynomial time exact algorithm, but what are the next best alternatives? Garey and Johnson, in 1978, list various possible ways to “cope” with NP-completeness, including looking for approximate algorithms and for algorithms that are efficient on average. Of course, when a problem is really hard, even these options may be intractable, and it would be good to know when this is the case.

Before the development of PCP machinery, complexity theorists had almost no tool to study the complexity of approximations. When it comes to studying the average-case complexity of natural problems in NP, unfortunately, we are still at the stage of pre-PCP studies of approximation.

Indeed, it has been quite difficult even to lay out the ground rules of the study of average-case complexity. Even to define such basic notions as “computational problem,” “efficient algorithm,” “reduction,” and “complete problem” in the setting of average-case complexity leads to surprisingly subtle difficulties.

Much of this foundational work was laid out in a famously succint paper of Levin’s, with important extensions appearing in a paper of Ben-David and others and in a paper of Impagliazzo and Levin.

The main results and insights in this field as of 1995 are summarized in a paper by Russell Impagliazzo that is perhaps the best survey paper in theoretical computer science ever written.

Andrej Bogdanov and I have written a more recent survey on the subject. We are in the process of revising it for journal publication, and we will be very grateful to anybody pointing out errors and omissions (especially in the references).

So, given that we have no tools yet to study the average-case complexity of natural problems, what exactly do we talk about for 58 pages?

There are, indeed, a number of interesting results and insights.

1. To begin with, if we have a language L and a distribution D over inputs, how do we define the notion that an algorithm A is “efficient on average” for L with respect to D? The simplest definition is to require the expected running time to be polynomial. Indeed, all expected polynomial time algorithms satisfy the intuitive notion of being “efficient on average,” but there are algorithms of expected superpolynomial running time that are also intuitively “typically efficient.” Suppose for example that, for every n, our algorithm takes time n2 on all inputs of length n, except one exceptional input on which the algorithm takes time 22n. Levin’s definition better captures the intuitive notion and includes such “pathological” cases. In Levin’s definition, an algorithm is “polynomial on average” if, after running for t(n) steps, the algorithm can solve all but a poly(n)/(t(n))c fraction of inputs of length n, for some c>0.
2. Now that we have a notion of efficiency, the next step is to have the average-case analog of an NP-complete problem. Consider the “bounded halting” problem BH where, on input a pair (M,x) where M is a non-deterministic Turing machine and x is a string of length n, we ask whether M can accept x in at most, say, n2 steps. It’s clear that BH is NP-complete. If L is a language in NP, and M is the machine that decides L, then the reduction from L to BH essentially maps x into (M,x). (We have to add some padding if M takes more than quadratic time to solve L, but the details are trivial.)

Now let D be a distribution. Consider the reduction from L to BH that maps x into (M’,C(x)) where C is an encoding function and M’ is a machine that on input y first finds x such that C(x)=y and then simulates M on input x. As long as C is injective and polynomial time computable this is still a valid reduction. Levin shows that, if D comes from a certain class of “polynomial-time computable distributions,” then we can find a C such that C(x) is “almost uniformly” distributed when x is sampled from D. This means that the reduction maps a random input from D into a pair (M,C(x)) that is “almost uniformly” distributed. (Technically, the resulting distribution has very high min-entropy or, which is the same, is “dominated” by the uniform distribution.) This means that an algorithm that is good-on-average for BH under the uniform distribution yields an algorithm that is good-on-average for L under the distribution D. Recall that L was an arbitrary problem in NP and D was an arbitrary “polynomial time computable” distribution.

3. If there is a good-on-average algorithm for BH under the uniform distribution, then there is a good-on-average algorithm for the search version of BH as well. (In the search version, we want to find an accepting computation of M if one exists.) The equivalence is trivial in the worst-case setting, but it requires an ingenious proof due to Ben-David et al. in the average-case setting.
4. If there is a good-on-average algorithm for BH under the uniform distribution, then there is a good-on-average algorithm for every problem L in NP under any polynomial time samplable disribution. This is a result of Impagliazzo and Levin which is a proper generalization of the completeness result above. Samplable distributions are more general than computable distributions and capture essentially all “natural” distributions.
5. If there is a good-on-average algorithm for BH under the uniform distribution, can we say that there are worst-case efficient algorithms for all problems in NP? Not if the connection is made with a non-adaptive reduction. This is a result of Andrej and I, generalizing earlier work of Feigenbaum and Fortnow. Can we at least say that there are good-on-average algorithms for all L in NP and all distributions D? Such a conclusion implies worst-case efficient algorithms for NP, so the answer is again probably not.
6. So it is probably impossible to equate worst-case complexity and average-case complexity of NP-complete problems via reductions. What about different “degrees” of average-case complexity? If we define good-on-average algorithms to always run in polynomial time and to make a bounded number of mistakes, we can quantify average-case complexity by the fraction of inputs on which the best polynomial time algorithms makes mistakes. O’Donnell shows the equivalence of strong versus weak assumptions about the existence of average-case-hard problems in NP. Ryan’s proof brilliantly generalizes Yao’s XOR Lemma.

# The queens of terrorism

The Department of Defense runs a program called TALON (Threat and Local Observation Notice) to monitor terrorist threats agains the US military. It has recently been reported that student organizations involved in protests against the “don’t ask don’t tell” policy of the military have been under surveillance within this program. Now we can all feel safer in the knowledge that the grave threat posed by groups of gay college students is under control. We no more have to fear that they will show up at military recruiting events and taunt the recruiters about their bad haircuts until they would cry, which is about the most threatening scenario I can think of.

In somewhat related news, an Alabama petting zoo was among a list of critical targets for terrorism maintained by the Department of Homeland Security. Now we know why it was necessary to cut by 40% the counter-terrorism funds for Washington and New York, seeing how many critical targets there are in rural states.

It is, in fact, the state of Indiana that has the most terrorist targets, almost three times as many as California!

“We don’t find it embarrassing,” said the department’s deputy press secretary, Jarrod Agen. “The list [of targets] is a valuable tool.”

# E adesso aridatece la Gioconda

I make it to Rome just in time to see the game. In Frankfurt airport, a group of French that was on my plane from London starts singing the Marseillaise while walking to their connection. Sorry, guys.

On our way to the city center after the game, we stop at a newspaper kiosk on via Nomentana. Around midnight, barely an hour after the end of the game, and with the city completely gridlocked, they are selling the Monday edition of the Corriere dello Sport with a chronicle of the match. If only these miracles of logistics and of getting things done with tight deadlines could happen in areas unrelated to football!

The city is in a complete, beautiful chaos. (I’ll post some pictures.) About two hundred thousand people had watched the game at Circo Massimo, which is absolutely unreachable by car. There have been so many fireworks in Piazza del Popolo that there is smoke everywhere. Around 2am, throngs of people walk back along the Muro Torto to wherever they parked their cars, while an endless queue of cars and scooters tries to go in the opposite direction and reach the Lungotevere.

Sportsmanship, unfortunately, is not the strong point of the Roman football fan. The most popular choruses shouted in the street are not very classy. And what about the group of guys running around in tighty whities and with “W la fica” written on their chest?

At one point, from one car they start shouting “Forza Italia” (“go, Italy” which is, by the way, the name that Berlusconi chose for his party). From another car they shout back “don’t say ‘Forza Italia,” say ‘Forza Roma’.”

(The title of this post, and now give us back the Mona Lisa, is from a sign I saw on tv.)

Update: Roberto Calderoli, formerly a minister in Berlusconi’s government, commented:

una squadra che ha schierato lombardi, campani, veneti o calabresi, ha vinto contro una squadra che ha perso, immolando per il risultato la propria identità, schierando negri, islamici, comunisti

“A team composed of Lombardi, Campani, Veneti and Calabresi won against a team that has sacrificed its identity and played with negroes, muslims and communists.” Eat your heart out, Pat Robertson, we have much, much crazier guys than you in Italy, and they get appointed to the equivalent of cabinet positions.

Update: From YouTube, a video shot in Rome in Piazza Venezia.

# The truth is out there

I am in Bristol to attend a workshop on randomness and computation sponsored by the Hellbron Institute for Mathematical Research, which is supported by the GCHQ. What is it exactly about Markov chains, pseudorandomness, and PCP that they are interested in? In any case the workshop is great, except for the guys in dark suits that follow us everywhere.

After three days in England, here is the British English I have learnt:

lift -> elevator

mate -> dude

door tax -> cover charge

are you Polish? -> you have a funny accent

Italy rules the World Cup -> Italy rules the World Cup