Average-case complexity

What do we do with an NP-complete (optimization) problem of practical relevance? We strongly believe that there is no polynomial time exact algorithm, but what are the next best alternatives? Garey and Johnson, in 1978, list various possible ways to “cope” with NP-completeness, including looking for approximate algorithms and for algorithms that are efficient on average. Of course, when a problem is really hard, even these options may be intractable, and it would be good to know when this is the case.

Before the development of PCP machinery, complexity theorists had almost no tool to study the complexity of approximations. When it comes to studying the average-case complexity of natural problems in NP, unfortunately, we are still at the stage of pre-PCP studies of approximation.

Indeed, it has been quite difficult even to lay out the ground rules of the study of average-case complexity. Even to define such basic notions as “computational problem,” “efficient algorithm,” “reduction,” and “complete problem” in the setting of average-case complexity leads to surprisingly subtle difficulties.

Much of this foundational work was laid out in a famously succint paper of Levin’s, with important extensions appearing in a paper of Ben-David and others and in a paper of Impagliazzo and Levin.

The main results and insights in this field as of 1995 are summarized in a paper by Russell Impagliazzo that is perhaps the best survey paper in theoretical computer science ever written.

Andrej Bogdanov and I have written a more recent survey on the subject. We are in the process of revising it for journal publication, and we will be very grateful to anybody pointing out errors and omissions (especially in the references).

So, given that we have no tools yet to study the average-case complexity of natural problems, what exactly do we talk about for 58 pages?

There are, indeed, a number of interesting results and insights.

  1. To begin with, if we have a language L and a distribution D over inputs, how do we define the notion that an algorithm A is “efficient on average” for L with respect to D? The simplest definition is to require the expected running time to be polynomial. Indeed, all expected polynomial time algorithms satisfy the intuitive notion of being “efficient on average,” but there are algorithms of expected superpolynomial running time that are also intuitively “typically efficient.” Suppose for example that, for every n, our algorithm takes time n2 on all inputs of length n, except one exceptional input on which the algorithm takes time 22n. Levin’s definition better captures the intuitive notion and includes such “pathological” cases. In Levin’s definition, an algorithm is “polynomial on average” if, after running for t(n) steps, the algorithm can solve all but a poly(n)/(t(n))c fraction of inputs of length n, for some c>0.

  2. Now that we have a notion of efficiency, the next step is to have the average-case analog of an NP-complete problem. Consider the “bounded halting” problem BH where, on input a pair (M,x) where M is a non-deterministic Turing machine and x is a string of length n, we ask whether M can accept x in at most, say, n2 steps. It’s clear that BH is NP-complete. If L is a language in NP, and M is the machine that decides L, then the reduction from L to BH essentially maps x into (M,x). (We have to add some padding if M takes more than quadratic time to solve L, but the details are trivial.)

    Now let D be a distribution. Consider the reduction from L to BH that maps x into (M’,C(x)) where C is an encoding function and M’ is a machine that on input y first finds x such that C(x)=y and then simulates M on input x. As long as C is injective and polynomial time computable this is still a valid reduction. Levin shows that, if D comes from a certain class of “polynomial-time computable distributions,” then we can find a C such that C(x) is “almost uniformly” distributed when x is sampled from D. This means that the reduction maps a random input from D into a pair (M,C(x)) that is “almost uniformly” distributed. (Technically, the resulting distribution has very high min-entropy or, which is the same, is “dominated” by the uniform distribution.) This means that an algorithm that is good-on-average for BH under the uniform distribution yields an algorithm that is good-on-average for L under the distribution D. Recall that L was an arbitrary problem in NP and D was an arbitrary “polynomial time computable” distribution.

  3. If there is a good-on-average algorithm for BH under the uniform distribution, then there is a good-on-average algorithm for the search version of BH as well. (In the search version, we want to find an accepting computation of M if one exists.) The equivalence is trivial in the worst-case setting, but it requires an ingenious proof due to Ben-David et al. in the average-case setting.
  4. If there is a good-on-average algorithm for BH under the uniform distribution, then there is a good-on-average algorithm for every problem L in NP under any polynomial time samplable disribution. This is a result of Impagliazzo and Levin which is a proper generalization of the completeness result above. Samplable distributions are more general than computable distributions and capture essentially all “natural” distributions.
  5. If there is a good-on-average algorithm for BH under the uniform distribution, can we say that there are worst-case efficient algorithms for all problems in NP? Not if the connection is made with a non-adaptive reduction. This is a result of Andrej and I, generalizing earlier work of Feigenbaum and Fortnow. Can we at least say that there are good-on-average algorithms for all L in NP and all distributions D? Such a conclusion implies worst-case efficient algorithms for NP, so the answer is again probably not.
  6. So it is probably impossible to equate worst-case complexity and average-case complexity of NP-complete problems via reductions. What about different “degrees” of average-case complexity? If we define good-on-average algorithms to always run in polynomial time and to make a bounded number of mistakes, we can quantify average-case complexity by the fraction of inputs on which the best polynomial time algorithms makes mistakes. O’Donnell shows the equivalence of strong versus weak assumptions about the existence of average-case-hard problems in NP. Ryan’s proof brilliantly generalizes Yao’s XOR Lemma.
About these ads

5 thoughts on “Average-case complexity

  1. I would say that when you write about theory, you are crystal clear and a novice like me too get something out of your blog.

    Alas!! same cannot be said about non-theoretical blogs. Perhaps, I cannot capture the humour or whatever you are talking about.

  2. What methods if any exist to equate two questions in TCS, other than algorithmic reductions?

    And why aren’t you holding your breath for an adaptive reduction worst-case-to-average-case for NP?

    Thanks for working on and expositing this extremely interesting area in spite of its long refusal to yield the big secrets..

  3. What methods if any exist to equate two questions in TCS, other than algorithmic reductions?

    Suppose we want to show that “an efficient algorithm for problem B implies an efficient algorithm for problem A.” A “reduction” is an efficient algorithm for A that uses a B-oracle.

    There are some such implications that are not proved by reductions (as defined above). For example we know that a Sigma2-complete problem has a polynomial time algorithm iff 3SAT has a polynomial time algorithm. This is not proved by giving an algorithm that solves Sigma2-complete problems with a 3SAT oracle. (Such a thing would imply the collapse of the polynomial hierarchy.) Instead, we use the code of an assumed algorithm for 3SAT in devising the algorithm for the Sigma2-complete problem.

    The existence of such a “non-black-box” reduction (as they are called in cryptography) converting a good-on-average algorithm for BH into a worst-case efficient algorithm for 3SAT is not ruled out by the results of Feigenbaum, Fortnow, Andrej, and I.

    So can we use a good-an-average algorithm for BH, and its code in order to develop a worst-case efficient algorithm for 3SAT? Maybe and, in fact, I find this scenario more plausible than the possibility of a black-box non-adaptive reduction.

  4. …a Sigma2-complete problem has a polynomial time algorithm iff 3SAT has a polynomial time algorithm. This is not proved by giving an algorithm that solves Sigma2-complete problems with a 3SAT oracle. (Such a thing would imply the collapse of the polynomial hierarchy.)

    What result are you referring to here?

  5. If P=NP, then P=NP^NP, and, in general, P equals the polynomial hierarchy. So an algorithm for 3SAT implies algorithms for all of NP^NP. (And even all of PH.)

    But this is not a reduction in the standard sense: we don’t give an algorithm that can solve NP^NP problems given a 3SAT oracle. Such a reduction would imply P^NP=NP^NP, and the collapse of the polynomial hierarchy.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s