# On the Necessity of Enumerating All Programs

In a tutorial on average-case complexity that I gave at FOCS 2008, I mentioned four of my favorite open questions in the theory. I have some progress to report on one of the questions (#3 in this post), and on two related problems.

The question is whether a certain exponential loss is necessary in Levin’s proof that there are problems that are “NP-complete on average.” The answer is yes, it is necessary, unless the polynomial hierarchy collapses. The two related problems are whether related exponential losses in the analysis of Levin’s universal search algorithm and of Levin’s universal one-way function are also necessary. The answers are, respectively, yes unless ${P=NP}$, and yes relative to an oracle.

This is one of those now fashionable papers whose arguments are “obvious in retrospect.” (I would like to claim, however, that the proof of the result for Levin’s average-case completeness, while obvious in hindsight, is not so obvious without hindsight.) In all three cases, the exponential loss is in terms of the code length of certain programs. The idea in each case is to construct programs that contain in their code information about a specific instance of a problem of interest.

1. Levin’s Universal Search

It is simpler to start the discussion from Levin’s universal search algorithm. Levin, in his 1973 paper independently discovering NP-completeness, suggested the following procedure to solve NP search problems: look for solutions in order of increasing resource-bounded Kolmogorov complexity. In other words, try all possible algorithms, with a time-out, for all possible time-outs, with a rather smart schedule to choose algorithms and time-outs.

In the concrete case of 3SAT, Levin’s algorithm has the property that for every algorithm ${A}$ there exists a constant ${c_A}$ such that if ${A}$ finds a satisfying assignment for a formula ${\phi}$ in time ${t_A(\phi)}$, then Levin’s algorithm finds a satisfying assignment for ${\phi}$ in time at most

$\displaystyle c_A \cdot t_A(\phi) + |\phi|^{O(1)} \ \ \ \ \ (1)$

That is, for every algorithm ${A}$, the running time of Levin’s algorithm is at most a constant factor slower than ${A}$ on every input. The value of ${c_A}$, however, is exponential in the length of the code of ${A}$.

Could we hope to have a universal search algorithm for 3SAT that, for every algorithm ${A}$ and satisfiable ${\phi}$ as above has running time upper bounded by a polynomial in the length of ${A}$, in ${t_A(\phi)}$, and in ${|\phi|}$?

There is a very easy argument (which, as far as I know, has not been observed before) showing that such a “fully polynomial” universal search algorithm would imply P=NP.

Suppose such an algorithm existed, call it ${FPS}$; I claim that, for every satisfiable formula ${\phi}$, ${FPS}$ outputs a satisfying assignment for ${\phi}$ in time ${|\phi|^{O(1)}}$. (${FPS}$ may not terminate on unsatisfiable formulas, but the fact that it finds satisfying assignments, when they exist, in polynomial time suffices to conclude that ${P=NP}$.) To prove the claim, let ${\phi}$ be a satisfiable formula, and consider the algorithm ${A_\phi}$ that, on input a formula ${\psi}$, does the following: if ${\psi=\phi}$ then ${A_\phi (\psi)}$ outputs a satisfying assignment for ${\phi}$ that is explicitly specified as part of the code of ${A_\phi}$; if ${\psi\neq\phi}$ then ${A_\phi(\psi)}$ performs a brute-force search for a satisfying assignment for ${\psi}$, and it outputs the first one it finds, if any.

The length of the code of ${A_\phi}$ is linear in ${|\phi|}$, and the running time of ${A_\phi(\phi)}$ is also linear in ${|\phi|}$, and so ${FPS(\phi)}$ must output a satisfying assignment for ${\phi}$ in polynomial time.

Indeed, even this completely trivial proof is an overkill: we could have let ${A_\phi(\psi)}$ just output a satisfying assignment of ${\phi}$ regardless of the input ${\psi}$. The definition of ${A_\phi(\cdot)}$ given above, however, rules out even universal search algorithm whose performance is guaranteed only against algorithms that are provably correct on all inputs. Hutter has studied such algorithms, and proved that the exponential dependency on the length of the code of ${A}$ (and, in his case, exponential dependency on the length of the proof of correctness of ${A}$ and the proof establishing time bounds on ${A}$) can be pushed to an additive term, rather than being a factor that multiplies the running time of ${A}$, as in (1). The above construction, however, shows that the additive term cannot be polynomial in the code length and proof length.

There are two other famous results of Levin which involve exponential complexity in the length of the code of certain algorithms, and, in those cases, the necessity of the exponential dependency is a somewhat less trivial question.

2. Levin’s Universal One-Way Function

Quite related to his universal search algorithm is Levin’s proof that there exists a universal one-way function, that is, a polynomial time computable function that is one-way provided that one-way functions exist.

Levin’s idea is the following. Define the function ${u(\cdot,\cdot)}$ such that for every algorithm ${A}$ and input ${x}$ we have

$\displaystyle u(A,x) = (A,y)$

where ${y}$ is the output of ${A(x)}$ if the computation of ${A(x)}$ terminates within, say, ${|x|^2}$ steps, and ${y}$ is, say, a string of ${|x|}$ zeroes otherwise. We call such a function ${u}$ because it is essentially a (time-bounded) universal turing machine.

Now it is easy to see that if one-way functions exist, then there exists a one-way function computable in time at most ${|x|^2}$ on input ${x}$. Call this function ${f}$ and let ${A_f}$ be the algorithm that computes ${f(x)}$ in time ${< |x|^2}$. Then the mapping

$\displaystyle x \rightarrow u(A_f,x )$

is one-way (in fact, it is essentially the mapping ${x \rightarrow f(x)}$), and it accounts for a ${1/2^{|M_f|}}$ fraction of the inputs of ${u}$. This implies that ${u}$ is a weak one-way function.

Finally, define ${f_{Levin}}$ to be the function

$\displaystyle f_{Levin} (z_1,\ldots,z_m) := (u(z_1),\ldots,u(z_m))$

where an input of length ${n}$ has been parsed into ${m = \lfloor \sqrt n \rfloor}$ strings of length ${m}$. By Yao’s amplification of hardness for one-way functions, if ${u}$ is a weak one-way function then ${f_{Levin}}$ is a one-way function in the standard sense.

Where is the exponential dependency here?

This is a subtle but important point that I learnt from Charles Rackoff. Suppose that we have a polynomial time algorithm ${Inv_{Levin}}$ that inverts ${f_{Levin}}$ on an inverse polynomial fraction of inputs. Let ${g}$ be any polynomial time computable function. Then by using ${Inv_{Levin}}$ and the reduction outlined before we can invert ${g}$ on, say, half of the inputs, in time that is polynomial in the length of the input, but exponential in the length of the program of ${g}$.

Now, the strength of a universal one-way function is the fact that one can use it as a primitive and claim that either it is secure or otherwise all one-way functions can be inverted and thus no cryptography is possible. But if we assume that Levin’s function is not one-way, the inverter that we get for, say, AES, has complexity much higher than brute-force search. In general, if we have a finite function whose code length is comparable to its input length (as is the case of functions constructed using ${S}$-boxes and other ad-hoc tables of values) then no non-trivial attack for such a function would follow from breaking Levin’s function.

It would be much more desirable to construct a universal one-way function ${f_{U}}$ with the property that if there is a polynomial time algorithm that inverts ${f_U}$ on an inverse polynomial time fraction of inputs, then we can use such an inverter to invert any other function ${g}$ on an inverse polynomial fraction of inputs in time polynomial in the length of the input and on the length of the code of ${g}$.

We show that there is an oracle relative to which one-way functions exist but there is no algorithm that is able to invert every ${g}$ on an inverse polynomial time fraction of inputs and in time polynomial in the code length of ${g}$ and in the length of the given inputs.

3. Levin’s Average-Case Complete Problem

The exponential dependency in Levin’s average-case complexity is even harder to explain.

Perhaps it helps to start from Cook’s theorem that 3SAT is NP-complete, in the standard worst-case complexity, setting. Cook proves that for every language ${L}$ in ${NP}$ there is a polynomial time computable reduction function ${f_L}$ such that for every string ${x}$ we have

$\displaystyle x \in L \Leftrightarrow f_L(x) \in 3SAT \ \ \ \ \ (2)$

So if we have a worst-case polynomial time algorithm for solving 3SAT, we also get a worst-case polynomial time algorithm for ${L}$ which works as follows: on input ${x}$, apply the 3SAT algorithm for ${f_L(x)}$.

An important point is that such reductions ${f_L}$ not only exist, but are also explicitly and efficiently given if one has a description of ${L}$ in the form of a non-deterministic polynomial time algorithm that decides ${L}$. Indeed, Cook’s theorem shows the existence of a single function ${f(\cdot,\cdot,\cdot)}$ such that if ${M}$ is a non-deterministic algorithm that decides a language ${L}$ in time ${p(n)}$, then the mapping

$\displaystyle x \rightarrow f(M,x,p(|x|))$

is a reduction ${f_L}$ from ${L}$ to ${3SAT}$. Furthermore, ${f(M,x,t)}$ is computable in time polynomial in ${|M|}$, in ${|x|}$ and in ${t}$.

Note that, in order to establish the ${NP}$-completeness of 3SAT it would have been perfectly adequate if ${f(M,x,t)}$ had been computable in time exponential or doubly exponential in ${|M|}$, as long as it is polynomial in ${|x|}$ and ${t}$. This hypothetical result would, however, be much less satisfying, because in practice it would not be true that “an efficient algorithm for 3SAT implies an efficient algorithm for each ${L}$ in ${NP}$,” unless we restrict ourselves to languages ${L}$ that have extremely short descriptions.

In the theory of average-case complexity, instead of languages ${L}$ we study distributional problems, that is pairs ${(L,D)}$ where ${L}$ is a language and ${D}$ is a distribution over the inputs. A reduction from ${(L_1,D_1)}$ to ${(L_2,D_2)}$ is a polynomial time computable function ${f}$ such that ${x\in L_1}$ iff ${f(x) \in L_2}$, as in the worst-case complexity theory, but with the additional property that ${f(D_1)}$ is “polynomially dominated” by ${D_2}$. This means that if we look at any set of “bad” instances ${B}$ that has probability ${\beta}$ according to ${D_2}$, then if sample a random ${x}$ according to ${D_1}$, the probability of ${f(x) \in B}$ is at most ${poly(|x|) \cdot \delta}$.

The point of this definition is that we would like to say the following: if there is a good average-case algorithm ${A_2}$ for ${(L_2,D_2)}$, then there also is a good average-case algorithm for ${(L_1,D_1)}$, which works as follows: on input ${x}$, run algorithm ${A_2}$ on ${f(x)}$. The good-on-average algorithm ${A_2}$, however, might provide incorrect answers (or time out, depending on the flavor of definition) on some inputs, which have small probability mass according to ${D_2}$. The domination property ensures us that the distribution of ${f(x)}$, with ${x}$ sampled from ${D_1}$, is not too likely to land into this set of bad inputs for ${A_2}$.

Consider now the non-deterministic bounded-halting problem ${BH}$ defined as follows: on input ${(M,x,y)}$, where ${M}$ is a non-deterministic algorithm and ${x}$ and ${y}$ are bit strings, is there a computational path of ${M}$ that accepts ${x}$ in time at most ${|y|}$? Let ${U}$ be the uniform distribution (ensemble) over bit strings.

Here is a proof that the distributional problem ${(BH,U)}$ is average-case complete for all distributional problems of the form ${(L,U)}$, where ${L}$ is in ${NP}$: to reduce ${(L,U)}$ to ${(BH,U)}$, use the (randomized) reduction

$\displaystyle f_L(x) := (M_L, x, y )$

where ${M_L}$ is a non-deterministic Turing machine that decides ${L}$ in time ${p(n)}$ for some polynomial ${p}$, and ${y}$ is a randomly chosen string of length ${p(|x|)}$.

It is clearly true that, for all ${x}$ and all choices of ${y}$, ${x\in L \Leftrightarrow f_L(x) \in BH}$, and the domination property holds with a loss in probability which is about ${2^{|M_L|}}$.

For reasons that would take some time to elucidate, this loss in the domination factor is roughly equivalent to having a factor of ${2^{|M_L|}}$ in the running time of the reduction, and this is a rather undesirable feature of the reduction.

(Compare with the hypothetical discussion above on a proof of Cook’s theorem in which the reduction from a generic ${L}$ to ${SAT}$ takes time exponential in the description of the non-deterministic algorithm for ${L}$.)

Could we have a loss in probability which is polynomial in ${|M_L|}$ and in ${|x|}$? (We call a reduction with such parameters a “fully polynomial” reduction.)

Suppose that we had a worst-case to average-case reduction within NP, from, say, 3SAT to a distributional problem ${(L,U)}$. That is a randomized transformation that starting from a 3SAT instance ${\phi}$ produces a distribution ${f(\phi)}$ that is dominated by the uniform distribution and such that ${\phi \in 3SAT}$ if and only if ${f(\phi) \in L}$. Then we would also have a “fully polynomial” reduction proving that ${(L,U)}$ is complete on average for ${(NP,PSamp)}$. To reduce ${(L',D) \in (NP,PSamp)}$ to ${(L,U)}$ we would first reduce from ${L'}$ to ${3SAT}$ using Cook’s theorem, and then we would reduce from worst-case 3SAT to ${(L,U)}$ using the hypothetical worst-case to average-case reduction. We can check that the domination parameter would be polynomial in the length of the code of the non-deterministic algorithm defining ${L'}$.

The observation that I consider the main point of the paper is that the converse is true. If we had a completeness result in ${(NP,U)}$ via a fully polynomial reduction, then we would also have a worst-case to average-case reduction.

Here is the proof. Suppose a distributional problem ${(L,U)}$ were complete for ${(NP,U)}$ under fully polynomial reductions. We want to show that there is a worst-case to average-case reduction from 3SAT to ${(L,U)}$. Toward this goal, consider the language ${L_\phi}$ recognized by the non-deterministic algorithm ${M_\phi}$ that, on every input ${x}$, accepts if and only if ${\phi}$ is satisfiable, thus ${L_\phi}$ is either ${\{ 0,1 \}^*}$ or ${\emptyset}$ depending on whether ${\phi}$ is satisfiable or not. Note that the running time and length of ${M_\phi}$ are polynomial in ${|\phi|}$. We assumed that ${(L,U)}$ is complete for ${(NP,U)}$ via a fully polynomial reduction ${f(\cdot,\cdot,\cdot)}$, so the mapping

$\displaystyle x \rightarrow f(M_\phi, x, |\phi|^{O(1)} )$

is computable in time polynomial in ${|x|}$ and ${|\phi|}$ and, for a random ${x}$, the output is dominated by the uniform distribution with a domination parameter polynomial in ${|x|}$ and ${|\phi|}$. To finish the argument, notice that for every ${\phi}$ and ${x}$

$\displaystyle \phi \in 3SAT \Leftrightarrow x \in L_\phi \Leftrightarrow f(M_\phi,x,|\phi|^{O(1)} ) \in L$

and so the mapping that, given ${\phi}$, outputs ${f(M_\phi, x, |\phi|^{O(1)} ) }$ for a random ${x}$ is a worst-case to average-case reduction from ${3SAT}$ to ${(L,U)}$.

In past work with Andrej Bogdanov, we showed that such a worst-case to average-case reduction would imply that the polynomial hierarchy collapses. In fact, this is true even if the worst-case to average-case reduction is a “truth-table reduction.” (A polynomial time reduction that is allowed to make non-adaptive queries to the algorithm for the target problem.)

With the above argument, it follows that a fully polynomial completeness result proved with a truth-table reduction would also imply the collapse of the polynomial hierarchy.

Could there be a fully polynomial completeness result via a “Cook reduction”? (A polynomial time reduction that is allowed to make arbitrary, possibly adaptive, oracle queries to the algorithm for the target problem.) This is exactly the same question as whether there is an adaptive worst-case to average-case reduction within in NP, which remains open.

## 5 thoughts on “On the Necessity of Enumerating All Programs”

1. Thanks!

One question just to clarify my own understanding. In the first section, about “Levin’s Universal Search”, to make FPS terminate on unsatisfiable instances, can’t A_phi for unsatisfiable phi just terminate, outputting “unsatisfiable”, if the input psi is equal to phi, and do brute force search otherwise?

2. $A_\phi$ indeed terminates on every input.

This means that the P=NP consequence would follow even from the existence of a weaker type of FPS, which is allowed to not terminate on unsatisfiable formulas, but which is required to be competitive only against algorithms that terminate on every input.

Also, without loss of generality, one could make universal search algorithms terminate on all inputs by outputting “unsatisfiable” after about $2^n$ steps, because we know that if the formula is satisfiable then the universal search algorithm would output a satisfying assignment in $O(2^n)$ steps.

3. nice post!
Does anyone know if sth. is known about about “universal one-way permutations”? Seems like such a natural question, but Google has zero hits on that term. The best I can come up with is a universal PRP assuming NP=co-NP: define sth. like $\f(z_1,\ldots,z_m) = u(z_1)\ldots u(z_m)$ where $u(z)$ parses $z$ as $a,b,c$ and outputs $a,a(b),c$ if $c$ is a “witness” that $a$ is circuit that computes a permutation and $u(z)=z$ otherwise (not being a permutation is in NP, the witness is a pair of distinct $x,x’$ s.t. $a(x)=a(x’)$, thus not being a permutations is in co-NP).

4. Hi Krzysztof, that’s a very good question. Note that your proposal doesn’t work because your function equal the identity on all but an exponentially small fraction of inputs.

My guess is that there is an oracle relative to which a universal one-way permutation does not exist, according to the following definition:

A “universal one-way permutation” is a polynomial time computable permutation p and a reduction R such that for every machine M that computes a permutation and every function A that is the inverse of the permutation p on at least half of the inputs, we
have that $R^A(M, M(x))=x$ for at least half of the $x$‘s. The reduction $R$ is required to run in time polynomial in $|f(x)|$ but arbitrarily large in $|M|$.

5. Right, I missed the exponential loss, now I see why you have to argue about TMs not circuits, which brings up another question: can you construct a universal OWF in the non-uniform setting? Seems uniformity is a very crucial property here.

But using circuits is not the only problem with the construction for OWPs. You could hope that the following works: parse z,|z|=n as (a,b,c) where |a|=|c|=log(n) and output a(b) iff c is a witness that the TM a is a bijection on inputs of length n-2log(n) and running in time poly(n). The problem here is that there’s no reason such a short witness should exist, i.e. the length of the witness can in general be polynomial (not just logarithmic) in n.