Circuit Lower Bounds by Properties and by Brute Force

There are two families of approaches to proving circuit lower bounds: exploiting properties of functions computed by small circuits, and building hard functions by “brute force.” The first approach leads to “natural proofs” in the sense of Razborov and Rudich, which is a problem because natural proofs are unlikely to give lower bounds for circuit classes such as ${TC0}$ (constant-depth circuits with threshold gates), and maybe even ${ACC0}$ (constant-depth circuits with AND, OR, NOT, and MOD${_m}$ gates). The second approach creates proofs that “algebraize,” in the sense of Aaronson and Wigderson, and algebaizing proofs construct hard functions of very high uniform complexity; it is not even possible to use an algebraizing proof to show that the huge complexity class ${EXP^{NP}}$ contains problems of super-polynomial circuit complexity.

The ${NEXP \not\subseteq ACC0}$ circuit lower bound of Williams avoids both barriers, by exploiting (among other things) a property of ${ACC0}$ circuits (the existence of slightly faster-than-brute-force satisfiability algorithms) that is not a “natural” property, neither is an “algebraizing” property.

This post, written by a non-expert for non-experts (but experts are welcome to read it and to point out any mistakes), describes some of the context.

1. Lower Bounds via Properties of Small Circuits

One way to prove circuit lower bounds is to identify a property of functions computable by circuits of a certain type, and then to show that a given function does not have this property. For example one can show that Parity is not in ${AC0}$ by

1. showing (via the switching lemma) that every function computable by small ${AC0}$ circuits becomes computable by a small DNF when most variables are fixed to random values, and
2. if you fix ${n-k}$ of the ${n}$ input bits of a Parity problem to random values you are still left with a Parity over ${k}$ bits which requires a DNF of size ${2^{k-1}}$.

This approach usually leads to natural proofs of lower bounds, but natural proofs cannot be used to argue lower bounds even for ${TC0}$ (bounded depth circuits using threshold gates) and maybe not even ${ACC0}$.

So far, we have no example of a lower bound proved for a class of circuits such that

1. the proof uses a structural properties of the functions in the circuit class and
2. the proof is non-natural.

Indeed, with one exceptions that I describe below, we don’t even have conjectures that would lead to such lower bounds. The exception is a conjecture of Gil Kalai who suggested the following:

Conjecture 1 Let ${f: \{ 0,1 \}^n \rightarrow \{ 0,1 \}}$ be a monotone function computable by a ${TC0}$ circuit of size ${S}$ and depth ${d}$. Then

$\displaystyle \mathop{\mathbb P} [ f(x) \neq f(x+e) ] \leq .001 \ \ \ \ \ (1)$

where ${x\in \{ 0,1 \}^n}$ is uniformly distributed and ${e}$ is sampled by letting each bit independently be ${1}$ with probability ${1/(\log n)^{100d}}$ (the noise parameter) and ${0}$ with probability ${1-1/(\log n)^{100d}}$.

If the conjecture is true, then the recursive majority function ${RecMaj}$, which is computable in polynomial time, cannot be computed in ${TC0}$ because it satisfies

$\displaystyle \mathop{\mathbb P} [ RecMaj(x) \neq RecMaj(x+e) ] \geq .4$

even with the much smaller noise parameter ${1/n^{\Omega(1)}}$.

Note that such an approach would not give a natural proof because the property is defined only for monotone functions, which are a vanishingly small fraction of all functions. Also, and more interestingly, Gil points out that a random monotone function is close to a majority function, and so the right-hand-side in (1) for a random monotone ${f}$ is much bigger than ${.001}$.

In my opinion, this is the right type of property to look at. (Even if the specific conjecture above may well be false.) My guess for what a non-natural circuit lower bound proof might look like is that one would look at a parameter of boolean functions such that a specific hard function achieves a very high (or very low) extremal value, beating by a notable margin the parameter obtained by a random function; then one would show that functions of bounded circuit complexity cannot do as much better than a random function as the hard function. If such a parameter could be found, then the property would not satisfy the largeness requirement, and the argument would not give a natural proof.

2. Lower Bounds via Enumeration of Small Circuits

Another class of approaches to circuit lower bounds begins with the observation that a size-${S}$ circuit can be described using ${O(S\log S)}$ bits if the gates have bounded fan-in, and ${O(S^2)}$ bits otherwise. So if we are trying to construct a function that cannot be computed by circuits of size ${S}$ with bounded fan-in gates we can consider any set of ${> 2^{c \cdot S\log S}}$ distinct functions, for a sufficiently large constant ${c}$, for example the set of all functions that depend only on the first ${\log S + \log\log S + \log c}$ input bits, and we are guaranteed that such a set contains at least one function that cannot be computed by circuits of size ${S}$. For our lower bound, we can pick, for example, the lexicographically first function in the set with the property of not being computable by circuits of size ${\leq S}$.

How hard is it to uniformly compute such a function? A trivial brute-force search will take time ${2^{O(S \log S)}}$, and this is actually the best that we know how to do in terms of general circuit lower bounds for time-bounded classes. It is open, for example, to show that there is a problem solvable in deterministic time ${2^{O(n)}}$ that cannot be solved by circuits of size, say, ${\leq 100\cdot n}$.

Alternatively, we can find such a function within the third level of the polynomial hierarchy with complexity polynomial in ${S}$, that is

$\displaystyle NTIME(S^{O(1)})^{NP^{NP}} \not\subseteq SIZE(S(n))$

Is it possible to “speed up” the search for such a hard function?

Kannan introduced a simple but very effective idea: if we want to show that, in a given complexity class, there is a problem that is not decidable by circuits of size ${S(n)}$, we might as well start from the assumption that every problem in the class is decidable by circuits of size ${S(n)}$. This might seem strange, until one realizes that if ${A}$ is a mathematical statement, then ${A}$ is logically equivalent to ${\neg A \Rightarrow A}$. Kannan uses this trick to show that, for every fixed ${k}$, the complexity class ${NP^{NP}}$ contains a problem that is not in ${SIZE(O(n^k))}$. Starting from the assumption that every problem in ${NP^{NP}}$ is decidable by a circuit family of size ${O(n^k)}$ (in fact, the weaker assumption that ${NP}$ is decidable by polynomial size circuits is sufficient) we get, by a Theorem of Karp and Lipton, that ${NP^{NP^{NP}} = NP^{NP}}$, and so we can improve the complexity of our “hard-function-finding algorithm” from ${NP^{NP^{NP}}}$ to ${NP^{NP}}$. Scaling up the argument, it shows that there are problems in the complexity class ${NEXP^{NP}}$ that cannot be solved by polynomial size circuits.

A limitation of Kannan’s argument is that it relativizes, and it isn’t possible to substantially improve his result using relativizing techniques: there are oracles relative to which all problems in ${EXP^{NP}}$ can be solved by polynomial size circuits.

There are two reasons why Kannan’s argument relativizes:

1. it only uses properties of boolean circuits that remain true if, for an oracle ${A}$, we allow circuits with ${A}$-gates: the fact that there at at most ${2^{S^{O(1)}}}$ circuits of size ${\leq S}$, and the fact that a size ${S}$ circuit can be evaluated on a given input in uniform deterministic time ${S^{O(1)}}$ (provided of course that the uniform deterministic algorithm has access to the oracle ${A}$ as well);

2. it only uses properties of uniform computations that remain true relative to any oracle, such as the Karp-Lipton theorem.

It is very difficult to come up with an argument that is not-relativizing because of a violation of (1) above, that is, an argument that treats “rea-worldl” boolean circuits differently from boolean circuits with oracle gates. Such an argument would have to identify a special property that is true of functions computable by small circuits and, intuitively, we are back to the task of designing a non-natural proof.

We know, however, non-relativizing properties of complexity classes such as ${NP}$, ${EXP}$ and ${NEXP}$, so one could hope for a proof that is non-relativizing because of a violation of (2).

Buhrman, Fortnow and Thierauf achieve a very interesting proof-of-concept result for such an approach, by proving non-relativizing circuit lower bound results in the spirit of Kannan’s proof.

The non-relativizing result used by Burhman, Fortnow and Thierauf is a “Karp-Lipton-type” theorem of Babai, Fortnow, Nisan and Wigderson.

The non-relativizing starting point in the work of Babai et al. is the fact that ${EXP}$-complete problems have instance-checker. An instance-checker ${M}$ for a decision problem ${L}$ is a randomized algorithm that, given oracle access to an oracle ${O}$ and an input ${x}$ of length n, runs in time polynomial in ${n}$ and then outputs, with high probability, either ${L(x)}$ or ${ERROR}$; furthermore, if the oracle ${O}$ is ${L}$ itself, then the algorithm never outputs ${ERROR}$.

Now let’s start by assuming

$\displaystyle EXP \subseteq SIZE(n^{O(1)})$

with the hope of reaching a contradiction (although we will only prove something weaker.) We fix an ${EXP}$-complete decision problem ${L}$ and its instance checker ${M}$. We see that we have an ${MA}$ protocol for ${L}$ — an ${MA}$ protocol is an interactive protocol in which the prover sends one message to the verifier and then the verifier checks it in randomized polynomial time; it is basically the same as ${NP}$, but the verification of whether a witness is correct is allowed to run in probabilistic polynomial time instead of deterministic polynomial time.

In the ${MA}$ protocol for ${L}$, on a common input ${x}$ of length ${n}$ the prover, who wants to prove ${L(x)=1}$, simply sends the verifier a circuit ${C}$ that solves ${L}$ on all inputs of length at most ${n^c}$, where ${n^c}$ is an upper bound to the bit-length of the oracle queries made by ${M}$ on input ${x}$. Then the verifier runs ${M}$ on input ${x}$ and oracle ${C}$; if ${L(x)=1}$ and ${C}$ is valid, the verifier accepts with probability one, and if ${L(x)=0}$ there is very low probability that the verifier accepts, regardless of what circuit ${C}$ is sent by a cheating prover.

So we have the “Karp-Lipton-type” theorem of Babai, Fortnow, Nisan and Wigderson:

$\displaystyle EXP \subseteq SIZE(n^{O(1)}) \Rightarrow EXP = MA \ \ \ \ \ (2)$

The work of Burhman, Fortnow and Thierauf gives a “Kannan-type” theorem that uses (2) to “speed-up” the brute-force construction of a hard function. By scaling up (2) we get that if ${EXP \subseteq SIZE(n^{O(1)})}$ then

$\displaystyle EEXP = MAEXP$

where ${EEXP}$ is the class of problems solvable in double-exponential time, and ${MAEXP}$ is like ${MA}$ but the witness sent by the prover can be of exponential length. Clearly we have

$\displaystyle EEXP \not\subseteq SIZE(n^{O(1)})$

from the trivial argument given at the beginning of the section, and so

$\displaystyle MAEXP \not\subseteq SIZE(n^{O(1)})$

Overall, we have proved that

$\displaystyle EXP \subseteq SIZE(n^{O(1)}) \Rightarrow MAEXP \not\subseteq SIZE(n^{O(1)})$

which is enough to conclude, unconditionally, that

$\displaystyle MAEXP \not\subseteq SIZE(n^{O(1)})$

the interesting point being that there is an oracle relative to which ${MAEXP \subseteq SIZE(n^{O(1)})}$ and so we have a non-relativizing circuit lower bound.

Another “Karp-Lipton-type” theorem is given by Impagliazzo, Kabanets and Wigderson, who prove that

$\displaystyle NEXP \subseteq SIZE(n^{O(1)}) \Rightarrow NEXP = EXP \ \ \ \ \ (3)$

They show, in fact, a stronger conclusion, namely that for every problem in ${NEXP}$ and every yes-instance there is at least a certificate that can be “encoded” as a polynomial-size circuit. From the existence of such compactly-representable certificates it is easy to get a deterministic exponential time algorithm for every problem in ${NEXP}$ (do exhaustive search among all the compactly representable certificates, of which there are only singly exponentially many). Combining together all the results seen so far we get the “Karp-Lipton-type” theorem

$\displaystyle NEXP \subseteq SIZE(n^{O(1)}) \Rightarrow NEXP = MA \ \ \ \ \ (4)$

This means that if we could put MA in, say, ${NTIME (2^{O(n)})}$ we would have proved ${NEXP\not\subseteq SIZE(n^{O(1)})}$, because otherwise we would have ${NEXP \subseteq NTIME(2^{O(n)})}$, which is a contradiction to the non-deterministic time hierarchy theorem.

In particular, suppose that given a polynomial time randomized algorithm and a given an input we could distinguish, in deterministic time ${2^{O(n)}}$, between the case in which the acceptance probability is 1 and the case in which it is at most ${1/2}$. Then we could use such an algorithm to simulate ${MA}$ in non-deterministic time ${2^{O(n)}}$, and we would have ${NEXP\not\subseteq SIZE(n^{O(1)})}$.

So, in summary, a consequence of the work of Impagliazzo, Kabanets and Wigderson is that if we can derandomize “promise coRP” (the problem of distinguishing the case in which a given polynomial time randomized algorithm accepts a given input with probability 1 from the case in which it accepts with probability at most ${1/2}$) in time ${2^{O(n)}}$ then we would get the circuit lower bound ${NEXP\not\subseteq SIZE(n^{O(1)})}$.

I should note that, at the time (and even now), this result was seen as giving evidence of the difficulty of deriving sub-exponential derandomization results (it’s as hard as proving circuit lower bounds) and not as giving a promising approach to circuit lower bounds (it’s as easy as proving a sub-exponential derandomization).

It is also worth noting that, if we had such a derandomization result, then the resulting circuit lower bound would not come exactly from a “Kannan-type” argument applied to the “Karp-Lipton-type” result (4). That is, we would not use (4) in order to “speed up” the construction of a hard function, but instead we would use it to get a sped-up non-deterministic algorithm for ${NEXP}$, and we would reach a contradiction to the non-deterministic hierarchy theorem.

3. Ryan Williams’s Lower Bound

In STOC 2010, Ryan Williams proved that a much weaker derandomization result would suffice to imply ${NEXP \not\subseteq SIZE(n^{O(1)})}$. The requirement in the result of Impagliazzo, Kabanets and Wigderson is to have an algorithm that, for every ${\epsilon >0}$, given a circuit of size ${m}$ with ${n}$ inputs, runs in time ${2^{m^{\epsilon}}}$ and distinguishes the case in which the circuit accepts all inputs from the case in which it accepts less than half of the inputs. Ryan shows that it is sufficient to have a deterministic algorithm of running time ${2^{n} \cdot m /n^{\log n}}$, only marginally faster compared to the trivial ${2^n \cdot m}$. A simpler argument shows how to derive the circuit lower bound from the stronger assumption of having a deterministic algorithm of running time ${2^{n} \cdot m /n^{\log n}}$ for the circuit satisfiability problem (which is a harder problem than the derandomization problem).

Ryan’s unconditional proof that ${NEXP \not\subseteq ACC0}$ “scales down” the argument of the STOC 2010 paper, showing that a ${2^{n} \cdot m /n^{\log n}}$ algorithm for ${ACC0}$-satisfiability suffices to prove ${NEXP \not\subseteq ACC0}$, and then gives such an algorithm for ${ACC0}$-satisfiability.

In this line of work, this is the first paper that uses a non-relativizing property of circuits (rather than only non-relativizing properties of NEXP), namely the existence of a non-trivially fast satisfiability algorithm. The fact that the argument relies on a property of ${ACC0}$ circuits, however, does not make it a natural proof. I am not sure what’s the cleanest explanation for why this is, but, for starters, the property of having an efficient satisfiability algorithm is a property of the ${ACC0}$ circuits themselves, not a property of the truth-tables of the functions computable in ${ACC0}$

8 thoughts on “Circuit Lower Bounds by Properties and by Brute Force”

1. “This post, written by a non-expert for non-experts”

This is in my opinion one of the reasons that Luca Trevisan is such a popular person among theoreticians: his modesty.

I wish we had more people like him.

2. Dear Luca, Mike Saks also proposed at some point a property to distinguish TC0 functions, and his suggestion (that I dont remember) does not fail the “natural proofs” test. At that time Mike explained to me why “just” avoiding the natural proofs barrier is not enough since any argument that we can think about for proving such a property by induction of the circuit depth will not work for non-natural properties. I dont remember the details but I recall that this idea goes back, according to Mike, to the paper by Razborov and Rudish.

3. Great post.

I believe that Ryan’s method of upper bounds imply lower bounds is the way to eventually get even stronger results

4. Gil, the following is an argument why “properties analyzed by induction always give natural proofs,” I don’t know if it is the same argument that Saks gave to you.

Suppose you have a parameter $\mu(f)$ of boolean functions $f$ and that you have established that every circuit computing f has size at least $\Omega (\mu(f))$ by proving that $\mu(f \ OR \ g) \leq \mu (f) + \mu (g) + O(1)$ and $\mu(NOT \ f) \leq \mu(f) + O(1)$. Suppose also that for a very specific function $f_0$ you have been able to prove the lower bound $\mu (f_0) \geq s(n)$.

Now if the property “$\mu(g)$ is large” is true with noticeably large probability for a random function $g$ then you are in trouble because, if $\mu$ is exponential time computable, then you have a natural proof.

Unfortunately, it is always the case that $\mu(g)$ is large for a large fraction of random functions $g$. This is because, if I pick a random $g$, then I have

$\mu(f_0) \leq \mu(g) + \mu(g \oplus f_0) + O(1)$

because $f_0 = g \oplus (g \oplus f_0)$. Thus, either $\mu(g) \geq s(n)/2 - O(1)$ or $\mu (g \oplus f_0) \geq s(n)/2 - O(1)$, and so we have that a random function has probability at least 1/2 of having a value of $\mu$ at least $s(n)/2 – O(1)$.

A hypothetical parameter that measures “how much better than a random function” a given function is (in some properly chosen context) would not be “subadditive” in the sense above, and so it wouldn’t be subject to this barrier. It’s true, however, that another “barrier” comes up, namely that in order to prove the statement “if a function is computable by a small circuit then the parameter is not much larger than for a random function,” we cannot proceed by induction on the circuit.

P.S. actually all the above relates to “formula size” not “circuit size,” so it shows that even proving formula lower bounds by a property analyzed by induction has these problems.

5. Dear Luca, thanks. (I think it is similar to the argument I forgot.)

One thing that can be useful would be a statement of the form: A monotone Boolean function in X (say balanced) has a positive correlation
with some Boolean function in monotone X. (Maybe “monotone correlation” can even slowly go to 0.) Here X is AC0, ACC, TC0 etc.

This goes against pessimistic CC’s instincts but maybe it is not as false and hopeless as it looks.

6. The [IKW] assumption on the derandomization of promise-coRP, the way you have stated it above, is too strong. The same circuit lower bound follows if promise-coRP can be simulated in nondeterministic subexponential time.

Stated this way, [IKW] becomes incomparable to [Wil10].

Am I right?