Suppose that is an arbitrary boolean function, and that we want to construct (or rather, show the existence of) a circuit of size that approximates as well as possible. How well can we hope to do?
Certainly, we can get a -size circuit that computes on of the inputs, by either choosing the circuit that always outputs 0 or the circuit that always outputs 1.
Also (ignoring factors), we can construct a circuit that has hardwired the values of at places, and then outputs always zero or always one (whichever is better) on the remaining inputs. This circuit will work on a
fraction of inputs, and this seems like the best we can hope for.
We may then try to prove the optimality of the above construction by considering a random function , using the Chernoff bound to show that, for a fixed circuit of size , and have agreement less than except with probability less than , and then taking a union bound over the roughly circuits of size . (It’s actually , but we said we would ignore .) Unfortunately, this calculation only leads us to a proof that, in general, we cannot hope to compute a random function on more than a
fraction of inputs using a circuit of size . This is a disappointment because we were hoping to prove that (1) was tight.
If we think about it, however, we see that the bound in (2) is actually typically achievable for a random function: partition the possible inputs into blocks, each of size . (Based, for example, on the value of the first bits.) Now, the value of in each block are just a sequence of random bits, so, with constant probability, there is either an excess of ones or an excess of zeroes. For each block, hard-wire into the circuit the majority value of among elements of the block. Clearly we are getting an fraction of inputs correct, and our circuit has size .
What about general functions? Is it possible to do better than (1) for every function?
Yes. Andreev, Clementi and Rolim proved that for every and every we can construct a circuit of size that computes on at least a fraction of inputs.
Their proof is perhaps a bit more complicated than it needed be, partly because they were trying to optimize the lower order terms that we shall ignore in this post.
Their first observation is that if are four-wise independent random bits, then with probability there are either more ones than zeroes or more zeroes than ones. This fourth-moment calculation is the most complicated part of the proof. (Note that pairwise independence is not sufficient to have a constant probability of large deviation.)
Their second observation is that if is a set such that every four elements are linearly independent (thinking of as the vector space ), and if we pick a random linear function , then the values of for are four-wise independent. They then proceed to show that there is an injective mapping such that , has linear circuit complexity, and such that has the property that every four elements are linearly independent. Hence, we get that, for a random linear , the mapping is a four-wise independent hash function. So let’s forget the specifics of and , and let’s just say we have sampled a hash function from a four-wise independent family.
Now divide into blocks of elements each, and consider how well agrees with on each block. If we focus on a particular block, then, over the randomness of , we see that there is a constant probability that either agrees with on inputs or inputs.
Overall, there is an such that the above large deviation happens for a constant fraction of the blocks. Now simply hardwire such an in the circuit, together with a bit for each block saying whether or the complement of is the better approximation of in that block. This is a circuit of size that computes on a fraction of inputs.
Here is a simpler proof that avoids the fourth-moment calculation but which has slightly worse lower-order terms. For every function , there is a linear (to be precise, affine) function that agrees with on fraction of inputs. This can be seen by a simple Fourier analysis calculation, or, in a proof that Anindya De showed me, by noting that any family of pairwise independent hash functions must contain a function with the above agreement (because a deviation at least as large as the standard deviation must be achieved with positive probability), and noting that the set of affine functions is such a family. Then, given , divide into blocks with elements each, based on the content of the first bits; in each block, find an affine linear function that agrees with restricted on the block on at least a fraction of elements of the block. Hard-wire all the functions in a circuit of size . (Each linear function has circuit complexity .)