Suppose that {f: \{ 0,1 \}^n \rightarrow \{ 0,1 \}} is an arbitrary boolean function, and that we want to construct (or rather, show the existence of) a circuit {C} of size {\leq S} that approximates {f} as well as possible. How well can we hope to do?

Certainly, we can get an {O(1)}-size circuit {C} that computes {f} on a {\geq \frac 12} fraction of the inputs, by choosing either the circuit that always outputs 0 or the circuit that always outputs 1.

Also (ignoring {n^{O(1)}} factors), we can construct a circuit that has the values of {f} hardwired at {S} places, and that always outputs zero or always outputs one (whichever is better) on the remaining inputs. This circuit will be correct on a

\displaystyle   \frac 12 + \frac {S}{2^n} \ \ \ \ \ (1)

fraction of inputs, and this seems like the best we can hope for.
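
To make this concrete, here is a toy Python sketch of the "lookup table plus constant" approximator, checked by brute force against a random {f}. The names and parameters are mine, for illustration only:

```python
import random

n, S = 10, 64                  # toy parameters; we ignore poly(n) factors
N = 2 ** n
f = [random.randrange(2) for _ in range(N)]   # truth table of some f

c = 1 if sum(f) >= N / 2 else 0               # the better constant guess

# Hardwire f on up to S inputs where the constant guess is wrong.
wrong = [x for x in range(N) if f[x] != c]
table = {x: f[x] for x in wrong[:S]}

def C(x):
    """The circuit: a size-S lookup table, the constant c elsewhere."""
    return table.get(x, c)

agreement = sum(C(x) == f[x] for x in range(N)) / N
print(agreement, 0.5 + S / N)  # agreement is at least 1/2 + S/2^n
```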

We may then try to prove the optimality of the above construction by considering a random function {f}, using the Chernoff bound to show that, for a fixed circuit {C} of size {S}, {f} and {C} have agreement less than {1/2 + \epsilon} except with probability less than {e^{-\epsilon^2 2^n}}, and then taking a union bound over the roughly {2^S} circuits of size {S}. (It’s actually {2^{O(S \log S)}}, but we said we would ignore {n^{O(1)}}.) Unfortunately, this calculation only leads us to a proof that, in general, we cannot hope to compute a random function on more than a

\displaystyle  \frac 12 + \sqrt{\frac{S}{2^n}}  \ \ \ \ \ (2)

fraction of inputs using a circuit of size {S}. This is a disappointment because we were hoping to prove that (1) was tight.
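
To spell out where (2) comes from: the union bound applies only when

\displaystyle  2^{O(S \log S)} \cdot e^{-\epsilon^2 2^n} < 1

that is, only for {\epsilon} of order {\sqrt{S \log S / 2^n}}; dropping the logarithmic factor, as we promised to do, this is the {\sqrt{S/2^n}} of (2).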

If we think about it, however, we see that the bound in (2) is actually typically achievable for a random function: partition the {2^n} possible inputs into {S} blocks, each of size {2^n/S}. (Based, for example, on the value of the first {\log S} bits.) Now, the values of {f} in each block are just a sequence of {2^n/S} random bits, so, with constant probability, there is either an excess of {> \sqrt{2^n/S}} ones or an excess of {> \sqrt{2^n/S}} zeroes. For each block, hard-wire into the circuit the majority value of {f} among elements of the block. Since, for a typical random {f}, a constant fraction of the blocks have such an excess, we are getting a {1/2 + \Omega(\sqrt{S/2^n})} fraction of inputs correct, and our circuit has size {S}.
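
Here is a quick simulation of this majority-per-block circuit, again with toy parameters and names of my own choosing:

```python
import random
from math import sqrt

n = 16
N = 2 ** n
S = 256                        # number of blocks
B = N // S                     # block size 2^n / S

f = [random.randrange(2) for _ in range(N)]   # random truth table

# For each block (indexed by the first log S bits of the input),
# hardwire the majority value of f on that block.
maj = []
for b in range(S):
    ones = sum(f[b * B + i] for i in range(B))
    maj.append(1 if 2 * ones >= B else 0)

def C(x):
    return maj[x // B]         # circuit: look up the block's majority bit

agreement = sum(C(x) == f[x] for x in range(N)) / N
print(agreement - 0.5, sqrt(S / N))  # the excess is Theta(sqrt(S/2^n))
```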

What about general functions? Is it possible to do better than (1) for every function?

Yes. Andreev, Clementi and Rolim proved that for every {f: \{ 0,1 \}^n \rightarrow \{ 0,1 \}} and every {S< 2^n} we can construct a circuit of size {S} that computes {f} on at least a {\frac 12 + \sqrt{\frac S {2^n}}} fraction of inputs.

Their proof is perhaps a bit more complicated than it needed to be, partly because they were trying to optimize the lower-order terms that we shall ignore in this post.

Their first observation is that if {X_1,\ldots,X_N} are four-wise independent unbiased random bits, then with {\Omega(1)} probability there are either {\Omega(\sqrt N)} more ones than zeroes or {\Omega(\sqrt N)} more zeroes than ones. This fourth-moment calculation is the most complicated part of the proof. (Note that pairwise independence is not sufficient to guarantee a constant probability of large deviation.)
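
Here is one way that calculation can go; this is my reconstruction via the Paley-Zygmund inequality, not necessarily their argument. Write {Z = \sum_i (2X_i - 1)} for the excess of ones over zeroes. Four-wise independence of the (unbiased) bits gives

\displaystyle  {\mathbb E}[Z^2] = N \ \ \ \mbox{ and } \ \ \ {\mathbb E}[Z^4] = 3N(N-1) + N \leq 3N^2

and the Paley-Zygmund inequality, applied to the nonnegative random variable {Z^2}, gives

\displaystyle  \Pr \left[ Z^2 \geq \frac N2 \right] \geq \frac{\left( {\mathbb E}[Z^2] - N/2 \right)^2}{{\mathbb E}[Z^4]} \geq \frac{(N/2)^2}{3N^2} = \frac 1{12}

so with probability at least {1/12} we have {|Z| \geq \sqrt{N/2}}.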

Their second observation is that if {A \subseteq \{ 0,1 \}^m} is a set such that every four elements are linearly independent (thinking of {\{ 0,1 \}^m} as the vector space {{\mathbb F}_2^m}), and if we pick a random linear function {L: \{ 0,1 \}^m \rightarrow \{ 0,1 \}}, then the values of {L(x)} for {x\in A} are four-wise independent. They then proceed to show that there is an injective mapping {g: \{ 0,1 \}^n \rightarrow \{ 0,1 \}^m} with {m = O(n)} such that {g} has linear circuit complexity and every four elements of {g(\{ 0,1 \}^n)} are linearly independent. Hence, we get that, for a random linear {L}, the mapping {x \rightarrow L(g(x))} is a four-wise independent hash function. So let’s forget the specifics of {L} and {g}, and let’s just say we have sampled a hash function {h: \{ 0,1 \}^n \rightarrow \{ 0,1 \}} from a four-wise independent family.
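
Their {L(g(x))} construction is designed to have linear circuit complexity; if we do not insist on that, a standard way to get such a family is to evaluate a random polynomial of degree at most 3 over {GF(2^m)} and keep one bit of the result. Here is a toy Python sketch of that textbook construction, standing in for their {L(g(x))}, with an exhaustive check of four-wise independence at {m=3}:

```python
import random
from itertools import combinations, product

M = 3
MOD = 0b1011                   # x^3 + x + 1, irreducible over GF(2)

def gf_mul(a, b):
    """Multiplication in GF(2^M) = GF(2)[x] / (MOD)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= MOD
    return r

def evaluate(c, x):
    """Horner evaluation of c[3] x^3 + c[2] x^2 + c[1] x + c[0] in GF(2^M)."""
    y = 0
    for coeff in reversed(c):
        y = gf_mul(y, x) ^ coeff
    return y

def sample_h():
    """One member of a four-wise independent family GF(2^M) -> {0,1}:
    a random cubic polynomial, keeping the low bit of its value."""
    c = [random.randrange(1 << M) for _ in range(4)]
    return lambda x: evaluate(c, x) & 1

# Exhaustive check: for any 4 distinct inputs, the 4 output bits are
# exactly uniform over {0,1}^4 as the coefficients range over all
# choices, i.e., the family really is four-wise independent.
for xs in combinations(range(1 << M), 4):
    counts = {}
    for c in product(range(1 << M), repeat=4):
        t = tuple(evaluate(c, x) & 1 for x in xs)
        counts[t] = counts.get(t, 0) + 1
    assert len(counts) == 16 and len(set(counts.values())) == 1
```

The check succeeds because, for four distinct points, the map from the coefficients to the vector of values is a linear bijection (polynomial interpolation), so the four values, and hence their low bits, are uniform.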

Now divide {\{ 0,1 \}^n} into {S} blocks of {2^n/S} elements each, and consider how well {h} agrees with {f} on each block. If we focus on a particular block, then, over the randomness of {h}, there is a constant probability that {h} agrees with {f} either on more than a {1/2 + \Omega(\sqrt{S/2^n})} fraction of the inputs in the block, or on less than a {1/2 - \Omega(\sqrt{S/2^n})} fraction.

Overall, there is an {h} such that the above large deviation happens for a constant fraction of the blocks. Now simply hardwire such an {h} into the circuit, together with a bit for each block saying whether {h} or the complement of {h} is the better approximation of {f} in that block. This is a circuit of size {S + n^{O(1)}} that computes {f} on a {> 1/2 + \Omega(\sqrt{S/2^n})} fraction of inputs.
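
Putting the pieces together, here is a toy end-to-end simulation, with the polynomial-based family above standing in for their {L(g(x))}, and with parameters of my own choosing. The existence argument is probabilistic, so the sketch just samples a few {h}'s and keeps the best one:

```python
import random

M = 8
MOD = 0b100011011              # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)
N = 1 << M                     # domain {0,1}^n with n = M
S = 16                         # number of blocks
B = N // S                     # block size 2^n / S

def gf_mul(a, b):
    """Multiplication in GF(2^M)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= MOD
    return r

def sample_h():
    """Low bit of a random cubic polynomial over GF(2^M): one member of
    the four-wise independent family from the previous sketch."""
    c = [random.randrange(N) for _ in range(4)]
    def h(x):
        y = 0
        for coeff in reversed(c):
            y = gf_mul(y, x) ^ coeff
        return y & 1
    return h

f = [random.randrange(2) for _ in range(N)]    # the function to approximate

# Sample a few h's and keep the one with the largest total per-block deviation.
best_h, best_excess = None, -1.0
for _ in range(50):
    h = sample_h()
    excess = sum(abs(sum(h(x) == f[x] for x in range(b * B, (b + 1) * B)) - B / 2)
                 for b in range(S))
    if excess > best_excess:
        best_h, best_excess = h, excess

# One hardwired bit per block: use h or its complement, whichever is better.
flip = [sum(best_h(x) == f[x] for x in range(b * B, (b + 1) * B)) < B / 2
        for b in range(S)]

def C(x):
    return best_h(x) ^ flip[x // B]

agreement = sum(C(x) == f[x] for x in range(N)) / N
print(agreement - 0.5, (S / N) ** 0.5)   # excess on the order of sqrt(S/2^n)
```

The circuit implicit in {C} is just {h} (which is small once {h} comes from a succinct family) plus the {S} hardwired flip bits, matching the {S + n^{O(1)}} size bound.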

Here is a simpler proof that avoids the fourth-moment calculation, at the cost of slightly worse lower-order terms. For every function {f: \{ 0,1 \}^k \rightarrow \{ 0,1 \}}, there is a linear (to be precise, affine) function {L} that agrees with {f} on more than a {1/2 + \sqrt{1/2^k}} fraction of inputs. This can be seen by a simple Fourier analysis calculation or, in a proof that Anindya De showed me, by noting that any family of pairwise independent hash functions must contain a function with the above agreement (because a deviation at least as large as the standard deviation must be achieved with positive probability), and that the set of affine functions is such a family. Then, given {f: \{ 0,1 \}^n \rightarrow \{ 0,1 \}}, divide {\{ 0,1 \}^n} into {S} blocks with {2^n/S} elements each, based on the value of the first {\log S} bits; in each block, find an affine function that agrees with the restriction of {f} to the block on at least a {1/2 + \sqrt{S/2^n}} fraction of the elements of the block. Hard-wire all these functions into a circuit of size {O(nS)}. (Each affine function has circuit complexity {O(n)}.)
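
The per-block claim is easy to verify by brute force. A toy sketch of mine (it enumerates all {2^{k+1}} affine functions, so it is only feasible for small {k}):

```python
import random

k = 8
K = 1 << k

def parity(x):
    """Parity of the bits of x."""
    return bin(x).count("1") & 1

def best_affine_agreement(f):
    """Max agreement of f with any affine function x -> <a,x> ^ b."""
    best = 0
    for a in range(K):
        agree = sum(parity(a & x) == f[x] for x in range(K))
        best = max(best, agree, K - agree)   # K - agree handles b = 1
    return best

for _ in range(5):
    f = [random.randrange(2) for _ in range(K)]
    print(best_affine_agreement(f) / K, ">=", 0.5 + 0.5 * K ** -0.5)
```

On every run the printed agreement clears the bound, consistent with the variance argument above (which, run with its constants, guarantees agreement at least {1/2 + \frac 12 \sqrt{1/2^k}} for every {f}; we are ignoring such constants throughout).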
