Impagliazzo Hard-Core Sets via "Finitary Ergodic-Theory"

In the Impagliazzo hard-core set theorem we are given a function g:\{ 0, 1 \}^n \rightarrow \{ 0,1\} such that every algorithm in a certain class errs on at least a \delta fraction of inputs. We think of \delta as small, and so of g as exhibiting a weak form of average-case complexity. We want to find a large set H\subseteq \{ 0,1 \}^n such that g is average-case hard in a stronger sense when restricted to H. This stronger form of average-case complexity will be that no efficient algorithm can make noticeably fewer errors while computing g on H than a trivial algorithm that always outputs the same value regardless of the input. The formal statement of what we are trying to do (see also the discussion in this previous post) is:

Impagliazzo Hard-Core Set Theorem, “Constructive Version”
Let g:\{0,1\}^n \rightarrow \{0,1\} be a boolean function, let s be a size parameter, and let \epsilon,\delta>0 be given. Then there is a size parameter s' = poly(1/\epsilon,1/\delta) \cdot s + exp(poly(1/\epsilon,1/\delta)) such that the following holds.

Suppose that for every function f:\{0,1\}^n \rightarrow \{0,1\} computable by a circuit of size s' we have

Pr_{x \in \{0,1\}^n} [ f(x) = g(x) ] \leq 1-\delta

Then there is a set H such that: (i) H is recognizable by circuits of size \leq s'; (ii) |H| \geq \delta 2^n, and in fact the number of x in H such that g(x)=0 is at least \frac 12 \delta 2^n, and so is the number of x in H such that g(x)=1; and (iii) for every f computable by a circuit of size \leq s,

Pr_{x\in H} [ g(x) = f(x) ] \leq max \{ Pr_{x\in H}[ g(x) = 0] , Pr_{x\in H} [g(x)=1] \} + \epsilon
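To make the quantity in (iii) concrete, here is a minimal Python sketch that measures, by brute force, how much a predictor f beats the better of the two constant predictors on a set H. The helper name advantage_over_constant and the toy choices of g, f and H are assumptions made for illustration; they are not part of the theorem.

```python
from fractions import Fraction
from itertools import product

def advantage_over_constant(f, g, H):
    """Pr_{x in H}[f(x) = g(x)] minus the success rate of the better constant on H."""
    H = list(H)
    agree = Fraction(sum(f(x) == g(x) for x in H), len(H))
    ones = Fraction(sum(g(x) for x in H), len(H))
    return agree - max(ones, 1 - ones)

# Hypothetical toy instance: g is majority of 3 bits, f outputs the first bit.
g = lambda x: int(sum(x) >= 2)
f = lambda x: x[0]
H = list(product((0, 1), repeat=3))
adv = advantage_over_constant(f, g, H)  # 3/4 - 1/2
```

In this toy instance f agrees with g on 6 of the 8 inputs, while either constant is right on 4, so the whole cube fails condition (iii) for any \epsilon < 1/4 against a class that contains f.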

Our approach will be to look for a “regular partition” of \{0,1\}^n. We shall construct a partition P= (B_1,\ldots,B_m) of \{0,1\}^n such that: (i) given x, we can efficiently compute which block B_i contains x; (ii) the number m of blocks does not depend on n; (iii) g restricted to most blocks B_i behaves like a random function of the same density. (By “density” of a function we mean the fraction of inputs on which the function evaluates to one.)

In particular, we will use the following form of (iii): for almost all the blocks B_i, no algorithm has advantage more than \epsilon over a constant predictor in computing g in B_i.

Let M_0 be the union of all majority-0 blocks (that is, of blocks B_i such that g takes the value 0 on a majority of elements of B_i) and let M_1 be the union of all majority-1 blocks.

I want to claim that no algorithm can do noticeably better on M_0 than the constant algorithm that always outputs 0. Indeed, we know that within (almost) all of the blocks that compose M_0 no algorithm can do noticeably better than the always-0 algorithm, so the same must hold for their union. The same is true for M_1, with reference to the constant algorithm that always outputs 1. Also, if the partition is efficiently computable, then (in a non-uniform setting) M_0 and M_1 are efficiently recognizable. It remains to argue that either M_0 or M_1 is large and not completely unbalanced.
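The averaging step behind the claim for M_0 can be written out as follows. This is a sketch under the simplifying assumption that every block contained in M_0 (not just almost every block) satisfies property (iii) with the constant 0 as the better predictor; exceptional blocks would add at most their total relative weight inside M_0 to the right-hand side.

```latex
\begin{align*}
\Pr_{x\in M_0}[f(x)=g(x)]
  &= \sum_{B_i \subseteq M_0} \frac{|B_i|}{|M_0|} \Pr_{x\in B_i}[f(x)=g(x)] \\
  &\leq \sum_{B_i \subseteq M_0} \frac{|B_i|}{|M_0|} \left( \Pr_{x\in B_i}[g(x)=0] + \epsilon \right) \\
  &= \Pr_{x\in M_0}[g(x)=0] + \epsilon .
\end{align*}
```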

Recalling that we are in a non-uniform setting (where by “algorithms” we mean “circuits”) and that the partition is efficiently computable, the following is a well-defined efficient algorithm for attempting to compute g:

Algorithm. Local Majority
On input x:
determine the block B_i that x belongs to;
output 1 if Pr_{z\in B_i} [g(z)=1] \geq \frac 12;
otherwise output 0

(The majority values of g in the various blocks are just a set of m bits that can be hard-wired into the circuit.)
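As a rough illustration of Local Majority, the following Python sketch tabulates one majority bit per block by brute force and then answers queries by table lookup. The function names, the representation of inputs as bit tuples, and the toy g and partition are all assumptions made here; a real instance would use a circuit to compute the block and would not enumerate \{0,1\}^n.

```python
from itertools import product

def local_majority(g, block_of, n):
    """Return the Local Majority predictor for g on n-bit tuples.

    block_of maps an input to a hashable label of its block.
    """
    sizes, ones = {}, {}
    for x in product((0, 1), repeat=n):
        label = block_of(x)
        sizes[label] = sizes.get(label, 0) + 1
        ones[label] = ones.get(label, 0) + g(x)
    # One hard-wired bit per block: 1 iff g is 1 on at least half of the block.
    majority = {label: int(2 * ones[label] >= sizes[label]) for label in sizes}
    return lambda x: majority[block_of(x)]

# Hypothetical toy instance: g is majority of 3 bits, blocks are given by the first bit.
n = 3
g = lambda x: int(sum(x) >= 2)
block_of = lambda x: x[0]
predict = local_majority(g, block_of, n)
errors = sum(predict(x) != g(x) for x in product((0, 1), repeat=n))
```

On this toy partition the predictor simply outputs the first bit, and it errs exactly on the two minority inputs 011 and 100.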

We assumed that every efficient algorithm errs on at least a \delta fraction of inputs, so the Local Majority algorithm makes mistakes on a set of at least \delta 2^n inputs. This set is the union, over all blocks B_i, of the “minority inputs” of the block B_i. (If b is the majority value of g in a block B, then the “minority inputs” of B are the set of inputs x such that g(x) = 1-b.)

Let E_0 be the set of minority inputs (those where our algorithm makes a mistake) in M_0 and E_1 be the set of minority inputs in M_1. Then at least one of E_0 and E_1 must have size at least \frac {\delta}{2} 2^n, because the size of their union is at least \delta 2^n. If E_b has size at least \frac {\delta}{2} 2^n, then M_b has all the properties of the set H we are looking for.
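The sets M_0, M_1, E_0, E_1 and the pigeonhole step can be checked on a small example. The sketch below is a brute-force illustration only; the function name minority_sets and the toy g and partition are assumptions, and ties in a block are broken in favor of 1 as in the Local Majority algorithm.

```python
from itertools import product

def minority_sets(g, block_of, n):
    """Return (M, E): M[b] is the union of majority-b blocks, E[b] its minority inputs."""
    blocks = {}
    for x in product((0, 1), repeat=n):
        blocks.setdefault(block_of(x), []).append(x)
    M = {0: [], 1: []}
    E = {0: [], 1: []}
    for members in blocks.values():
        ones = sum(g(x) for x in members)
        b = int(2 * ones >= len(members))  # majority value of g in this block
        M[b].extend(members)
        E[b].extend(x for x in members if g(x) != b)
    return M, E

# Hypothetical toy instance: g is majority of 3 bits, blocks are given by the first bit.
g = lambda x: int(sum(x) >= 2)
M, E = minority_sets(g, lambda x: x[0], 3)
total_errors = len(E[0]) + len(E[1])
larger = max(len(E[0]), len(E[1]))
```

Since E_0 and E_1 are disjoint and together contain every mistake of Local Majority, the larger of the two always holds at least half of the mistakes; here each contains exactly one input.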

It remains to construct the partition. We describe an iterative process to construct it. We begin with the trivial partition P = (B_1) where B_1 = \{ 0,1\}^n. At a generic step of the construction, we have a partition P = (B_1,\ldots,B_m), and we consider M_0, M_1, E_0, E_1 as above. Let b be such that |E_b| \geq \frac 12 \delta 2^n. If there is no algorithm that has noticeable advantage in computing g over M_b, we are done. Otherwise, if there is such an algorithm f, we refine the partition by splitting each block according to the values that f takes on the elements of the block.
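The iteration just described can be sketched in Python on a toy scale. Everything specific here is an assumption for illustration: the finite list candidates stands in for "circuits of size s", the threshold eps stands in for "noticeable advantage", b is chosen as the side with the larger E_b, and all sets are enumerated by brute force. The loop stops because a function already used for splitting is constant on every block and so cannot beat the majority value on M_b.

```python
from itertools import product

def build_partition(g, n, candidates, eps):
    """Refine the trivial partition until no candidate beats a constant on M_b by more than eps."""
    points = list(product((0, 1), repeat=n))
    fs = []  # functions used so far; the block of x is labeled (f(x) for f in fs)
    while True:
        blocks = {}
        for x in points:
            blocks.setdefault(tuple(f(x) for f in fs), []).append(x)
        M = {0: [], 1: []}
        E_size = {0: 0, 1: 0}
        for members in blocks.values():
            ones = sum(g(x) for x in members)
            b = int(2 * ones >= len(members))  # majority value in this block
            M[b].extend(members)
            E_size[b] += sum(g(x) != b for x in members)
        b = 0 if E_size[0] >= E_size[1] else 1  # side with more minority inputs
        H = M[b]
        ones = sum(g(x) for x in H)
        baseline = max(ones, len(H) - ones)  # successes of the better constant on H
        better = [f for f in candidates
                  if sum(f(x) == g(x) for x in H) > baseline + eps * len(H)]
        if not better:
            return fs, H
        fs.append(better[0])  # split every block according to this function

# Hypothetical toy instance: g is majority of 3 bits, candidates are the 3 coordinate functions.
n = 3
g = lambda x: int(sum(x) >= 2)
candidates = [lambda x, i=i: x[i] for i in range(n)]
fs, H = build_partition(g, n, candidates, eps=0.1)
```

With these toy choices the process performs a single refinement (by the first coordinate) and stops with H equal to the four inputs whose first bit is 0.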

After k steps of this process, the partition has the following form: there are k functions f_1,\ldots,f_k and each of the (at most) 2^k blocks of the partition corresponds to a bit string b_1,\ldots,b_k and it contains all inputs x such that f_1(x)=b_1,\ldots,f_k(x)=b_k. In particular, the partition is efficiently computable.
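The labeling of blocks by bit strings can be made explicit with a short Python sketch. The names block_label and partition, and the two example functions, are assumptions introduced here for illustration.

```python
from itertools import product

def block_label(fs, x):
    """Label of the block containing x: the bit string (f_1(x), ..., f_k(x))."""
    return tuple(f(x) for f in fs)

def partition(fs, n):
    """Group all n-bit inputs by block label (brute force, for small n only)."""
    blocks = {}
    for x in product((0, 1), repeat=n):
        blocks.setdefault(block_label(fs, x), []).append(x)
    return blocks

# Hypothetical refinement functions on 3-bit inputs.
f1 = lambda x: x[0]
f2 = lambda x: x[1] ^ x[2]
trivial = partition([], 3)       # one block, labeled by the empty string
after_one = partition([f1], 3)   # at most 2 blocks
after_two = partition([f1, f2], 3)  # at most 4 blocks
```

Computing the label of x takes one evaluation of each of the k functions, which is why the partition stays efficiently computable as long as k and the size of each f_i are bounded.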

We need to argue that this process terminates with k=poly(1/\epsilon,1/\delta). To this end, we define a potential function that measures the “imbalance” of g inside the blocks of the partition:

\Psi(B_1,\ldots,B_m) := \sum_{i=1}^m \frac {|B_i|}{2^n} \left( Pr_{x\in B_i} [g(x) = 1] \right)^2

and we can show that this potential function increases by at least poly(\epsilon,\delta) at each step of the iteration. Since the potential function can be at most 1, the bound on the number of iterations follows.
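The behavior of \Psi under refinement can be observed on a toy example. The sketch below computes \Psi exactly with rational arithmetic; the function name potential, the representation of a partition by a list of splitting functions, and the toy g are assumptions made for illustration, and the example does not reproduce the poly(\epsilon,\delta) lower bound on the increase.

```python
from fractions import Fraction
from itertools import product

def potential(g, fs, n):
    """Psi of the partition whose blocks are the level sets of (f(x) for f in fs)."""
    blocks = {}
    for x in product((0, 1), repeat=n):
        label = tuple(f(x) for f in fs)
        size, ones = blocks.get(label, (0, 0))
        blocks[label] = (size + 1, ones + g(x))
    total = 2 ** n
    # Sum over blocks of (relative size) * (density of g in the block)^2.
    return sum(Fraction(size, total) * Fraction(ones, size) ** 2
               for size, ones in blocks.values())

# Hypothetical toy instance: g is majority of 3 bits, refined by coordinate functions.
g = lambda x: int(sum(x) >= 2)
coords = [lambda x, i=i: x[i] for i in range(3)]
psi0 = potential(g, [], 3)          # trivial partition
psi1 = potential(g, coords[:1], 3)  # split by the first bit
psi3 = potential(g, coords, 3)      # all blocks are singletons
```

Here \Psi goes from 1/4 to 5/16 to 1/2. The first increase of 1/16 is the weighted variance of the two new block densities 1/4 and 3/4 around the old density 1/2, and the final value equals the density of g, which is the largest value \Psi can take for this g.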

A reader familiar with the proof of the Szemerédi Regularity Lemma will recognize the main ideas of iterative partitioning, of using a “counterexample” to the regularity property required of the final partition to do a refinement step, and of using a potential function argument to bound the number of refinement steps.

In which way can we see them as “finitary ergodic theoretic” techniques? As somebody who does not know anything about ergodic theory, I may not be in an ideal position to answer this question. But this kind of difficulty has not stopped me before, so I may attempt to answer this question in a future post.
