# Online Optimization Post 6: The Impagliazzo Hard-Core Set Lemma

(This is the sixth in a series of posts on online optimization techniques and their “applications” to complexity theory, combinatorics and pseudorandomness. The plan for this series of posts is to alternate one post explaining a result from the theory of online convex optimization and one post explaining an “application.” The first two posts were about the technique of multiplicative weight updates and its application to “derandomizing” probabilistic arguments based on combining a Chernoff bound and a union bound. The third and fourth post were about the Follow-the-Regularized-Leader framework, and how it unifies multiplicative weights and gradient descent, and a “gradient descent view” of the Frieze-Kannan Weak Regularity Lemma. The fifth post was about the constrained version of the Follow-the-Regularized-Leader framework, and today we shall see how to apply that to a proof of the Impagliazzo Hard-Core Lemma.)

# Impagliazzo Hard-Core Sets via "Finitary Ergodic-Theory"

In the Impagliazzo hard-core set theorem we are a given a function $g:\{ 0, 1 \}^n \rightarrow \{ 0,1\}$ such that every algorithm in a certain class makes errors at least a $\delta$ fraction of the times when given a random input. We think of $\delta$ as small, and so of $g$ as exhibiting a weak form of average-case complexity. We want to find a large set $H\subseteq \{ 0,1 \}^n$ such that $g$ is average-case hard in a stronger sense when restricted to $H$. This stronger form of average-case complexity will be that no efficient algorithm can make noticeably fewer errors while computing $g$ on $H$ than a trivial algorithm that always outputs the same value regardless of the input. The formal statement of what we are trying to do (see also the discussion in this previous post) is:

Impagliazzo Hard-Core Set Theorem, “Constructive Version”
Let $g:\{0,1\}^n \rightarrow \{0,1\}$ be a boolean function, $s$ be a size parameter, $\epsilon,\delta>0$ be given. Then there is a size parameter $s' = poly(1/\epsilon,1/\delta) \cdot s + exp(poly(1/\epsilon,1/\delta))$ such that the following happens.

Suppose that for every function $f:\{0,1\}^n \rightarrow \{0,1\}$ computable by a circuit of size $s'$ we have

$Pr_{x \in \{0,1\}^n} [ f(x) = g(x) ] \leq 1-\delta$

Then there is a set $H$ such that: (i) $H$ is recognizable by circuits of size $\leq s'$; (ii) $|H| \geq \delta 2^n$, and in fact the number of $x$ in $H$ such that $g(x)=0$ is at least $\frac 12 \delta 2^n$, and so is the number of $x$ in $H$ such that $g(x)=1$; and (iii) for every $f$ computable by a circuit of size $\leq s$,

$Pr_{x\in H} [ g(x) = f(x) ] \leq max \{ Pr_{x\in H}[ g(x) = 0] , Pr_{x\in H} [g(x)=1] \} + \epsilon$

Our approach will be to look for a “regular partition” of $\{0,1\}^n$. We shall construct a partition $P= (B_1,\ldots,B_m)$ of $\{0,1\}^n$ such that: (i) given $x$, we can efficiently compute what is the block $B_i$ that $x$ belongs to; (ii) the number $m$ of blocks does not depend on $n$; (iii) $g$ restricted to most blocks $B_i$ behaves like a random function of the same density. (By “density” of a function we mean the fraction of inputs on which the function evaluates to one.)

In particular, we will use the following form of (iii): for almost all the blocks $B_i$, no algorithm has advantage more than $\epsilon$ over a constant predictor in computing $g$ in $B_i$.

Let $M_0$ be the union of all majority-0 blocks (that is, of blocks $B_i$ such that $g$ takes the value 0 on a majority of elements of $B_i$) and let $M_1$ be the union of all majority-1 blocks.

I want to claim that no algorithm can do noticeably better on $M_0$ than the constant algorithm that always outputs 0. Indeed, we know that within (almost) all of the blocks that compose $M_0$ no algorithm can do noticeably better than the always-0 algorithm, so this must be true for a stronger reason for the union. The same is true for $M_1$, with reference to the constant algorithm that always outputs 1. Also, if the partition is efficiently computable, then(in a non-uniform setting) $M_0$ and $M_1$ are efficiently recognizable. It remains to argue that either $M_0$ or $M_1$ is large and not completely unbalanced.

Recalling that we are in a non-uniform setting (where by “algorithms” we mean “circuits”) and that the partition is efficiently computable, the following is a well defined efficient algorithm for attempting to compute $g$:

Algorithm. Local Majority
On input $x$:
determine the block $B_i$ that $x$ belongs to;
output $1$ if $Pr_{z\in B_i} [g(z)=1] \geq \frac 12$;
otherwise output 0

(The majority values of $g$ in the various blocks are just a set of $m$ bits that can be hard-wired into the circuit.)

We assumed that every efficient algorithm must make at least a $\delta$ fraction of errors. The set of $\geq \delta 2^n$ inputs where the Local Majority algorithm makes mistakes is the union, over all blocks $B_i$, of the “minority inputs” of the block $B_i$. (If $b$ is the majority value of $g$ in a block $B$, then the “minority inputs” of $B$ are the set of inputs $x$ such that $g(x) = 1-b$.)

Let $E_0$ be the set of minority inputs (those where our algorithm makes a mistake) in $M_0$ and $E_1$ be the set of minority inputs in $M_1$. Then at least one of $E_0$ and $E_1$ must have size at least $\frac {\delta}{2} 2^n$, because the size of their union is at least $\delta 2^n$. If $E_b$ has size at least $\frac {\delta}{2} 2^n$, then $M_b$ has all the properties of the set $H$ we are looking for.

It remains to construct the partition. We describe an iterative process to construct it. We begin with the trivial partition $P = (B_1)$ where $B_1 = \{ 0,1\}^n$. At a generic step of the construction, we have a partition $P = (B_1,\ldots,B_m)$, and we consider $M_0, M_1,E_0,E_1$ as above. Let $b$ be such that $E_b \geq \frac 12 \delta 2^n$. If there is no algorithm that has noticeable advantage in computing $g$ over $M_b$, we are done. Otherwise, if there is such an algorithm $f$, we refine the partition by splitting each block according to the values that $f$ takes on the elements of the block.

After $k$ steps of this process, the partition has the following form: there are $k$ functions $f_1,\ldots,f_k$ and each of the (at most) $2^k$ blocks of the partition corresponds to a bit string $b_1,\ldots,b_k$ and it contains all inputs $x$ such that $f_1(x)=b_1,\ldots,f_k(x)=b_k$. In particular, the partition is efficiently computable.

We need to argue that this process terminates with $k=poly(1/\epsilon,1/\delta)$. To this end, we define a potential function that measures the “imbalance” of $g$ inside the blocks the partition

$\Psi(B_1,\ldots,B_m) := \sum_{i=1}^m \frac {|B_i|}{2^n} \left( Pr_{x\in B_i} [g(x) = 1] \right)^2$

and we can show that this potential function increases by at least $poly(\epsilon,\delta)$ at each step of the iteration. Since the potential function can be at most 1, the bound on the number of iterations follows.

A reader familiar with the proof of the Szemeredi Regularity Lemma will recognize the main ideas of iterative partitioning, of using a “counterexample” to the regularity property required of the final partition to do a refinement step, and of using a potential function argument to bound the number of refinement steps.

In which way can we see them as “finitary ergodic theoretic” techniques? As somebody who does not know anything about ergodic theory, I may not be in an ideal position to answer this question. But this kind of difficulty has not stopped me before, so I may attempt to answer this question in a future post.

# The Impagliazzo Hard-Core-Set Theorem

The Impagliazzo hard-core set theorem is one of the bits of magic of complexity theory. Say you have a function $g:\{ 0, 1 \}^n \rightarrow \{ 0,1\}$ such that every efficient algorithm makes errors at least $1\%$ of the times when computing $g$ on a random input. (We’ll think of $g$ as exhibiting a weak form of average-case complexity.) Clearly, different algorithms will fail on a different $1\%$ of the inputs, and it seems that, intuitively, there should be functions for which no particular input is harder than any particular other input, per se. It’s just that whenever you try to come up with an algorithm, some set of mistakes, dependent on the algorithmic technique, will arise.

As a good example, think of the process of generating $g$ at random, by deciding for every input $x$ to set $g(x)=1$ with probability $99\%$ and $g(x)=0$ with probability $1\%$. (Make the choices independently for different inputs.) With very high probability, every efficient algorithm fails with probability at least about $1\%$, but, if we look at every efficiently recognizable large set $H$, we see that $g$ takes the value 1 on approximately $99\%$ of the elements of $H$, and so the trivial algorithm that always outputs 1 has a pretty good success probability.

Consider, however, the set $H$ of size $\frac {2}{100} 2^n$ that you get by taking the $\approx \frac{1}{100} 2^n$ inputs $x$ such that $g(x)=0$ plus a random sample of $\frac{1}{100} 2^n$ inputs $x$ such that $g(x)=1$. Then we can see that no efficient algorithm can compute $g$ on much better than $50\%$ of the inputs of $H$. This is the highest form of average-case complexity for a boolean function: on such a set $H$ no algorithm does much better in computing $g$ than an algorithm that makes a random guess.

The Impagliazzo hard-core theorem states that it is always possible to find such a set $H$ where the average-case hardness is “concentrated.” Specifically, it states that if every efficient algorithm fails to compute $g$ on a $\geq \delta$ fraction of inputs, then there is a set $H$ of size $\geq \delta 2^n$ such that every efficient algorithm fails to compute $g$ on at least a $\frac 12 - \epsilon$ fraction of the elements of $H$. This is true for every $\epsilon,\delta$, and if “efficient” is quantified as “circuits of size $s$” in the premise, then “efficient” is quantified as “circuits of size $poly(\epsilon,\delta) \cdot s$” in the conclusion.

The example of the biased random function given above implies that, if one wants to prove the theorem for arbitrary $g$, then the set $H$ cannot be efficiently computable itself. (The example does not forbid, however, that $H$ be efficiently computable given oracle access to $g$, or that a random element of $H$ be samplable given a sampler for the distribution $(x,g(x))$ for uniform $x$.)

A number of proofs of the hard core theorem are known, and connections have been found with the process of boosting in learning theory and with the construction and the decoding of certain error-correcting codes. Here is a precise statement.

Impagliazzo Hard-Core Set Theorem
Let $g:\{0,1\}^n \rightarrow \{0,1\}$ be a boolean function, $s$ be a size parameter, $\epsilon,\delta>0$ be given. Then there is a $c(\epsilon,\delta) = poly(1/\epsilon,1/\delta)$ such that the following happens.

Suppose that for every function $f:\{0,1\}^n \rightarrow \{0,1\}$ computable by a circuit of size $\leq c\cdot s$ we have

$\displaystyle Pr_{x \in \{0,1\}^n} [ f(x) = g(x) ] \leq 1-\delta$

Then there is a set $H$ of size $\geq \delta 2^n$ such that for every function $f$ computable by a circuit of size $\leq s$ we have

$\displaystyle Pr_{x\in H} [ f(x) = g(x) ] \leq \frac 12 + \epsilon$

Using the “finitary ergodic theoretic” approach of iterative partitioning, we (Omer Reingold, Madhur Tulsiani, Salil Vadhan and I) are able to prove the following variant.

Impagliazzo Hard-Core Set Theorem, “Constructive Version”
Let $g:\{0,1\}^n \rightarrow \{0,1\}$ be a boolean function, $s$ be a size parameter, $\epsilon,\delta>0$ be given. Then there is a $c(\epsilon,\delta) = exp(poly(1/\epsilon,1/\delta))$ such that the following happens.

Suppose that for every function $f:\{0,1\}^n \rightarrow \{0,1\}$ computable by a circuit of size $\leq c\cdot s$ we have

$\displaystyle Pr_{x \in \{0,1\}^n} [ f(x) = g(x) ] \leq 1-\delta$

Then there is a set $H$ such that: (i) $H$ is recognizable by circuits of size $\leq c\cdot s$; (ii) $|H| \geq \delta 2^n$, and in fact the number of $x$ in $H$ such that $g(x)=0$ is at least $\frac 12 \delta 2^n$, and so is the number of $x$ in $H$ such that $g(x)=1$; and (iii) for every $f$ computable by a circuit of size $\leq s$,

$\displaystyle Pr_{x\in H} [ g(x) = f(x) ] \leq max \{ Pr_{x\in H}[ g(x) = 0] , Pr_{x\in H} [g(x)=1] \} + \epsilon$

The difference is that $H$ is now an efficiently recognizable set (which is good), but we are not able to derive the same strong average-case complexity of $g$ in $H$ (which, as discussed as the beginning, is impossible in general). Instead of proving that a “random guess algorithm” is near-optimal on $H$, we prove that a “fixed answer algorithm” is near-optimal on $H$. That is, instead of saying that no algorithm can do better than a random guess, we say that no algorithm can do better than either always outputting 0 or always outputting 1. Note that this conclusion is meaningless if $g$ is, say, always equal to 1 on $H$, but in our construction we have that $g$ is not exceedingly biased on $H$, and if $\epsilon < \delta/2$, say, then the conclusion is quite non-trivial.

One can also find a set $H'$ with the same type of average-case complexity as in the original Impagliazzo result by putting into $H'$ a $\frac 12 \delta 2^n$ size sample of elements $x$ of $H$ such that $g(x)=0$ and an equal size sample of elements of $H$ such that $g$ equals 1. (Alternatively, put in $H'$ all the elements of $H$ on which $g$ achieves the minority value of $g$ in $H$, then add a random sample of as many elements achieving the majority value.) Then we recover the original statement except that $c(\epsilon,\delta)$ is exponential instead of polynomial. [Update: constructing $H'$ is somewhat more complicated than we originally thought, the details are in the paper.]

Coming up next, the proof of the “constructive hard core set theorem” and my attempt at explaining what the techniques have to do with “finitary ergodic theory.”