Online Optimization Post 6: The Impagliazzo Hard-Core Set Lemma

(This is the sixth in a series of posts on online optimization techniques and their “applications” to complexity theory, combinatorics and pseudorandomness. The plan for this series of posts is to alternate one post explaining a result from the theory of online convex optimization and one post explaining an “application.” The first two posts were about the technique of multiplicative weight updates and its application to “derandomizing” probabilistic arguments based on combining a Chernoff bound and a union bound. The third and fourth posts were about the Follow-the-Regularized-Leader framework, and how it unifies multiplicative weights and gradient descent, and a “gradient descent view” of the Frieze-Kannan Weak Regularity Lemma. The fifth post was about the constrained version of the Follow-the-Regularized-Leader framework, and today we shall see how to apply that to a proof of the Impagliazzo Hard-Core Lemma.)

1. The Impagliazzo Hard-Core Lemma

The Impagliazzo Hard-Core Lemma is a striking result in the theory of average-case complexity. Roughly speaking, it says that if {g: \{ 0,1 \}^n \rightarrow \{ 0,1 \}} is a function that is “weakly” hard on average for a class {\cal F} of “efficiently computable” functions {f}, that is, if, for some {\delta>0}, we have that

\displaystyle \forall f \in {\cal F}: \ \ \Pr_{x\sim \{ 0,1\}^n} [f(x) = g(x) ] \leq 1 -\delta

then there is a subset {H\subseteq \{ 0,1 \}^n} of cardinality {\geq 2\delta 2^n} such that {g} is “strongly” hard-on-average on {H}, meaning that

\displaystyle \forall f \in {\cal F}: \ \ \Pr_{x\sim H} [f(x) = g(x) ] \leq \frac 12 + \epsilon

for a small {\epsilon >0}. Thus, the reason why functions from {\cal F} make a mistake in predicting {g} at least a {\delta} fraction of the times is that there is a “hard-core” set {H} of inputs such that every function from {\cal F} makes a mistake about 1/2 of the times for the {2\delta} fraction of inputs coming from {H}.

The result is actually not literally true as stated above, and it is useful to understand a counterexample, in order to motivate the correct statement. Suppose that {\cal F} contains just {1/\delta} functions, and that each function {f\in \cal F} differs from {g} in exactly a {\delta} fraction of inputs from {\{ 0,1 \}^n}, and that the sets of mistakes are disjoint. Thus, for every set {H\subseteq \{ 0,1 \}^n}, no matter its size, there is a function {f\in \cal F} that agrees with {g} on at least a {1-\delta} fraction of inputs from {H}. The reason is that the sets of inputs on which the functions of {\cal F} differ from {g} form a partition of {\{ 0,1 \}^n}, and so their intersections with {H} form a partition of {H}. By an averaging argument, one of those intersections must then contain at most {\delta |H|} elements of {H}.

In the above example, however, if we choose any three distinct functions {f_1,f_2,f_3} from {\cal F}, we have

\displaystyle \forall x\in \{ 0,1 \}^n: \ \ \ g(x) = {\rm majority} (f_1(x), f_2(x),f_3(x))

So, although {g} is weakly hard on average with respect to {\cal F}, we have that {g} is not even worst-case hard for a slight extension of {\cal F} in which we allow functions obtained by simple compositions of a small number of functions of {\cal F}.
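To make the counterexample concrete, here is a quick check in Python. This is a hypothetical toy instance: the choice of {n}, the partition of inputs by residue mod 4, and the construction of the family are all illustrative, not from the post.

```python
import random
from itertools import combinations

# A toy instance with delta = 1/4: F contains 1/delta = 4 functions,
# each differing from g exactly on one block of the partition
# {x : x mod 4 = j}, so the error sets are disjoint.
n = 10
N = 2 ** n
random.seed(0)
g = [random.randint(0, 1) for _ in range(N)]

def make_f(j):
    # f_j flips g exactly on the block {x : x mod 4 == j}
    return lambda x: 1 - g[x] if x % 4 == j else g[x]

F = [make_f(j) for j in range(4)]

# Each f differs from g on exactly a delta = 1/4 fraction of inputs,
for f in F:
    assert sum(f(x) != g[x] for x in range(N)) == N // 4

# yet the majority of ANY three distinct functions computes g exactly:
# the error sets are disjoint, so at most one of the three is wrong on
# any given x, and the other two outvote it.
for f1, f2, f3 in combinations(F, 3):
    assert all((f1(x) + f2(x) + f3(x) >= 2) == (g[x] == 1) for x in range(N))
```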

Theorem 1 (Impagliazzo Hard-Core Lemma) Let {\cal F} be a collection of functions {f: \{ 0,1 \}^n \rightarrow \{ 0,1 \}}, let {g: \{ 0,1 \}^n \rightarrow \{ 0,1 \}} a function, and let {\epsilon>0} and {\delta >0} be positive reals. Then at least one of the following conditions is true:

  • ({g} is not weakly hard-on-average over {\{ 0,1 \}^n} with respect to a slight extension of {\cal F}) There is a {k= O(\epsilon^{-2} \log \delta^{-1} )}, an integer {b}, and {k} functions {f_1,\ldots,f_k \in \cal F}, such that

    \displaystyle h(x) := I \{ f_1(x) + \ldots + f_k(x)\geq b \}

    satisfies

    \displaystyle \Pr_{x\in \{ 0,1 \}^n} [ g(x) = h(x) ] \geq 1-\delta

  • ({g} is strongly hard-on-average over a set {H} of density {2\delta}) There is a set {H\subseteq \{ 0,1 \}^n} such that {|H| \geq 2\delta \cdot 2^n} and

    \displaystyle \forall f\in {\cal F}: \ \ \Pr_{x\in H} [ g(x) = f(x) ] \leq \frac 12 + \epsilon

Here, {I \{ {\rm boolean\ expression} \}} is equal to {1} or {0} depending on whether the boolean expression is true or false (the letter “{I}” stands for “indicator” function of the truth of the expression).

2. Proving the Lemma

Impagliazzo’s proof had {k} polynomial in both {1/\epsilon} and {1/\delta}, and an alternative proof discovered by Nisan has a stronger bound on {k} of the order of {\epsilon^{-2} \log \epsilon^{-1} \delta^{-1}}. The proofs of Impagliazzo and Nisan did not immediately give a set of size {2\delta 2^n} (the set had size {\delta 2^n}), although this could be achieved by iterating their argument. An idea of Holenstein makes it possible to prove the above statement in a more direct way.

Today we will see how to obtain the Impagliazzo Hard-Core Lemma from online optimization, as done by Barak, Hardt and Kale. Their proof achieves all the parameters claimed above, once combined with Holenstein’s ideas.

We say that a distribution {M} (here “{M}” stands for probability measure; we use this letter since we have already used {D} last time to denote the Bregman divergence) has min-entropy at least {K} if, for every {x}, {M(x) \leq 2^{-K}}. In other words, the min-entropy of a distribution {M} over a sample space {X} is defined as

\displaystyle H_{\infty} (M) := \min_{x\in X} \log_2 \frac 1 {M(x)}

The uniform distribution over a set {H} has min-entropy {\log_2 |H|}, and all distributions of min-entropy {K} can be realized as a convex combination of distributions that are each uniform over a set of size {\geq 2^K}, thus uniform distributions over large sets and large-min-entropy distributions are closely related concepts. We will prove the following version of the hard-core lemma:
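In code, the definition reads as follows (a small illustrative helper, not from the post):

```python
import math

# Min-entropy of a distribution given as a dict x -> M(x),
# directly transcribing the definition H_inf(M) = min_x log2(1/M(x)).
def min_entropy(M):
    return min(math.log2(1.0 / p) for p in M.values() if p > 0)

# The uniform distribution over a set H has min-entropy log2 |H| ...
uniform_H = {x: 1 / 8 for x in range(8)}
assert min_entropy(uniform_H) == 3.0

# ... and min-entropy >= K just says every probability is <= 2^{-K}.
M = {0: 0.5, 1: 0.25, 2: 0.25}
assert min_entropy(M) == 1.0  # largest probability is 0.5 = 2^{-1}
```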

Theorem 2 (Impagliazzo Hard-Core Lemma — Min-Entropy Version) Let {X} be a finite set, {\cal F} be a collection of functions {f: X \rightarrow \{ 0,1 \}}, let {g: X \rightarrow \{ 0,1 \}} a function, and let {\epsilon>0} and {\delta >0} be positive reals. Then at least one of the following conditions is true:

  • ({g} is not weakly hard-on-average over {X} with respect to {\cal F}) There is a {k= O(\epsilon^{-2} \log \delta^{-1} )}, an integer {b}, and {k} functions {f_1,\ldots,f_k \in \cal F}, such that

    \displaystyle h(x) := I \{ f_1(x) + \ldots + f_k(x)\geq b \}

    satisfies

    \displaystyle \Pr_{x\in X} [ g(x) = h(x) ] \geq 1-\delta

  • ({g} is strongly hard-on-average on a distribution of min-entropy {\geq \log_2 2\delta |X|}) There is a distribution {H} of min-entropy {\geq \log_2 2\delta|X|} such that

    \displaystyle \forall f\in {\cal F}: \ \ \Pr_{x\sim H} [ g(x) = f(x) ] \leq \frac 12 + \epsilon

Under minimal assumptions on {\cal F} (that it contains {\ll 2^{|X|}} functions), the min-entropy version implies the set version, and the min-entropy version can be used as-is to derive most of the interesting consequences of the set version.

Let us restate it one more time.

Theorem 3 (Impagliazzo Hard-Core Lemma — Min-Entropy Version) Let {X} be a finite set, {\cal F} be a collection of functions {f: X \rightarrow \{ 0,1 \}}, let {g: X \rightarrow \{ 0,1 \}} a function, and let {\epsilon>0} and {\delta >0} be positive reals. Suppose that for every distribution {H} of min-entropy {\geq \log_2 2\delta|X|} we have

\displaystyle \exists f\in {\cal F}: \ \ \Pr_{x\sim H} [ g(x) = f(x) ] > \frac 12 + \epsilon

Then there is a {k= O(\epsilon^{-2} \log \delta^{-1} )}, an integer {b}, and {k} functions {f_1,\ldots,f_k \in \cal F}, such that

\displaystyle h(x) := I \{ f_1(x) + \ldots + f_k(x)\geq b \}

satisfies

\displaystyle \Pr_{x\in X} [ g(x) = h(x) ] \geq 1-\delta

As in previous posts, we are going to think about a game between a “builder” that works toward the construction of {h} and an “inspector” that looks for defects in the construction. More specifically, at every round {i = 1,\ldots,T}, the inspector is going to pick a distribution {M_i} of min-entropy {\geq \log_2 2\delta|X|} and the builder is going to pick a function {f_i\in {\cal F}}. The loss function, which the inspector wants to minimize, is

\displaystyle L_i (M) := \mathop{\mathbb P}_{x\sim M} [f_i (x) = g(x)]

The inspector runs the agile online mirror descent algorithm with the constraint of picking distributions of the required min-entropy, and using the entropy regularizer; the builder always chooses a function {f_i} such that

\displaystyle L_i (M_i) = \mathop{\mathbb P}_{x\sim M_i} [f_i (x) = g(x)] > \frac 12 + \epsilon

which is always a possible choice given the assumptions of our theorem.

Just by plugging the above setting into the analysis from the previous post, we get that if we play this online game for {T = O(\epsilon^{-2} \log \delta^{-1})} steps, the builder picks functions {f_1,\ldots,f_T} such that, for every distribution {H} of min-entropy {\geq \log_2 2\delta |X|}, we have

\displaystyle \Pr_{x\sim H, \ i \sim \{ 1,\ldots,T \} } [ f_i(x) = g(x) ] > \frac 12 \ \ \ \ \ (1)

We will prove that (1) holds in the next section, but we emphasize again that it is just a matter of mechanically using the analysis from the previous post. Impagliazzo’s proof relies, basically, on playing the game using lazy mirror descent with {\ell_2^2} regularization, and he obtains a guarantee like the one above after {T=O(\epsilon^{-2} \delta^{-1})} steps.

What do we do with (1)? Impagliazzo’s original reasoning was to define

\displaystyle h(x) := {\rm majority} (f_1(x),\ldots,f_T(x))

and to consider the set {B} of “bad” inputs {x} such that {h(x) \neq g(x)}. We have

\displaystyle \forall x \in B \ \ \Pr_{i \sim \{ 1,\ldots,T \} } [ f_i(x) = g(x) ] \leq \frac 12

and so

\displaystyle \Pr_{x\sim B, \ i \sim \{ 1,\ldots,T \} } [ f_i(x) = g(x) ] \leq \frac 12

The min-entropy of the uniform distribution over {B} is {\log_2 |B|}; by (1), this must be less than {\log_2 2\delta |X|}, because otherwise the uniform distribution over {B} would contradict (1). So we conclude that {h(x) \neq g(x)} happens for at most a {2\delta} fraction of elements of {X}.

This is qualitatively what we promised, but it is off by a factor of 2 from what we stated above. The factor of 2 is removed by a subsequent idea of Holenstein. In Holenstein’s analysis, we sort elements of {X} according to

\displaystyle \Pr_{i \sim \{ 1,\ldots, T \}} [ f_i(x) = g (x) ]

and we let {B} be the set of {2\delta |X|} elements of {X} for which the above quantity is smallest. Holenstein shows that if we properly pick an integer {b} and define

\displaystyle h(x) := I\{ f_1(x) + \cdots + f_T(x) \geq b \}

then {h(x)} is equal to {g(x)} for all {x\not\in B} and also for at least half the {x\in B}, meaning that {h(x) = g(x)} for at least a {1-\delta} fraction of the inputs. Since this is a bit outside the scope of this series of posts, we will not give an exposition of Holenstein’s argument.
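Although we will not reproduce Holenstein’s analysis, the thresholding step itself is easy to sketch: given the vote counts {f_1(x)+\cdots+f_T(x)}, one can simply try every integer threshold {b} and keep the best one. Holenstein’s analysis pins down a specific good {b}; an exhaustive scan, as in the hypothetical helper below, can only do better.

```python
# Given votes[x] = f_1(x) + ... + f_T(x) and the target values g[x],
# scan all thresholds b and return the one whose predictor
# h(x) = I{votes[x] >= b} agrees with g most often.
def best_threshold(votes, g, T):
    best_b, best_agree = 0, -1
    for b in range(T + 2):  # b = T + 1 makes h identically zero
        agree = sum((votes[x] >= b) == (g[x] == 1) for x in range(len(g)))
        if agree > best_agree:
            best_b, best_agree = b, agree
    return best_b, best_agree / len(g)

# With T = 3 voters, votes [3, 0, 2, 1] and targets [1, 0, 1, 0],
# the threshold b = 2 predicts g perfectly.
assert best_threshold([3, 0, 2, 1], [1, 0, 1, 0], 3) == (2, 1.0)
```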

3. Analysis of the Online Game

It remains to show that we can achieve (1) with {T} of the order of {\frac 1 {\epsilon^2} \log \frac 1 {\delta}}. As we said, we play a game in which, at every step {i=1,\ldots,T}

  • The “inspector” player picks a distribution {M_i} of min-entropy at least {\log_2 2\delta |X|}, that is, it picks a number {\frac 1 {2\delta |X|} \geq M_i(x)\geq 0} for each {x\in X} such that {\sum_x M_i(x) = 1}.
  • The “builder” player picks a function {f_i \in {\cal F}} such that

    \displaystyle \Pr_{x\sim M_i} [f_i(x) = g(x) ] > \frac 12 +\epsilon

    (such a function exists by the assumption of the theorem)

    and defines the loss function

    \displaystyle L_i(M) := \Pr_{x\sim M} [f_i(x) = g(x) ] = \sum_{x\in X} M(x) \cdot I\{ f_i(x) = g(x) \}

  • The “inspector” is charged the loss {L_i(M_i)}.
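A minimal sketch of this game in Python, assuming {\cal F} is given as a list of candidate functions and {g} as the target function. The inspector’s move is the multiplicative-weights update followed by a KL projection onto the capped simplex, which the mirror-descent machinery of the rest of this section makes precise; all names, and the choice of learning rate, are illustrative.

```python
import math

def project_capped(Mhat, cap):
    # KL (Bregman) projection onto {M : 0 <= M(x) <= cap, sum_x M(x) = 1}:
    # coordinates that hit the cap stay there, and the remaining mass is
    # redistributed proportionally among the free coordinates.
    M = dict(Mhat)
    capped = set()
    while True:
        free_mass = 1.0 - cap * len(capped)
        total = sum(p for x, p in M.items() if x not in capped)
        overflow = []
        for x in M:
            if x not in capped:
                M[x] = M[x] * free_mass / total
                if M[x] > cap:
                    overflow.append(x)
        if not overflow:
            return M
        for x in overflow:
            M[x] = cap
            capped.add(x)

def hardcore_game(X, F, g, delta, eps, T):
    cap = 1.0 / (2 * delta * len(X))   # min-entropy >= log2(2 delta |X|)
    eta = eps                          # illustrative learning rate (= 1/c)
    M = {x: 1.0 / len(X) for x in X}   # M_1 is the uniform distribution
    chosen = []
    for _ in range(T):
        # Builder: a function with the largest agreement with g under M
        # (by the theorem's assumption, it exceeds 1/2 + eps).
        f = max(F, key=lambda f: sum(M[x] for x in X if f(x) == g(x)))
        chosen.append(f)
        # Inspector: multiplicative-weights step, then KL projection.
        Mhat = {x: M[x] * math.exp(-eta * (f(x) == g(x))) for x in X}
        M = project_capped(Mhat, cap)
    return chosen
```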

We analyze what happens if the inspector plays the strategy defined by agile mirror descent with negative entropy regularizer. Namely, we define the regularizer

\displaystyle R(M) = c \sum_x M(x) \log M(x)

for a choice of {c} that we will fix later. The corresponding Bregman divergence is

\displaystyle D(M,\hat M) = c KL(M,\hat M) = c \cdot \left( \sum_x M(x) \log \frac {M(x)}{\hat M(x)} - \sum_x M(x) + \sum_x \hat M(x) \right)

and we work over the space {\cal K} of distributions of min-entropy {\geq \log_2 2\delta |X|}.

The agile online mirror descent algorithm is

\displaystyle M_1 = \arg\min_{M\in {\cal K}} R(M)

so that {M_1} is the uniform distribution, and for {i\geq 1}

\displaystyle \hat M_{i+1} = \arg\min_{M: X \rightarrow {\mathbb R}} \ \ D(M, M_i) + L_i (M)

\displaystyle M_{i+1} = \arg\min_{M \in {\cal K}} \ \ D(M, \hat M_{i+1} )

Solving the first step of agile online mirror descent, we have

\displaystyle \hat M_{i+1} (x) = M_i(x) e^{-\frac 1c I \{ f_i(x) = g(x) \} }
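One can verify this closed form by setting to zero, coordinate by coordinate, the derivative of the objective {c \, KL(M,M_i) + L_i(M)} being minimized:

```latex
\frac{\partial}{\partial M(x)} \Big( c\, KL(M, M_i) + L_i(M) \Big)
  = c \ln \frac{M(x)}{M_i(x)} + I\{ f_i(x) = g(x) \} = 0
\quad \Longrightarrow \quad
M(x) = M_i(x)\, e^{-\frac 1c I\{ f_i(x) = g(x) \}}
```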

Using the analysis from the previous post, for every distribution {M} in {\cal K}, and every number {T} of steps, we have the regret bound

\displaystyle \sum_{i=1}^T L_i(M_i) - L_i(M) \leq D(M,M_1) + \sum_{i=1}^T D(M_i, \hat M_{i+1} )

and we can bound

\displaystyle D(M,M_1) = c \sum_x M(x) \cdot \ln ( |X| \cdot M(x)) \leq c \ln \frac 1 {2\delta}

and

\displaystyle D(M_i,\hat M_{i+1}) = c\cdot \left( \sum_x M_i(x) \ln \frac{M_i(x)}{\hat M_{i+1} (x) } + \sum_x \hat M_{i+1}(x) - M_i(x) \right)

\displaystyle = c \cdot \sum_x M_i(x) \cdot \left( \frac 1c I\{ f_i(x) = g(x) \} + e^{-\frac 1 c I \{ f_i(x) = g(x)\}} -1 \right)

\displaystyle \leq O \left( \frac 1c \right)

where, in the last step, we used the fact that the quantity in parentheses is either 0 or {1/c + e^{-1/c} - 1}, which is {O(1/c^2)}, and that {\sum_x M_i(x) = 1} because {M_i} is a distribution.
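A quick numerical sanity check of this step: for {x = 1/c > 0} we have {x + e^{-x} - 1 \leq x^2/2}, since {e^{-x} \leq 1 - x + x^2/2} for {x \geq 0}.

```python
import math

# Check x + e^{-x} - 1 <= x^2 / 2 for x = 1/c over a range of values of c.
for c in [1, 2, 5, 10, 100, 1000]:
    x = 1.0 / c
    assert x + math.exp(-x) - 1 <= x * x / 2
```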

Overall, the regret is bounded by

\displaystyle \sum_{i=1}^T L_i(M_i) - L_i(M) \leq O \left( c\log \frac 1\delta + \frac Tc \right) \leq O \left( \sqrt{T \log \frac 1\delta}\right)

where the last inequality comes from an optimized choice of {c}.

Recall that we choose the functions {f_i} so that {L_i(M_i) > 1/2 + \epsilon} for every {i}, so for every {M\in {\cal K}}

\displaystyle \frac 1T \sum_{i=1}^T L_i (M) \geq \frac 12 + \epsilon - O \left( \sqrt{\frac 1 T \log \frac 1\delta}\right)

and by choosing {T} of the order of {\frac 1 {\epsilon^2} \log \frac 1 \delta} we get

\displaystyle \forall M \in {\cal K} : \ \ \ \frac 1T \sum_{i=1}^T L_i (M) > \frac 12

It remains to observe that

\displaystyle \frac 1T \sum_{i=1}^T L_i (M) = \frac 1T \sum_{i=1}^T \Pr_{x\sim M} [f_i(x) = g(x) ] = \Pr_{i \sim \{1,\ldots,T\}, \ x\sim M} [ f_i (x) = g(x) ]

so we have that for every distribution {M} of min-entropy at least {\log_2 2\delta |X|} it holds that

\displaystyle \Pr_{i \sim \{1,\ldots,T\}, \ x\sim M} [ f_i (x) = g(x) ]> \frac 12

which is the statement that we promised and from which the Impagliazzo Hard-Core Lemma follows.

4. Some Final Remarks

After Impagliazzo circulated a preliminary version of his paper, Nisan had the following idea: consider the game that we defined above, in which a builder picks an {f\in {\cal F}}, an inspector picks a distribution {M \in {\cal K}} of the prescribed min-entropy, and the loss for the inspector is given by {\Pr [ f(x) = g(x) ]}. We can think of it as a zero-sum game if we also assign a gain {\Pr [ f(x) = g(x)]} to the builder.

If the builder plays second, there is a strategy that guarantees a gain that is at least {1/2 + \epsilon}, and so there must be a mixed strategy, that is, a distribution {{\cal DF}} over functions in {\cal F}, that guarantees such a gain even if the builder plays first. In other words, for all distributions {H} of the prescribed min-entropy we have

\displaystyle \Pr_{x\sim H, f\sim {\cal DF}} [ f(x) = g(x) ] \geq \frac 12 + \epsilon

Nisan then observes that we can sample {T = O(\epsilon^{-2} \log |X|)} functions {f_1,\ldots,f_T} and have, with high probability

\displaystyle \Pr_{x\sim H, i\sim \{1,\ldots,T\}} [ f_i(x) = g(x) ] > \frac 12

and the sampling bound on {T} can be improved to the order of {\frac 1 {\epsilon^2} \log \frac 1{\epsilon \delta}} with the same conclusion.

Basically, what we have been doing today is to come up with an algorithm that finds an approximate solution for the LP that defines the optimal mixed strategy for the game, and to design the algorithm in such a way that the solution is very sparse.

This is a common feature of other applications of online optimization techniques to find “sparse approximations”: one sets up an optimization problem whose objective function measures the “approximation error” of a given solution. The object we want to approximate is the optimum of the optimization problem, and we use variants of mirror descent to prove the existence of a sparse solution that is a good approximation.

6 thoughts on “Online Optimization Post 6: The Impagliazzo Hard-Core Set Lemma”

  1. Is the upper bound on k known to be sharp in any sense?

    Also, I might have found some typos:
    “pick a distribution M_i” – in the later parts often M is written instead of M_i, like in L_i(M)

    “Solving the first step of agile online mirror descent” – in the equation after this, and in a later equation as well, f should be f_i in I{f(x)=g(x)}

  2. Indeed, you are right about M_i.

    I would have a question about Nisan’s game theoretic argument. Why should there be such a DF distribution? For example, consider the rock-paper-scissors variant when builder has to show the same thing as the inspector. If inspector plays first, and builder plays second, builder always wins. But this doesn’t give any mixed strategy for the builder when builder plays first, in fact, in that case always inspector wins.

  3. By “playing first” I mean deciding on a (possibly mixed) strategy, and then letting the second player choose their (possibly mixed) strategy based on that. If players choose mixed strategies, then the sampling from the mixed strategies happens independently. This is the setup in which you have the min-max theorem for two-player zero-sum games. Something that is important to make the argument work is that a mix of distributions of min-entropy > K is also a distribution of min-entropy > K.

  4. Pingback: Online Optimization Post 7: Matrix Multiplicative Weights Update | in theory
