At the conference I was at last week, I was asked by a mathematician about the exact statement of the Impagliazzo Hard-Core Lemma, and whether I could state it without saying “*algorithm*.”

So I gave the following statement:

Theorem 1 (Impagliazzo Hard-Core Lemma — Basic Version). Let $\cal F$ be a class of boolean functions $f: \{ 0,1 \}^n \rightarrow \{ -1,1 \}$, let $g: \{ 0,1 \}^n \rightarrow \{ -1,1 \}$ be an arbitrary function, and let $\epsilon, \delta > 0$ be arbitrary parameters. Then at least one of the following conditions is true:

- $g$ is globally well approximated by a simple composition of a few functions from $\cal F$. That is, there are functions $f_1,\ldots,f_k \in \cal F$, $k = poly(\epsilon^{-1},\delta^{-1})$, and a "simple" function $h: \{ -1,1 \}^k \rightarrow \{ -1,1 \}$ such that

$$\mathbb{P}_x [\ g(x) = h(f_1(x),\ldots,f_k(x))\ ] \geq 1 - \delta$$

- $g$ is locally almost orthogonal to $\cal F$. That is, there is a subset $H \subseteq \{ 0,1 \}^n$ of density at least $\delta$ such that, for every $f \in \cal F$,

$$\left| \mathbb{E}_{x \in H}\ f(x) \cdot g(x) \right| \leq \epsilon$$

The theorem has been strengthened in a couple of ways in work of Klivans and Servedio and of Holenstein. The "complexity" parameter $k$ needs to be only $O(\epsilon^{-2} \log \delta^{-1})$, which is conjectured (or known?) to be tight, and the density of the set $H$ can be made $2\delta$, which is tight. In all proofs $h$ is *really* simple, being more or less a threshold function applied to the sum of the $f_i(x)$.
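To get a feel for why such a simple $h$ can work, here is a toy numerical sketch (my own illustration, not taken from the papers above): if each of $k$ hypothetical functions $f_i$ independently agrees with $g$ on a $0.7$ fraction of inputs, then the majority vote, which is just a threshold applied to $\sum_i f_i(x)$, agrees with $g$ almost everywhere.

```python
import random

random.seed(0)

N = 1000          # toy stand-in for the domain {0,1}^n
k = 25            # number of "weak predictors"
p_agree = 0.7     # each f_i agrees with g on ~70% of inputs

# g: an arbitrary +-1 function on the domain
g = [random.choice([-1, 1]) for _ in range(N)]

# f_i: noisy copies of g with independent errors (hypothetical members of F)
fs = [[gx if random.random() < p_agree else -gx for gx in g] for _ in range(k)]

def agreement(f):
    """Fraction of the domain on which f agrees with g."""
    return sum(fx == gx for fx, gx in zip(f, g)) / N

# h is a threshold applied to the sum of the f_i: here, majority vote
# (the sum of an odd number of +-1 values is never zero)
maj = [1 if sum(f[x] for f in fs) > 0 else -1 for x in range(N)]

single = max(agreement(f) for f in fs)
combined = agreement(maj)
print(single, combined)
```

Of course, in the actual lemma the errors of the $f_i$ are not independent, which is exactly what makes the proof nontrivial; this only illustrates the shape of the combiner.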

Stating the result in this way (which is, of course, standard) raises a few questions that I find rather interesting.

** Lower Bounds on $k$? **

I don’t know if a lower bound on $k$ is known, but I think it’s helpful to see why the statement would be false if we tried to approximate $g$ with *one* function from $\cal F$ in the first case.

Take $g$ to be a random function, pick an arbitrary partition of $\{0,1\}^n$ into three subsets $S_1, S_2, S_3$ of equal size, and define $f_1, f_2, f_3$ such that $f_i = g$ on $\{0,1\}^n - S_i$ and $f_i = -g$ on $S_i$. Pick $\cal F = \{ f_1, f_2, f_3 \}$ and $\delta < 1/3$. Now we see that there is no $f \in \cal F$ that agrees with $g$ on a $1-\delta$ fraction of inputs. On the other hand, no matter how we pick a subset $H$ of density $\delta$ of $\{0,1\}^n$, there is at least a function $f_i$ that agrees with $g$ on $H$ on at least a $2/3$ fraction of inputs. (Note that the sets $H \cap S_i$ are a partition of $H$, and at least one of the three must have size at most $|H|/3$.)
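A quick sanity check of this three-set construction, as a toy simulation (with a random $g$ over a small domain standing in for $\{0,1\}^n$): no single $f_i$ beats $2/3$ agreement globally, yet on every sampled dense set $H$ some $f_i$ reaches at least $2/3$ agreement.

```python
import random

random.seed(0)

N = 999  # toy domain, divisible by 3
g = [random.choice([-1, 1]) for _ in range(N)]

# Partition the domain into three equal parts S_1, S_2, S_3
parts = [range(0, 333), range(333, 666), range(666, 999)]

# f_i agrees with g off S_i and disagrees with g on S_i
fs = []
for S in parts:
    f = list(g)
    for x in S:
        f[x] = -g[x]
    fs.append(f)

def agreement_on(f, H):
    """Fraction of H on which f agrees with g."""
    return sum(f[x] == g[x] for x in H) / len(H)

# No single f_i agrees with g on more than 2/3 of all inputs...
global_best = max(agreement_on(f, range(N)) for f in fs)

# ...but on every set H, the f_i that overlaps H the least agrees on >= 2/3 of H
worst_local = 1.0
for _ in range(200):
    H = random.sample(range(N), 100)  # a set of density ~0.1
    worst_local = min(worst_local, max(agreement_on(f, H) for f in fs))

print(global_best, worst_local)
```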

If I am not mistaken, this argument yields only a constant lower bound on $k$, for constant $\epsilon$ and $\delta$. How does one argue an $\Omega(\epsilon^{-2} \log \delta^{-1})$ lower bound?

** From Boolean Functions to Bounded Functions **

When one abstracts complexity theory results as we have just done, as results about classes of boolean functions, usually the same proofs also give more general results about *bounded* functions, which correspond to probabilistic algorithms.

A scaling argument suggests that the analog of the Lemma for bounded functions would be that for every class $\cal F$ of functions $f: \{ 0,1 \}^n \rightarrow [-1,1]$, every function $g: \{ 0,1 \}^n \rightarrow [-1,1]$, and every parameters $\epsilon, \delta > 0$, at least one of the following holds:

- there are functions $f_1,\ldots,f_k \in \cal F$ and a combining function $h$ such that

$$\mathbb{E}_x \left|\ g(x) - h(f_1(x),\ldots,f_k(x))\ \right| \leq \delta$$

- there is a subset $H$ of density at least $\delta$ such that, *for every* $f \in \cal F$,

$$\left| \mathbb{E}_{x \in H}\ f(x) \cdot g(x) \right| \leq \epsilon$$

But this turns out to be false. Take $g'$ to be a random boolean function, and take $\cal F$ to be any class of functions and $\epsilon, \delta$ to be parameters such that (1) is not true even for large $k$ (for example, take $\cal F$ to be the class of functions computable by circuits of size $2^{n/2}$ and $\epsilon, \delta$ to be inverse polynomial in $n$); now note that taking $g := \frac{1+\epsilon}{2} + \frac{1-\epsilon}{2} \cdot g'$ has inner product at least $\epsilon$ with the constant-one function on every subset $H$.
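To sanity-check my reconstruction of this counterexample numerically (the rescaling $g = \frac{1+\epsilon}{2} + \frac{1-\epsilon}{2} g'$ is my reading of the construction): since $g$ takes values in $\{\epsilon, 1\}$, its inner product with the constant-one function is at least $\epsilon$ on every subset, while approximating $g$ is as hard as predicting the random function $g'$.

```python
import random

random.seed(0)

N = 4096
eps = 0.1

# g' is a random +-1 function; g rescales it into the two values {eps, 1}
gp = [random.choice([-1, 1]) for _ in range(N)]
g = [(1 + eps) / 2 + (1 - eps) / 2 * x for x in gp]

# Inner product of g with the constant-one function on 100 random subsets H:
# it can never drop below eps, because g >= eps pointwise
min_inner = min(
    sum(g[x] for x in H) / len(H)
    for H in (random.sample(range(N), 400) for _ in range(100))
)
print(min_inner)
```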

Is there a nice way to generalize the Hard-Core Lemma to bounded functions? (Partial answer in the next section.)

** Explicit $H$ **

When formulating complexity-theoretic results in this abstract form, I often find it useful to see what happens if I instantiate the class of functions that are meant to represent efficient algorithms with the class of Fourier characters of $\{0,1\}^n$. Typically, one obtains a basic fact from Fourier analysis, but with an unusual proof that makes use of no properties of the characters other than that they are a family of boolean functions, which can be nice to see. Sometimes, one obtains a statement that is not trivial, and presenting the proof in this special case can be a nice way to introduce a complexity-theoretic result to mathematicians.

Here, if you try this experiment, you get something that is not standard, but also not very meaningful: a boolean function $g$ is either well approximated by a function of a few characters (that is, a function that is constant on the cosets of a rather large subspace), or else there is a large subset $H$ on which $g$ has very low correlation with all characters. The first case is an interesting one, but the second one sounds strange, because we don’t know anything about the set $H$.

It would be nicer if we could say: either $g$ is well approximated by a function of few characters, or there is a *subspace* $V$ such that $g$ is nearly orthogonal to all characters when restricted to $V$. This would correspond to a version of the Hard-Core Lemma in which the characteristic function of the set $H$ is itself a simple combination of functions from $\cal F$. Unfortunately, this stronger statement is false in general, and it is even false in the special case of the Fourier analysis application: take $g(x)$ to be $1$ with probability $\frac{1+\epsilon}{2}$ and $-1$ with probability $\frac{1-\epsilon}{2}$, independently for each $x$, and you see that $g$ is not approximable, but also that the constant-one character has large correlation with $g$ in all large subspaces.
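Here is a quick numerical illustration of this failure (a toy sketch of my own, using subspaces that zero out the low-order coordinates): for a biased random $g$, the correlation with the constant-one character stays near $\epsilon$, in particular bounded away from zero, on a chain of large subspaces.

```python
import random

random.seed(0)

n = 12
eps = 0.3

# g(x) = 1 with probability (1+eps)/2 and -1 otherwise, independently
g = [1 if random.random() < (1 + eps) / 2 else -1 for _ in range(2 ** n)]

# V_j = subspace of points whose j lowest coordinates are 0, for j = 0..4
corrs = []
for j in range(5):
    V = [x for x in range(2 ** n) if x % (2 ** j) == 0]
    # correlation of g with the constant-one character, restricted to V
    corrs.append(sum(g[x] for x in V) / len(V))

print([round(c, 3) for c in corrs])
```

Each printed value is an empirical average of $\pm 1$ samples with bias $\epsilon$, so it concentrates around $\epsilon = 0.3$ as long as the subspace is large.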

Maybe we could say that either $g$ is globally approximable, or else there is a large subspace $V$ such that $g$ restricted to $V$ has low correlation with all *non-constant* characters.

In fact, I think that the following statement is true via the same density-increment argument we have in this paper with Reingold, Tulsiani and Vadhan, where we prove a slightly different version:

Theorem 2 (Impagliazzo Hard-Core Lemma — Constructive $H$). Let $\cal F$ be a class of boolean functions $f: \{ 0,1 \}^n \rightarrow \{ -1,1 \}$, let $g: \{ 0,1 \}^n \rightarrow \{ -1,1 \}$ be an arbitrary function, and let $\epsilon, \delta > 0$ be arbitrary parameters. Then at least one of the following conditions is true:

- $g$ is globally well approximated by a simple composition of a few functions from $\cal F$. That is, there are functions $f_1,\ldots,f_k \in \cal F$, $k = poly(\epsilon^{-1},\delta^{-1})$, and a function $h: \{ -1,1 \}^k \rightarrow \{ -1,1 \}$ such that

$$\mathbb{P}_x [\ g(x) = h(f_1(x),\ldots,f_k(x))\ ] \geq 1 - \delta$$

- $g$ is locally almost orthogonal to $\cal F$ after a shift. That is, there is a subset $H$ of density at least $\delta$ such that, for $k = poly(\epsilon^{-1},\delta^{-1})$, for functions $f_1,\ldots,f_k \in \cal F$, and for a function $h$ such that $1_H(x) = h(f_1(x),\ldots,f_k(x))$, we have, for every $f \in \cal F$,

$$\left| \mathbb{E}_{x \in H}\ f(x) \cdot \left( g(x) - \mathbb{E}_{y \in H}\ g(y) \right) \right| \leq \epsilon$$

The only difference between the above statement and the one in the RTTV paper, Lemma 3.1, is that there Case (2) has a weaker form, equivalent to

$$\mathbb{E}_{x \in H}\ f(x) \cdot g(x) \leq \left| \mathbb{E}_{x \in H}\ g(x) \right| + \epsilon$$

but, unless I made a mistake in the calculations I just did, the same proof works for Case (2) in the theorem stated above.

In fact I think that this statement would also work for general bounded functions, reformulating Case (1) to require distance at most $\delta$ instead of agreement at least $1-\delta$.

While $k$ is polynomial in $\epsilon^{-1}$ and $\delta^{-1}$, the function $h$ might have exponential complexity in $\epsilon^{-1}$ and $\delta^{-1}$ in the only proof I know of Theorem 2, so the results discussed in this section have limited applicability in complexity theory. Is it possible to prove a constructive hard-core lemma in which the combining function $h$ has circuit complexity polynomial in $\epsilon^{-1}$ and $\delta^{-1}$?

Holenstein proves a constructive hard-core lemma in which $H$ is samplable if the distribution $(x, g(x))$ is samplable, and in a paper with Tulsiani and Vadhan we show that the characteristic function of $H$ can be made efficiently computable given oracle access to $g$; but the question I would like to raise is whether, as in the above statement, one can have a fully polynomially computable characteristic function of $H$ for an arbitrary $g$ to which we have no access of any kind, but with the “shifted inner product” condition in Case (2).

I believe Lu, Tsai and Wu gave a lower bound on k in

“On the Complexity of Hard-Core Set Constructions”, ICALP ’07

http://www.springerlink.com/content/e884424t02lw0572/

How about the question of finding an explicit set f_1,...,f_k (and h), given that the second condition does not hold (i.e., for every dense enough set H, there is some f \in F which has \epsilon advantage over random in predicting g)?

Can’t Boosting be thought of as a constructive method for doing this (given an oracle for picking an \epsilon-good f for any H)?

(Except that Boosting works with distributions that are not flat over H…)

Not really, because boosting gives you a way to sample H if you can sample from (x,g(x)), which is the (useful!) setup in which Holenstein’s proof is constructive.

I would like the notion of constructivity in which the characteristic function of H (or the pointwise probability of elements of H, in the non-flat case) is efficiently computable with no access to g whatsoever. This is provably impossible for the standard definition of hard-core set, but it is plausible, and indeed true, for weaker definitions such as the “shifted low correlation” in this post or the “no advantage over constant functions” in the RTTV paper, though so far only with complexity exponential in 1/\epsilon and 1/\delta.

Hi Luca,

Freund’s boosting-by-majority paper has a lower bound of 1/epsilon^2 log(1/delta) on the number of boosting stages. It probably can be adapted to the hard-core lemma setting.

Also, you seem to suggest that Holenstein’s hard-core set lemma is more constructive than, say, Impagliazzo’s original proof (I might have also seen this claim in the Barak-Hardt-Kale paper). I wonder in which sense it is more constructive.

As far as I understand in either of those proofs H is efficiently samplable given random samples from (x,g(x)). This is also true about any hard-core set construction obtained from an efficient boosting algorithm (such as those given by Klivans and Servedio). Just to avoid confusion I’m referring to Impagliazzo’s construction based on incremental updates and not Nisan’s Min-max based proof of existence (also found in Impagliazzo’s paper).

Hi Vitaly, that’s a good point; probably Impagliazzo’s iterative proof, too, can be made constructive in the sense that H is samplable if (x,g(x)) is samplable. Holenstein was the first to make this claim explicit.