Boosting and the Regularity Lemma

In a previous post, I described abstract forms of the weak regularity lemma, in which we start from an arbitrary bounded function g: X \rightarrow [-1,1] and an arbitrary set F of “structured” bounded functions f: X \rightarrow [-1,1], and we construct an “approximating” function h that has “low complexity” relative to F and is “indistinguishable” from g by functions from F. We had two proofs, both generalizations of arguments due to Frieze and Kannan: in one proof h is not bounded, but its complexity is O(\epsilon^{-2}), where \epsilon is the approximation parameter; in the second proof, h is bounded, but it has complexity exponential in the approximation parameter, namely \exp( O(\epsilon^{-2})).

In a new paper with Madhur Tulsiani and Salil Vadhan, we give a new proof that yields an approximating function that is, at the same time, bounded and of low complexity. This has a number of applications, which I will describe in a future post.

(Note that, in the statement below, F is required to contain Boolean functions, rather than bounded ones. It is possible, however, to “reduce” the bounded case to the Boolean case via the following observation: if F is a family of bounded functions, F' is the family of Boolean functions \{ 1_{f(x) \geq t} \ : \ f\in F, t\in [-1,1] \}, and g and h are \epsilon-indistinguishable according to F', then they are also \epsilon-indistinguishable according to F.)
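(A quick way to see this, glossing over exact constants in \epsilon: up to an additive constant, every bounded function is an average of its threshold functions, because for every x

\displaystyle f(x) = -1 + \int_{-1}^{1} 1_{f(x) \geq t} \, dt

so if some bounded f \in F distinguishes g from h, then averaging over t shows that some threshold function 1_{f(x) \geq t} \in F' distinguishes them as well, with only a constant-factor loss in \epsilon.)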

Theorem (Low Complexity Approximation, TTV Version) Let (X,\mu) be a probability space, g:X \rightarrow [-1,1] a bounded function, F a collection of Boolean functions f:X \rightarrow \{ -1,1\}, and \epsilon an approximation parameter.

Then there is a function h: X \rightarrow [-1,1] such that

  • h has low complexity relative to F: there are k= O( \epsilon^{-2}) functions f_i\in F and coefficients c_i such that

    \displaystyle h(x) := \max \{ -1 , \min \{ 1, \sum_{i=1}^k c_i f_i (x)  \}\}

  • h is \epsilon-indistinguishable from g by F, that is,

\forall f \in F \ \ \left| {\mathbb E}_{x \sim \mu} f(x) g(x) - {\mathbb E}_{x \sim \mu} f(x) h(x) \right|  \leq \epsilon

That is, h is simply a linear combination of functions from F, whose value is truncated to be between -1 and 1.
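In code, such an h is a one-liner; here is a toy illustration (the argument names coeffs and funcs are mine, not from the paper):

    def h(x, coeffs, funcs):
        # coeffs are the c_i and funcs the f_i from the theorem statement
        s = sum(c * f(x) for c, f in zip(coeffs, funcs))
        return max(-1.0, min(1.0, s))  # truncate to [-1, 1]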

Recall that, when we do not require h to be bounded, h can be constructed via the following algorithm:

Algorithm FK

  • h_0 := 0; t= 0
  • while \exists f_{t+1} \in F such that {\mathbb E} f_{t+1} \cdot (g-h_t) \geq \epsilon
    • h_{t+1} := \sum_{i=1}^{t+1} \gamma f_{i}
    • t:=t+1
  • return h_t

And the analysis shows that, if we call \Delta_t := g - h_t, then, since \Delta_{t+1} = \Delta_t - \gamma f_{t+1} and f_{t+1}^2 \leq 1, we have {\mathbb E} \Delta_t^2 - \Delta_{t+1}^2 = 2\gamma {\mathbb E} f_{t+1} \Delta_t - \gamma^2 {\mathbb E} f_{t+1}^2 \geq  2\epsilon\gamma - \gamma^2. Setting \gamma := \epsilon makes the per-step decrease at least \epsilon^2; since {\mathbb E} \Delta_0^2 \leq 1, the algorithm stops within \epsilon^{-2} steps.
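To make the construction concrete, here is a minimal Python sketch of Algorithm FK, under simplifying assumptions of my own: X is a finite list of points carrying the uniform measure, g maps points to [-1,1], and F is a small list of \{-1,1\}-valued functions, so that expectations are plain averages and the distinguisher can be found by brute force.

    def expectation(phi, X):
        # E_{x ~ uniform(X)} phi(x)
        return sum(phi(x) for x in X) / len(X)

    def algorithm_fk(g, F, X, eps):
        gamma = eps
        chosen = []  # f_1, ..., f_t (with repetitions)

        def h(x):  # h_t = sum_i gamma * f_i
            return sum(gamma * f(x) for f in chosen)

        while True:
            # brute-force search for a distinguisher f with E f * (g - h) >= eps
            found = next((f for f in F
                          if expectation(lambda x: f(x) * (g(x) - h(x)), X) >= eps),
                         None)
            if found is None:
                return h  # no f in F distinguishes g from h_t anymore
            chosen.append(found)  # with gamma = eps, E (g - h_t)^2 drops by >= eps^2

On a toy instance, say X a sample of points from [0,1], g(x) = 2x - 1, and F a handful of threshold functions, the loop indeed terminates within \epsilon^{-2} iterations, as the energy argument above predicts.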

Our proof proceeds in exactly the same way, except that we make sure at every step that h remains bounded.

Algorithm TTV

  • h_0 := 0; t= 0
  • while \exists f_{t+1} \in F such that {\mathbb E} f_{t+1} \cdot (g-h_t) \geq \epsilon
    • h_{t+1} := \max \{ -1, \min \{ 1, \sum_{i=1}^{t+1} \gamma f_i \} \}
    • t:=t+1
  • return h_t
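The corresponding change to the sketch given earlier is a single line: the partial sums are clipped to [-1,1] before being used (again this is just my illustration, reusing expectation from the previous sketch).

    def algorithm_ttv(g, F, X, eps):
        gamma = eps
        chosen = []  # f_1, ..., f_t

        def h(x):  # h_t = partial sum truncated to [-1, 1]
            s = sum(gamma * f(x) for f in chosen)
            return max(-1.0, min(1.0, s))

        while True:
            found = next((f for f in F
                          if expectation(lambda x: f(x) * (g(x) - h(x)), X) >= eps),
                         None)
            if found is None:
                return h
            chosen.append(found)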

The problem with the old analysis is that now we could conceivably have a step in which \sum_{i=1}^t \gamma f_i (x) > 1+\gamma for all x, and so h_t = h_{t+1}, and thus \Delta_t = \Delta_{t+1}, and we have no energy decrease. To get to such a state, however, there must have been about 1/\gamma steps in which h changed in the same way as in the Frieze-Kannan case. I think there should be a sort of “amortized analysis” that would “charge” the lack of change of h at certain steps to other steps in which it indeed changes, and establish that, for every time step T,

{\mathbb E} \Delta_0^2 - \Delta_T^2 \geq \Omega(T \cdot \gamma \cdot (\epsilon - O(\gamma)))

Unfortunately I don’t know how to construct such a proof. Our proof, instead, follows, step by step, Impagliazzo’s proof of the Impagliazzo Hardcore Set Lemma, which employs a very similar algorithm (in a quite different context). As shown by Klivans and Servedio, the algorithm in Impagliazzo’s proof can be seen as a boosting algorithm in learning theory (and every other known boosting algorithm can be used to prove Impagliazzo’s Hardcore Lemma); this is why we think of ours as a “boosting” proof.

I must confess, I have never really understood Impagliazzo’s proof, and so I can’t say I really understand ours, other than being able to reproduce the steps.

The idea is to consider the quantity f_{t+1}(x) \Delta_{t}(x), which depends on t and x, and to see how it behaves summed over t for a fixed x, and how it behaves averaged over x for a fixed t.
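The plan, in one line: if the algorithm runs for T steps, then, exchanging a finite sum with an expectation,

\displaystyle \sum_{t=0}^{T-1} {\mathbb E}_x \, f_{t+1}(x) \Delta_t(x) = {\mathbb E}_x \sum_{t=0}^{T-1} f_{t+1}(x) \Delta_t(x)

and we lower bound the left-hand side using the termination condition, and upper bound the right-hand side pointwise.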

Suppose the algorithm is still running after T steps. Then, because of the way we define the termination condition, we have, for every t \in \{ 0,\ldots, T-1\}

\displaystyle (1) \ \ \ {\mathbb E}_x f_{t+1}(x)  (g(x)-h_t(x)) \geq  \epsilon

and the crux of the proof is to show that for every x we have

\displaystyle (2) \ \ \ \sum_{t=0}^{T-1} f_{t+1}(x)  (g(x)-h_t(x)) \leq \frac \gamma 2 \cdot T  + \frac 4 {\gamma}

So if we sum (1) over t and average (2) over x, we get

\displaystyle  \epsilon T \leq  \frac  \gamma 2 \cdot T  + \frac 4 {\gamma}

and setting \gamma := \epsilon gives T \leq 8 \epsilon^{-2}.

Inequality (2), the main step of the proof, is the part for which I have little intuition. One breaks the summation into groups of time steps, depending on the value of \Delta_t; there are 2/\gamma groups (because the value of h_t changes by discrete increments of \gamma, and is between -1 and +1) and each one is shown to contribute O(\gamma T') + O(1), where T' is the number of time steps in the group.

It is perhaps instructive to translate the Frieze-Kannan proof to this set-up. In the Frieze-Kannan algorithm we have \Delta_{t+1} = \Delta_t - \gamma f_{t+1}, and squaring and rearranging gives

\displaystyle  f_{t+1}(x)  (g(x)-h_t(x))  = \frac 1 {2\gamma} \cdot \left( \Delta_{t}^2 - \Delta^2_{t+1} +\gamma^2 f_{t+1}^2 \right)

and so

\begin{array}{ll} \displaystyle  \sum_{t=0}^{T-1} f_{t+1}(x)  (g(x)-h_t(x)) & = \displaystyle \frac 1{2\gamma} (\Delta_0^2 - \Delta_{T}^2) + \frac {\gamma}2 \cdot \sum_{t=1}^{T} f_t^2(x) \\  & \leq \displaystyle  \frac 1 {2\gamma}  + \frac \gamma 2 \cdot  T \end{array}

which is analogous to our inequality (2) (in the last step we used \Delta_0^2 \leq 1, \Delta_T^2 \geq 0, and f_t^2(x) = 1).
