Boosting and the Regularity Lemma

In a previous post, I described abstract forms of the weak regularity lemma, in which we start from an arbitrary bounded function g: X \rightarrow [-1,1] and an arbitrary set F of “structured” bounded functions f: X \rightarrow [-1,1], and we construct an “approximatng” function h that has “low complexity” relative to F and is “indistinguishable” from g by functions from F. We had two proofs, both generalizations of arguments due to Frieze and Kannan: in one proof h is not bounded, but its complexity is O(\epsilon^{-2}), where \epsilon is the approximation parameter; in the second proof, h is bounded but it has exponential complexity exp(O(\epsilon^{-2}) in the approximation parameter.

In a new paper by Madhur Tulsiani, Salil Vadhan, and I, we give a new proof that gives an approximating function that, at the same time, is bounded and has low complexity. This has a number of applications, which I will describe in a future post.

(Note that, in the statement below, F is required to contain Boolean functions, rather than bounded ones. It is possible, however, to “reduce” the bounded case to the boolean case via the following observation: if F is a family of bounded functions, F' is the family of boolean functions \{ 1_{f(x) \geq t} \ : \ f\in F, t\in [-1,1] \}, and g and h are \epsilon-indistinguishable according to F', then they are also \epsilon-indistinguishable according to F.)

Theorem (Low Complexity Approximation, TTV Version) Let (X,\mu) be a probability space, g:X \rightarrow [-1,1] a bounded function, F a collection of boolean functions f:X \rightarrow \{ -1,1\}, and \epsilon an approximation parameter.

Then there is a function h: X \rightarrow [-1,1] such that

  • h has low complexity relative to F: there are k= O( \epsilon^{-2}) functions f_i\in F and coefficients c_i such that

    \displaystyle h(x) := \max \{ -1 , \min \{ 1, \sum_{i=1}^k c_i f_i (x)  \}\}

  • h is \epsilon-indistinguishable from g by F, that is,

    \forall f.F \ \ \left| {\mathbb E}_{x \sim \mu} f(x) g(x) - {\mathbb E}_{x \sim \mu} f(x) h(x) \right|  \leq \epsilon

That is, h is simply a linear combination of functions from F, whose value is truncated to be between -1 and 1.

Recall that, when we do not require h to be bounded, h can be constructed via the following algorithm:

Algorithm FK

  • h_0 := 0; t= 0
  • while \exists f_{t+1} \in F such that {\mathbb E} f_{t+1} \cdot (g-h_t) \geq \epsilon
    • h_{t+1} := \sum_{i=1}^{t+1} \gamma f_{i}
    • t:=t+1
  • return h_t

And the analysis shows that, if we call \Delta_t := g - h_t, then {\mathbb E} \Delta_t^2 - \Delta_{t+1}^2 \geq  2\epsilon\gamma - \gamma^2. Setting \gamma := \epsilon shows that the algorithm stops within \epsilon^{-2} steps.

Our proof proceeds by doing exactly the same, but making sure at every step that h is bounded.

Algorithm TTV

  • h_0 := 0; t= 0
  • while \exists f_{t+1} \in F such that {\mathbb E} f_{t+1} \cdot (g-h_t) \geq \epsilon
    • h_{t+1} := \max \{ -1, \min \{ 1, \sum_{i=1}^{t+1} \gamma f_i \} \}
    • t:=t+1
  • return h_t

The problem with the old analysis is that now we could conceivably have a step in which \sum_{i=1}^t \gamma f_i (x) > 1+\gamma for all x, and so h_t = h_{t+1}, and thus \Delta_t = \Delta_{t+1}, and we have no energy decrease. To get to such a state, however, there must have been about 1/\gamma steps in which h changed in the same way as in the Frieze-Kannan case. I think there should be a sort of “amortized analysis” that would “charge” the lack of change of h at certain steps to other steps in which it indeed changes, and establish that, for every time step T,

{\mathbb E} \Delta_0^2 - \Delta_T^2 \geq \Omega(T \cdot \gamma \cdot (\epsilon - O(\gamma))

Unfortunately I don’t know how to construct such a proof. Our proof, instead, follows step by step Impagliazzo’s proof of the Impagliazzo Hardcore Set Lemma, which employs a very similar algorithm (in a quite different context). As shown by Klivans and Servedio, the algorithm in Impagliazzo’s proof can be seen as a boosting algorithm in learning theory (and every other known boosting algorithm can be used to prove Impagliazzo’s Hardcore Lemma); this is why we think of our proof as a “boosting” proof.

I must confess, I have never really understood Impagliazzo’s proof, and so I can’t say I really understand ours, other than being able to reproduce the steps.

The idea is to consider the quantity f_{t+1}(x) \Delta_{t}(x), which depends on t and x, and to see how it behaves summed over t, for a fixed x, and how it behaves for a fixed t, summed over x.

Suppose the algorithm is still running after T steps. Then, because of the way we define the termination condition, we have, for every t \in \{ 0,\ldots, T-1\}

\displaystyle (1) \ \ \ {\mathbb E}_x f_{t+1}(x)  (g(x)-h_t(x)) \geq  \epsilon

and the crux of the proof is to show that for every x we have

(2) \displaystyle \ \ \ \sum_{i=0}^{T-1} f_{t+1}(x)  (g(x)-h_t(x)) \leq \frac \gamma 2 \cdot T  + \frac 4 {\gamma}

So if we sum (1) over t and average (2) over x, we get

\displaystyle  \epsilon T \leq  \frac  \gamma 2 \cdot T  + \frac 4 {\gamma}

and setting \gamma := \epsilon gives T \leq 8 \epsilon^{-2}.

Inequality (2), the main step of the proof, is the part for which I have little intuition. One breaks the summation into groups of time steps, depending on the value of \Delta_t; there are 2/\gamma groups (because the value of h_t changes by discrete increments of \gamma, and is between -1 and +1) and each one is shown to contribute O(\gamma T') + O(1), where T' is the number of time steps in the group.

It is perhaps instructive to translate the Frieze-Kannan proof to this set-up. In the Frieze-Kannan algorithm, we have

\displaystyle  f_{t+1}(x)  (g(x)-h_t(x))  = \frac 1 {2\gamma} \cdot \left( \Delta_{t}^2 - \Delta^2_{t+1} +\gamma^2 f_{t+1}^2 \right)

and so

\begin{array}{ll} \displaystyle  \sum_{t=0}^{T-1} f_{t+1}(x)  (g(x)-h_t(x)) & = \displaystyle \frac 1{2\gamma} (\Delta_0^2 - \Delta_{T}^2) + \frac {\gamma}2 \cdot \sum_{t=1}^{T} f_t^2(x) \\  & \leq \displaystyle  \frac 1 {2\gamma}  + \frac \gamma 2 \cdot  T \end{array}

which is analogous to our inequality (2).

About these ads

One thought on “Boosting and the Regularity Lemma

  1. Pingback: Applications of Low-Complexity Approximations « in theory

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s