Boosting and the Regularity Lemma

In a previous post, I described abstract forms of the weak regularity lemma, in which we start from an arbitrary bounded function g: X \rightarrow [-1,1] and an arbitrary set F of “structured” bounded functions f: X \rightarrow [-1,1], and we construct an “approximatng” function h that has “low complexity” relative to F and is “indistinguishable” from g by functions from F. We had two proofs, both generalizations of arguments due to Frieze and Kannan: in one proof h is not bounded, but its complexity is O(\epsilon^{-2}), where \epsilon is the approximation parameter; in the second proof, h is bounded but it has exponential complexity exp(O(\epsilon^{-2}) in the approximation parameter.

In a new paper by Madhur Tulsiani, Salil Vadhan, and I, we give a new proof that gives an approximating function that, at the same time, is bounded and has low complexity. This has a number of applications, which I will describe in a future post.

(Note that, in the statement below, F is required to contain Boolean functions, rather than bounded ones. It is possible, however, to “reduce” the bounded case to the boolean case via the following observation: if F is a family of bounded functions, F' is the family of boolean functions \{ 1_{f(x) \geq t} \ : \ f\in F, t\in [-1,1] \}, and g and h are \epsilon-indistinguishable according to F', then they are also \epsilon-indistinguishable according to F.)

Theorem (Low Complexity Approximation, TTV Version) Let (X,\mu) be a probability space, g:X \rightarrow [-1,1] a bounded function, F a collection of boolean functions f:X \rightarrow \{ -1,1\}, and \epsilon an approximation parameter.

Then there is a function h: X \rightarrow [-1,1] such that

  • h has low complexity relative to F: there are k= O( \epsilon^{-2}) functions f_i\in F and coefficients c_i such that

    \displaystyle h(x) := \max \{ -1 , \min \{ 1, \sum_{i=1}^k c_i f_i (x)  \}\}

  • h is \epsilon-indistinguishable from g by F, that is,

    \forall f.F \ \ \left| {\mathbb E}_{x \sim \mu} f(x) g(x) - {\mathbb E}_{x \sim \mu} f(x) h(x) \right|  \leq \epsilon

That is, h is simply a linear combination of functions from F, whose value is truncated to be between -1 and 1.

Recall that, when we do not require h to be bounded, h can be constructed via the following algorithm:

Algorithm FK

  • h_0 := 0; t= 0
  • while \exists f_{t+1} \in F such that {\mathbb E} f_{t+1} \cdot (g-h_t) \geq \epsilon
    • h_{t+1} := \sum_{i=1}^{t+1} \gamma f_{i}
    • t:=t+1
  • return h_t

And the analysis shows that, if we call \Delta_t := g - h_t, then {\mathbb E} \Delta_t^2 - \Delta_{t+1}^2 \geq  2\epsilon\gamma - \gamma^2. Setting \gamma := \epsilon shows that the algorithm stops within \epsilon^{-2} steps.

Our proof proceeds by doing exactly the same, but making sure at every step that h is bounded.

Algorithm TTV

  • h_0 := 0; t= 0
  • while \exists f_{t+1} \in F such that {\mathbb E} f_{t+1} \cdot (g-h_t) \geq \epsilon
    • h_{t+1} := \max \{ -1, \min \{ 1, \sum_{i=1}^{t+1} \gamma f_i \} \}
    • t:=t+1
  • return h_t

The problem with the old analysis is that now we could conceivably have a step in which \sum_{i=1}^t \gamma f_i (x) > 1+\gamma for all x, and so h_t = h_{t+1}, and thus \Delta_t = \Delta_{t+1}, and we have no energy decrease. To get to such a state, however, there must have been about 1/\gamma steps in which h changed in the same way as in the Frieze-Kannan case. I think there should be a sort of “amortized analysis” that would “charge” the lack of change of h at certain steps to other steps in which it indeed changes, and establish that, for every time step T,

{\mathbb E} \Delta_0^2 - \Delta_T^2 \geq \Omega(T \cdot \gamma \cdot (\epsilon - O(\gamma))

Unfortunately I don’t know how to construct such a proof. Our proof, instead, follows step by step Impagliazzo’s proof of the Impagliazzo Hardcore Set Lemma, which employs a very similar algorithm (in a quite different context). As shown by Klivans and Servedio, the algorithm in Impagliazzo’s proof can be seen as a boosting algorithm in learning theory (and every other known boosting algorithm can be used to prove Impagliazzo’s Hardcore Lemma); this is why we think of our proof as a “boosting” proof.

I must confess, I have never really understood Impagliazzo’s proof, and so I can’t say I really understand ours, other than being able to reproduce the steps.

The idea is to consider the quantity f_{t+1}(x) \Delta_{t}(x), which depends on t and x, and to see how it behaves summed over t, for a fixed x, and how it behaves for a fixed t, summed over x.

Suppose the algorithm is still running after T steps. Then, because of the way we define the termination condition, we have, for every t \in \{ 0,\ldots, T-1\}

\displaystyle (1) \ \ \ {\mathbb E}_x f_{t+1}(x)  (g(x)-h_t(x)) \geq  \epsilon

and the crux of the proof is to show that for every x we have

(2) \displaystyle \ \ \ \sum_{i=0}^{T-1} f_{t+1}(x)  (g(x)-h_t(x)) \leq \frac \gamma 2 \cdot T  + \frac 4 {\gamma}

So if we sum (1) over t and average (2) over x, we get

\displaystyle  \epsilon T \leq  \frac  \gamma 2 \cdot T  + \frac 4 {\gamma}

and setting \gamma := \epsilon gives T \leq 8 \epsilon^{-2}.

Inequality (2), the main step of the proof, is the part for which I have little intuition. One breaks the summation into groups of time steps, depending on the value of \Delta_t; there are 2/\gamma groups (because the value of h_t changes by discrete increments of \gamma, and is between -1 and +1) and each one is shown to contribute O(\gamma T') + O(1), where T' is the number of time steps in the group.

It is perhaps instructive to translate the Frieze-Kannan proof to this set-up. In the Frieze-Kannan algorithm, we have

\displaystyle  f_{t+1}(x)  (g(x)-h_t(x))  = \frac 1 {2\gamma} \cdot \left( \Delta_{t}^2 - \Delta^2_{t+1} +\gamma^2 f_{t+1}^2 \right)

and so

\begin{array}{ll} \displaystyle  \sum_{t=0}^{T-1} f_{t+1}(x)  (g(x)-h_t(x)) & = \displaystyle \frac 1{2\gamma} (\Delta_0^2 - \Delta_{T}^2) + \frac {\gamma}2 \cdot \sum_{t=1}^{T} f_t^2(x) \\  & \leq \displaystyle  \frac 1 {2\gamma}  + \frac \gamma 2 \cdot  T \end{array}

which is analogous to our inequality (2).


Decomposition Results and Regularity Lemmas

Several results in additive combinatorics have the flavor of decomposition results in which one shows that an arbitrary combinatorial object can be approximated by a “sum” of a “low complexity” part and a “pseudorandom” part. This is helpful because it reduces the task of proving that an arbitrary object has a certain property to the easier task of proving that low-complexity objects and pseudorandom objects have the property. (Provided that the property is preserved by the approximation and the sum.)

Many examples of such results are given in the paper accompanying Terry Tao’s tutorial at FOCS’07.

Usually, one needs a strong notion of approximation, and in such cases the complexity of the “low complexity” part is fantastically large (a tower of exponentials in the approximation parameter), as in the Szemeredi Regularity Lemma, which can be seen as a prototypical such “decomposition” result. In some cases, however, weaker notions of approximation are still useful, and one can have more reasonable complexity parameters, as in the Weak Regularity Lemma of Frieze and Kannan.

A toy example of a “weak decomposition result” is the following observation: Continue reading