Several results in additive combinatorics have the flavor of decomposition results in which one shows that an arbitrary combinatorial object can be approximated by a “sum” of a “low complexity” part and a “pseudorandom” part. This is helpful because it reduces the task of proving that an arbitrary object has a certain property to the easier task of proving that low-complexity objects and pseudorandom objects have the property. (Provided that the property is preserved by the approximation and the sum.)
Many examples of such results are given in the paper accompanying Terry Tao’s tutorial at FOCS’07.
Usually, one needs a strong notion of approximation, and in such cases the complexity of the “low complexity” part is fantastically large (a tower of exponentials in the approximation parameter), as in the Szemeredi Regularity Lemma, which can be seen as a prototypical such “decomposition” result. In some cases, however, weaker notions of approximation are still useful, and one can have more reasonable complexity parameters, as in the Weak Regularity Lemma of Frieze and Kannan.
A toy example of a “weak decomposition result” is the following observation: let be the linear functions used as the basis for Fourier analysis of functions , let be a bounded function, and consider its Fourier transform
Then we can write
where
is the truncation of the Fourier expansion containing only the larger coefficient, and . Since we know , we can see that is defined as a linear combination of at most functions , and so it has “low complexity” relative to the linear functions.
The other observation is that, for every function , we have
So we have written as a sum of a function of “complexity” and of a “pseudorandom” function all whose Fourier coefficients are small. The reason we think of a function with small Fourier coefficients as “pseudorandom” is because (the uniform distribution over) a set is -biased (in the Naor-Naor sense) if and only if its indicator function has small Fourier coefficients. Another way to look at the decomposition is to see that we have found a function that has “small complexity” relative to the set of Fourier basis functions , and such that and are indistinguishable by basis functions, that is,
These are trivial observation, and they seem to depend essentially on the fact that the are orthogonal. Indeed, this is not the case.
By modifying the main idea in the Frieze-Kannan paper, it is possible to prove the following result.
Theorem 1 (Low Complexity Approximation, Simple Version) Let be a probability space, a bounded function, a collection of bounded functions , and an approximation parameter.
Then there is a function such that
- has low complexity relative to :
where , , and the coefficients satisfy ;
- is -indistinguishable from by :
We can see that the Fourier analysis approximation we discussed above is the special case in which is the set of linear functions.
The proof is surprisingly simple. Assume without loss of generality that is “closed under negation,” that is, if then . (Otherwise we apply the following argument to the closure of .) We construct the function via the following algorithm.
- 0;
- while such that
If the algorithm terminates after a finite number of steps, then the function is indistinguishable as required, and we have , so we just need to argue that the algorithm always terminates, and it does so within steps.
We prove this via an energy decreasing argument. Consider the error at time , , and define the “energy” at time to be . At time zero, . It remains to show that the energy decreases by at least at every step. (The energy is always non-negative, so if it starts at 1 and it decreases by at least at every step, there can be no more than steps.)
To see how Theorem 1 relates to the regularity lemma on graphs, let be the set of edges in the complete graph over vertices, and let be an arbitrary graph over vertices, which we shall think of as a function . For two disjoint sets of vertices , let be the indicator function of the set of edges with one endpoint in and one endpoint in , let be the set of such functions, and let be the uniform distribution over . Then, for every , Theorem 1 gives us an approximating weighted graph such that
- is a weighted sum of functions
- For every two sets of vertices , the number of edges in between and differs from the (weighted) number of edges in between and by at most . (Indeed, at most .)
The model graph is helpful because one can solve several optimization problems on it (such as Max Cut) in time dependent only on , and then transfer such solutions to , paying an additive cost of (which is negligible if is dense).
Regularity Lemmas, however, are usually stated in terms of partitions of the sets of vertices, not in terms of linear combinations. Also, it is somewhat undesirable that the approximating graph has weights that are not guaranteed to be bounded. Analogously, in Theorem 1, it would be desirable for to be bounded, which is not guaranteed by the construction. (Although one can see that , the best bound we can give about individual inputs is .)
To deal with this issue, we first need a couple of definitions.
If is a partition of , and is any function, then the conditional expectation of on is the function defined as
that is, on input , belonging to block of the partition , the value of is the expectation of on the block . To visualize this definition, suppose that elements of are people, are countries, and is the wealth of person . Now, spread around the wealth evenly within each country; then is the post-spreading-around wealth of .
The following definition will simplify some discussions on indistinguishability: if is a probability space, is a collection of functions and is any function, then
Note that this is indeed a norm, and that and are -indistinguishable by if and only if .
Given these definitions, Frieze and Kannan prove that, in the special case in which are edges of a complete graph and is the collection of indicator functions of cuts, if is a partition of defined by first picking functions and then creating the coarsest partition such that every function is constant on each block of the partition (we call the partition generated by ), and if is any function and is a function which is constant on each block of , then
From this, we have that if is the function of Theorem 1, and is the partition defined by the , then
And so the function is also a good approximation for . Nicely, is precisely the type of approximation described in the standard regularity lemma, and all its edge weights are between 0 and 1. I don’t know how to prove (2) in general, but the iterative partitioning proof of the Weak Regularity Lemma can be abstracted into the following result.
Theorem 2 (Low Complexity Approximation, Partition Version) Let be a probability space, a bounded function, a collection of boolean functions , and an approximation parameter.
Then there is a function such that
- has low complexity relative to : there are functions such that , where is the partition generated by ;
- is -indistinguishable from by , that is,
The proof proceeds by starting with being the trivial partition containing only one block; we continue until we have a partition such that . If the current partition is such that then we add the function such that to the collection of functions used to define . Every time we thus refine , we can show that increases by , and it is always at most one, so we are able to bound the number of steps.
Note that, in Theorems 1 and 2, we do not need to be finite; indeed, can be any measure space such that , provided we interpret the notation to mean . (In such cases, we shall also need to make proper integrability assumptions on and the functions in .) From Theorems 1 and 2 we can reconstruct the “Analytic Weak Regularity Lemmas” of Lovasz and Szegedy. For example, if , and is the class of functions of the form , over all measurable sets , then the norm is the same as the norm defined in Section 4 of the Lovasz-Szegedy paper, and their Lemma 7 follows from our Theorem 1.
For a complexity-theoretic interpretation, note that, in Theorem 2, we can take to be the class of functions computable by circuits of size at most ; then the theorem gives us an approximating function which can be computed in size .
Could we make the complexity of the approximation polynomial in , while retaining the useful property that the approximating function is bounded? This is the subject of a new paper by Madhur Tulsiani, Salil Vadhan, and I, and also the subject of a forthcoming post.
Pingback: Boosting and the Regularity Lemma « in theory
Why the complexity of $h$ is Theorem 2 is $S.exp(O(\epsilon ^{-2})$ ?
I don’t see a way other than going over all X and computing all f’s to find out the partition the input is in, and then computing f over all of that domain and taking the average 😦