In this post we will construct a “spectral sparsifier” of a given hypergraph in a way that is similar to how Spielman and Srivastava construct spectral graph sparsifiers. We will assign a probability to each hyperedge, we will sample each hyperedge with probability , and we will weigh it by if selected. We will then bound the “spectral error” of this construction in terms of the supremum of a Gaussian process using Talagrand’s comparison inequality and finally bound the supremum of the Gaussian process (which will involve matrices) using matrix Chernoff bounds. This is joint work with Nikhil Bansal and Ola Svensson.
1. Hypergraph Sparsifiers
An (undirected) hypergraph is a collection of subsets , called hyperedges. A graph is the special case in which each set has cardinality 2. For simplicity we will talk only about -uniform hypergraphs, that is hypergraphs in which all hyperedges have the same cardinality (the arguments below would work also in the non-uniform case in which all hyperedges have cardinality at most ). Hyperedges may have weights.
If is a subset of vertices, a hyperedge is cut by if has non-empty intersection with both and . We call the number of hyperedges of cut by , or the total weight of such hyperedges if is a weighted hypergraph.
We can then generalize the notion of Benczur-Karger sparsification, and say that is a cut sparsifier of with error if and have the same vertex set and
Kogan and Krauthgamer show that every -uniform hypergraph admits a cut sparsifier of error with only weighted hyperedges. They are able to extend the argument of Benczur and Karger to hypergraphs, including arguing that there are few sparse cuts. They assign a probability to each hyperedge, sample it with that probability, weighing it if selected, and then use a union bound and Chernoff bounds to argue that all cuts are preserved.
Anand Louis has introduced a notion of hypergraph Laplacian in the following way. The Laplacian quadratic form of a hypergraph is a function defined as
where is the weight of the hyperedge , or 1 if the hypergraph is unweighted.
This definition has the motivation that it recovers the Laplacian quadratic form in the case of graphs, and that if is the indicator of a set then . Furthermore, one can define “eigenvalues” and “eigenvectors” of this hypergraph Laplacian and recover a Cheeger inequality and even higher-order Cheeger inequalities.
So it seems interesting to consider the following notion of sparsification: a hypergraph is a spectral sparsifier of with error if and have the same vertex set and
where, as before, the convention is that .
Soma and Yoshida studied this question and gave a construction with hyperedges. In the rest of this post we will discuss the construction by Nikhil Bansal, Ola Svensson and me, which uses hyperedges.
2. Choosing Probabilities
Given a hypergraph , we can construct a multigraph by taking each hyperedge of and then constructing, in , a clique between the vertices of . Thus, in , the edge is repeated as many times (possibly, zero times) as the number of hyperedges of that contain both and . Another way to think about it is that the Laplacian of is given by
where the inner sum is over the unordered pairs . This graph relates in several interesting ways with . For example, if is a subset of vertices and is a hyperedge that is cut by , then between and of the edges of derived from are cut by . If is not cut by , then none of the edges derived from is cut by , so we have
and, with a bit more work, it is possible to prove similar bounds for the Laplacian quadratic forms
This means that if we sparsify , say with the Spielman-Srivastava construction, we obtain information about , up to multiplicative error . Now suppose that, as we sample edges of to sparsify it in the Spielman-Srivastava way, we will also pick hyperedges of (for example we pick if at least one of its corresponding edges in is picked), and we weigh them by the inverse of the probability of being selected. Then we may hope that if the Spielman-Srivastava sparsification of is tuned to achieve error , the hypergraph that we obtain will have error at most . Indeed, this is roughly what happens, and we will be able to prove it by showing that the error in the hypergraph sparsification is dominated by the error in the “Gaussian version” of Spielman-Srivastava described in the previous post.
So we are going to assign to each hyperedge a probability
where the factor will be chosen later to get the construction to work and is the effective resistance of the edge of .
In fact, as before, it will be helpful to have probabilities that are non-positive powers of two, so we will choose to be a power of two such that
Another fact that will become useful later (it will save us a factor of the order of in the number of hyperedges in the construction) is that
3. A Discrete Random Process
Our construction of a hypergraph sparsifier will be to select each independently with probability and, if selected, weigh it by . Our goal is to find an upper bound in probability, or even in expectation, on
Recalling that , we will study
Because if we can show that the above quantity is at most , then, for every ,
If we define
then we are interested in the supremum in of
where is a random variable that is equal to with probability and is equal to with probability .
As before, we will do the construction in rounds. If the smallest probability is , then we will have rounds. We start with and then, at round , we take all hyperedges such that and, independently for each such hyperedge, we either delete it or we double its weight (with probability 1/2 in each case). The final hypergraph is distributed like . We have the processes
where the random variables are Rademacher, the weights are either 0 or , and is the set of edges that are “active” in round , that is, the set of hyperedges such that .
For each hypergraph , we will also consider its associated graph , obtained by replacing each hyperedge with a clique. The Laplacian of is
We have the following lemma
Lemma 1 For every outcome of the first rounds such that , there is a probability at least over the randomness of the -th round that
This means that we can take and apply the above lemma inductively to argue that we have a high probability that , and so we get a hypergraph spectral sparsifier with error and hyperedges.
To prove the Lemma we will define the Gaussian process
where the are Gaussian. Notice the two differences between and : we replaced the Rademacher choices with Gaussian choices, and we replaced
Furthermore, we are doing a random choice for each pair within each hyperedge instead of just one random choice per hyperedge.
Fact 2 The random processes and are -dominated by .
Proof: For every we have
To complete the proof, we argue that
To verify the above inequality, assume , otherwise the argument will be symmetric, and call the maximizer for and the maximizer for . Then
It remains to estimate the expected sup
and the diameter
where, recall, .
With the usual change of variable we have
where, as before, we use the notation , and .
By matrix Chernoff bounds for Gaussian sums of matrices,
Recall that if is a hyperedge that is active at round , then and , and we also have
so that we have
The last term of (1) is the spectral norm of , which is at most 2 by the assumption of the Lemma.
Collecting all the pieces, we have proved that
We also need to bound the diameter of , that is
under the usual change of basis, for every of length 1 we want to bound the square root of
where is either zero or and we can continue our chain of inequalities with
where we used again the assumption of the Lemma .
To prove the last part of the lemma, we have to prove that with high probability
We see that
as noted before, each is either zero or , so
and the final claim follows from matrix Chernoff bounds.
Now we can put everything together. Applying our lemma inductively, we can say that with high probability
provided that we choose at least at absolute constant times .
In particular, we can choose to be an absolute constant times and have
which is the same as
So, with high probability, is a spectral sparsifier of with error , and it has hyperedges
4. Some Additional Thoughts
The dependence on is definitely not optimal, particularly because when is order of we know from the work of Soma and Yoshida that we can do better. One place where we seem to lose is that, although we know
we are only able to show that
is -dominated by
rather than, as we could have hoped, -dominated. The difficulty is that the bound
is sometimes tight. In order to do better, it seems necessary to do something a bit differently from what we do here.
The effective resistance of an edge in a graph can be written as
with the convention that . So it seems reasonable that a good definition of effective resistance for an hyperedge in a hypergraph would be
One can argue that these “effective resistances” add up to , but perhaps they add up to ? If we sample according to those “effective resitances,” can we apply generic chaining directly to without having to rely on a Gaussian process on matrices, for which we have Chernoff bounds?