In this post we will construct a “spectral sparsifier” of a given hypergraph in a way that is similar to how Spielman and Srivastava construct spectral graph sparsifiers. We will assign a probability $p_e$ to each hyperedge $e$, we will sample each hyperedge with probability $p_e$, and we will weigh it by $1/p_e$ if selected. We will then bound the “spectral error” of this construction in terms of the supremum of a Gaussian process using Talagrand’s comparison inequality, and finally bound the supremum of the Gaussian process (which will involve matrices) using matrix Chernoff bounds. This is joint work with Nikhil Bansal and Ola Svensson.
1. Hypergraph Sparsifiers
An (undirected) hypergraph $H = (V,E)$ is a collection $E$ of subsets $e \subseteq V$, called hyperedges. A graph is the special case in which each set $e$ has cardinality 2. For simplicity we will talk only about $r$-uniform hypergraphs, that is, hypergraphs in which all hyperedges have the same cardinality $r$ (the arguments below would also work in the non-uniform case in which all hyperedges have cardinality at most $r$). Hyperedges may have weights.
If $S \subseteq V$ is a subset of vertices, a hyperedge $e$ is cut by $S$ if $e$ has non-empty intersection with both $S$ and $V \setminus S$. We call $cut_H(S)$ the number of hyperedges of $H$ cut by $S$, or the total weight of such hyperedges if $H$ is a weighted hypergraph.
We can then generalize the notion of Benczur-Karger sparsification, and say that $H'$ is a cut sparsifier of $H$ with error $\epsilon$ if $H$ and $H'$ have the same vertex set $V$ and

$$\forall S \subseteq V: \ \ (1-\epsilon)\cdot cut_H(S) \ \le\ cut_{H'}(S) \ \le\ (1+\epsilon)\cdot cut_H(S)$$
Kogan and Krauthgamer show that every $r$-uniform hypergraph $H$ on $n$ vertices admits a cut sparsifier of error $\epsilon$ with only $O(\epsilon^{-2}\, n\, (r + \log n))$ weighted hyperedges. They are able to extend the argument of Benczur and Karger to hypergraphs, including arguing that there are few sparse cuts. They assign a probability $p_e$ to each hyperedge, sample it with that probability, weighing it $1/p_e$ if selected, and then use a union bound and Chernoff bounds to argue that all cuts are preserved.
Anand Louis has introduced a notion of hypergraph Laplacian in the following way. The Laplacian quadratic form of a hypergraph $H = (V,E)$ is a function $L_H : {\mathbb R}^V \rightarrow {\mathbb R}$ defined as

$$L_H(x) \ :=\ \sum_{e \in E} w_e \cdot \max_{u,v \in e}\ (x_u - x_v)^2$$

where $w_e$ is the weight of the hyperedge $e$, or 1 if the hypergraph is unweighted.
This definition has the motivation that it recovers the Laplacian quadratic form in the case of graphs, and that if $x = {\bf 1}_S$ is the indicator of a set $S$ then $L_H({\bf 1}_S) = cut_H(S)$. Furthermore, one can define “eigenvalues” and “eigenvectors” of this hypergraph Laplacian and recover a Cheeger inequality and even higher-order Cheeger inequalities.
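To make the definition concrete, here is a minimal Python sketch (my own illustration, not code from the paper; hyperedges are frozensets of vertex indices and all names are made up) that evaluates $L_H$ and checks the identity $L_H({\bf 1}_S) = cut_H(S)$ on a small example.

```python
import itertools

def hypergraph_quadratic_form(hyperedges, weights, x):
    """L_H(x) = sum over hyperedges e of w_e * max_{u,v in e} (x_u - x_v)^2."""
    return sum(w * max((x[u] - x[v]) ** 2 for u, v in itertools.combinations(e, 2))
               for e, w in zip(hyperedges, weights))

def cut_value(hyperedges, weights, S):
    """Total weight of the hyperedges with vertices on both sides of the cut defined by S."""
    return sum(w for e, w in zip(hyperedges, weights)
               if any(v in S for v in e) and any(v not in S for v in e))

# a small 3-uniform example on vertex set {0,...,4}
hyperedges = [frozenset({0, 1, 2}), frozenset({1, 2, 3}), frozenset({2, 3, 4})]
weights = [1.0, 1.0, 1.0]
S = {0, 1}
indicator = [1.0 if v in S else 0.0 for v in range(5)]
assert hypergraph_quadratic_form(hyperedges, weights, indicator) == cut_value(hyperedges, weights, S)
```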
So it seems interesting to consider the following notion of sparsification: a hypergraph $H'$ is a spectral sparsifier of $H$ with error $\epsilon$ if $H$ and $H'$ have the same vertex set and

$$\forall x \in {\mathbb R}^V: \ \ 1-\epsilon \ \le\ \frac{L_{H'}(x)}{L_H(x)} \ \le\ 1+\epsilon$$

where, as before, the convention is that $\frac 00 = 1$.
Soma and Yoshida studied this question and gave a construction with $O(\epsilon^{-2}\, n^3 \log n)$ hyperedges. In the rest of this post we will discuss the construction by Nikhil Bansal, Ola Svensson and me, which uses about $\epsilon^{-2}\, r^3\, n$ hyperedges, up to factors polylogarithmic in $n$.
2. Choosing Probabilities
Given a hypergraph $H = (V,E)$, we can construct a multigraph $G$ by taking each hyperedge $e$ of $H$ and then constructing, in $G$, a clique between the vertices of $e$. Thus, in $G$, the edge $\{u,v\}$ is repeated as many times (possibly, zero times) as the number of hyperedges $e$ of $H$ that contain both $u$ and $v$. Another way to think about it is that the Laplacian of $G$ is given by

$$L_G \ =\ \sum_{e \in E}\ \sum_{\{u,v\} \subseteq e} ({\bf 1}_u - {\bf 1}_v)({\bf 1}_u - {\bf 1}_v)^\top$$

where the inner sum is over the unordered pairs $\{u,v\}$ of distinct vertices of $e$. This graph relates in several interesting ways with $H$. For example, if $S$ is a subset of vertices and $e$ is a hyperedge that is cut by $S$, then between $r-1$ and $r^2/4$ of the edges of $G$ derived from $e$ are cut by $S$. If $e$ is not cut by $S$, then none of the edges derived from $e$ is cut by $S$, so we have

$$(r-1)\cdot cut_H(S) \ \le\ cut_G(S) \ \le\ \frac{r^2}{4}\cdot cut_H(S)$$

and, with a bit more work, it is possible to prove similar bounds for the Laplacian quadratic forms:

$$L_H(x) \ \le\ x^\top L_G\, x \ \le\ \binom{r}{2}\cdot L_H(x)$$
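As a sanity check, here is a small numpy sketch (again my own illustration, not from the paper) that builds the clique Laplacian $L_G$ and verifies the sandwiching $L_H(x) \le x^\top L_G\, x \le \binom{r}{2} L_H(x)$ on random vectors.

```python
import itertools
import numpy as np

def clique_laplacian(n, hyperedges):
    """Laplacian of the multigraph G obtained by replacing each hyperedge with a clique."""
    L = np.zeros((n, n))
    for e in hyperedges:
        for u, v in itertools.combinations(e, 2):
            L[u, u] += 1.0
            L[v, v] += 1.0
            L[u, v] -= 1.0
            L[v, u] -= 1.0
    return L

def L_H(hyperedges, x):
    """Unweighted hypergraph Laplacian quadratic form."""
    return sum(max((x[u] - x[v]) ** 2 for u, v in itertools.combinations(e, 2))
               for e in hyperedges)

n, r = 6, 3
hyperedges = [frozenset({0, 1, 2}), frozenset({1, 2, 3}), frozenset({2, 4, 5}), frozenset({0, 3, 5})]
LG = clique_laplacian(n, hyperedges)
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.standard_normal(n)
    qH, qG = L_H(hyperedges, x), float(x @ LG @ x)
    assert qH <= qG + 1e-9
    assert qG <= r * (r - 1) / 2 * qH + 1e-9
```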
This means that if we sparsify $G$, say with the Spielman-Srivastava construction, we obtain information about $L_H$, up to a multiplicative error $\binom{r}{2}$. Now suppose that, as we sample edges of $G$ to sparsify it in the Spielman-Srivastava way, we will also pick hyperedges of $H$ (for example, we pick $e$ if at least one of its corresponding edges in $G$ is picked), and we weigh them by the inverse of the probability of being selected. Then we may hope that if the Spielman-Srivastava sparsification of $G$ is tuned to achieve error $\epsilon/\binom{r}{2}$, the hypergraph that we obtain will have error at most $\epsilon$. Indeed, this is roughly what happens, and we will be able to prove it by showing that the error in the hypergraph sparsification is dominated by the error in the “Gaussian version” of Spielman-Srivastava described in the previous post.
So we are going to assign to each hyperedge $e$ a probability

$$p_e \ :=\ \min\left\{ 1,\ C \cdot \sum_{\{u,v\} \subseteq e} R_{u,v} \right\}$$

where the factor $C$ will be chosen later to get the construction to work and $R_{u,v}$ is the effective resistance of the edge $\{u,v\}$ of $G$.

In fact, as before, it will be helpful to have probabilities that are non-positive powers of two, so we will choose $p_e$ to be a power of two such that

$$\min\left\{1,\ C\cdot \sum_{\{u,v\}\subseteq e} R_{u,v}\right\} \ \le\ p_e \ \le\ \min\left\{1,\ 2\,C\cdot \sum_{\{u,v\}\subseteq e} R_{u,v}\right\}$$

We have

$$\sum_e p_e \ \le\ 2\,C \cdot \sum_{e\in E}\ \sum_{\{u,v\}\subseteq e} R_{u,v} \ =\ 2\,C\cdot (n-1)$$

because the effective resistances of the edges of a connected multigraph on $n$ vertices, counted with multiplicity, sum to $n-1$; so the expected number of hyperedges in our sparsifier is $O(C\cdot n)$. Another fact that will become useful later (it will save us a factor of the order of $r$ in the number of hyperedges in the construction) is that, because effective resistances satisfy the triangle inequality, for every hyperedge $e$ we have

$$\max_{u,v \in e} R_{u,v} \ \le\ \frac 1{r-1} \sum_{\{u,v\}\subseteq e} R_{u,v}$$

and so, for the hyperedges with $p_e < 1$, we have $\max_{u,v\in e} R_{u,v} \le \frac{p_e}{C\,(r-1)}$.
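Here is how one might compute these probabilities in Python (a sketch with made-up names; the effective resistances are obtained from a dense pseudo-inverse, which is fine for illustration but not something one would do on a large graph), together with a function that draws the sparsifier.

```python
import itertools
import math
import numpy as np

def sampling_probabilities(n, hyperedges, C):
    """p_e = min(1, C * sum of effective resistances of the pairs in e), rounded up to a power of two."""
    # Laplacian of the associated clique multigraph G
    LG = np.zeros((n, n))
    for e in hyperedges:
        for u, v in itertools.combinations(e, 2):
            LG[u, u] += 1.0
            LG[v, v] += 1.0
            LG[u, v] -= 1.0
            LG[v, u] -= 1.0
    Lplus = np.linalg.pinv(LG)  # pseudo-inverse of L_G
    def reff(u, v):
        # R_{u,v} = (1_u - 1_v)^T L_G^+ (1_u - 1_v)
        return Lplus[u, u] + Lplus[v, v] - 2.0 * Lplus[u, v]
    probs = []
    for e in hyperedges:
        p = min(1.0, C * sum(reff(u, v) for u, v in itertools.combinations(e, 2)))
        p = min(1.0, 2.0 ** math.ceil(math.log2(p)))  # round up to a non-positive power of two
        probs.append(p)
    return probs

def sample_sparsifier(hyperedges, probs, rng):
    """Keep each hyperedge independently with probability p_e, with weight 1/p_e if kept."""
    return [(e, 1.0 / p) for e, p in zip(hyperedges, probs) if rng.random() < p]
```

With this rounding, each $p_e$ stays within a factor of two of $C\cdot\sum_{\{u,v\}\subseteq e} R_{u,v}$, so the expected number of surviving hyperedges is still $O(C\cdot n)$.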
3. A Discrete Random Process
Our construction of a hypergraph sparsifier will be to select each hyperedge $e$ independently with probability $p_e$ and, if selected, weigh it by $1/p_e$; call $H'$ the resulting weighted hypergraph. Our goal is to find an upper bound in probability, or even in expectation, on

$$\sup_x\ \frac{\big| L_{H'}(x) - L_H(x) \big|}{L_H(x)}$$

Recalling that $x^\top L_G\, x \le \binom{r}{2}\cdot L_H(x)$, we will study

$$\sup_{x\,:\, x^\top L_G x \le 1}\ \big| L_{H'}(x) - L_H(x) \big|$$

Because if we can show that the above quantity is at most $\epsilon/\binom{r}{2}$, then, for every $x$,

$$\big| L_{H'}(x) - L_H(x) \big| \ \le\ \frac{\epsilon}{\binom{r}{2}} \cdot x^\top L_G\, x \ \le\ \epsilon \cdot L_H(x)$$

as desired.
If we define

$$Q_e(x) \ :=\ \max_{u,v \in e}\ (x_u - x_v)^2$$

then we are interested in the supremum, over the set $\{x : x^\top L_G\, x \le 1\}$, of

$$L_{H'}(x) - L_H(x) \ =\ \sum_{e\in E} Y_e\, Q_e(x)$$

where $Y_e$ is a random variable that is equal to $\frac 1{p_e} - 1$ with probability $p_e$ and is equal to $-1$ with probability $1 - p_e$.
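As a quick check (immediate from the definitions, though not spelled out above), each $Y_e$ has mean zero, so for every fixed $x$ the quantity we are bounding has expectation zero:

$$\mathop{\mathbb E} Y_e \ =\ p_e \cdot \left( \frac 1{p_e} - 1 \right) + (1-p_e)\cdot (-1) \ =\ 0\,, \qquad \text{hence} \qquad \mathop{\mathbb E}\, L_{H'}(x) \ =\ L_H(x)\,.$$

The whole difficulty is in controlling the supremum over $x$, rather than the value at any fixed $x$.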
As before, we will do the construction in rounds. If the smallest probability is $2^{-T}$, then we will have $T$ rounds. We start with $H_0 := H$ and then, at round $i$, we take all hyperedges $e$ such that $p_e \le 2^{-i}$ and, independently for each such hyperedge, we either delete it or we double its weight (with probability 1/2 in each case). The final hypergraph $H_T$ is distributed like the hypergraph $H'$ described above. For each round $i$ we have the processes

$$X_x \ :=\ L_{H_i}(x) - L_{H_{i-1}}(x) \ =\ \sum_{e \in A_i} \sigma_e\, w_e\, Q_e(x)$$

where the random variables $\sigma_e$ are Rademacher, the weights $w_e$ are the weights of the hyperedges in $H_{i-1}$, which are either 0 or $2^{i-1}$, and $A_i$ is the set of hyperedges that are “active” in round $i$, that is, the set of hyperedges $e$ such that $p_e \le 2^{-i}$.
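Here is a small Python sketch of these rounds (again my own illustration, with `probs` assumed to be exact powers of two as above). A hyperedge with $p_e = 2^{-k}$ takes part in the coin flips of rounds $1,\dots,k$, so it survives all of them with probability $2^{-k} = p_e$, in which case its weight has been doubled $k$ times to $2^k = 1/p_e$; this is why $H_T$ has the same distribution as the one-shot sampling.

```python
import math
import random

def delete_or_double(hyperedges, probs, rng=random):
    """Run the delete-or-double rounds; each p_e must be a (non-positive) power of two."""
    k = {e: int(round(-math.log2(p))) for e, p in zip(hyperedges, probs)}  # p_e = 2^{-k_e}
    T = max(k.values(), default=0)                                         # number of rounds
    weight = {e: 1.0 for e in hyperedges}
    for i in range(1, T + 1):
        for e in hyperedges:
            if k[e] >= i:                     # e is "active" in round i, i.e. p_e <= 2^{-i}
                weight[e] = 0.0 if rng.random() < 0.5 else 2.0 * weight[e]
    return {e: w for e, w in weight.items() if w > 0.0}  # surviving hyperedges, weight 1/p_e
```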
For each hypergraph $H_i$, we will also consider its associated graph $G_i$, obtained by replacing each hyperedge with a clique. The Laplacian of $G_i$ is

$$L_{G_i} \ =\ \sum_{e} w^{(i)}_e \sum_{\{u,v\} \subseteq e} L_{u,v}$$

where $w^{(i)}_e$ is the weight of $e$ in $H_i$ and $L_{u,v} := ({\bf 1}_u - {\bf 1}_v)({\bf 1}_u - {\bf 1}_v)^\top$.
We have the following lemma.

Lemma 1 For every outcome of the first $i-1$ rounds such that $L_{G_{i-1}} \preceq 2\, L_G$, there is a probability at least $1 - 1/\mathrm{poly}(n)$ over the randomness of the $i$-th round that

$$\sup_{x\,:\, x^\top L_G x \le 1}\ \big| L_{H_i}(x) - L_{H_{i-1}}(x) \big| \ \le\ O\left( \sqrt{\frac{\log n}{C\cdot r}} \right)$$

and that

$$\left\| L_G^{-1/2}\, \big( L_{G_i} - L_{G_{i-1}} \big)\, L_G^{-1/2} \right\| \ \le\ O\left( \sqrt{\frac{\log n}{C\cdot r}} \right)$$

This means that we can take $C$ to be a sufficiently large absolute constant times $\epsilon^{-2}\, r^3\, T^2 \log n$ and apply the above lemma inductively to argue that we have a high probability that $\sup_{x\,:\, x^\top L_G x \le 1} |L_{H_T}(x) - L_H(x)| \le \epsilon/\binom{r}{2}$, and so we get a hypergraph spectral sparsifier with error $\epsilon$ and $O(\epsilon^{-2}\, r^3\, n\, T^2 \log n)$ hyperedges, as we will spell out below.
To prove the Lemma we will define the Gaussian process

$$G_x \ :=\ \sum_{e \in A_i} w_e \sum_{\{u,v\} \subseteq e} g_{e,\{u,v\}}\, (x_u - x_v)^2$$

where the $g_{e,\{u,v\}}$ are independent standard Gaussians. Notice the two differences between $X_x$ and $G_x$: we replaced the Rademacher choices with Gaussian choices, and we replaced $\max_{u,v\in e} (x_u-x_v)^2$ with

$$\sum_{\{u,v\}\subseteq e} (x_u - x_v)^2$$

Furthermore, we are doing a random choice for each pair within each hyperedge instead of just one random choice per hyperedge.

Fact 2 The random processes $\{X_x\}$ and $\{-X_x\}$ are $O(1)$-dominated by the Gaussian process $\{G_x\}$.
Proof: For every $x,x'$ we have

$$\Pr\big[\, |X_x - X_{x'}| \ge s \,\big] \ \le\ 2\exp\left( - \frac{s^2}{2 \sum_{e\in A_i} w_e^2\, \big( Q_e(x) - Q_e(x') \big)^2} \right)$$

and

$$\mathop{\mathbb E}\, (G_x - G_{x'})^2 \ =\ \sum_{e \in A_i} w_e^2 \sum_{\{u,v\}\subseteq e} \Big( (x_u - x_v)^2 - (x'_u - x'_v)^2 \Big)^2$$

To complete the proof, we argue that

$$\big( Q_e(x) - Q_e(x') \big)^2 \ \le\ \sum_{\{u,v\}\subseteq e} \Big( (x_u - x_v)^2 - (x'_u - x'_v)^2 \Big)^2$$

so that the variance proxy in the first bound is at most $\mathop{\mathbb E}(G_x - G_{x'})^2$. To verify the above inequality, assume $Q_e(x) \ge Q_e(x')$, otherwise the argument will be symmetric, and call $\{u^*,v^*\}$ the maximizer for $\max_{u,v\in e} (x_u - x_v)^2$ and $\{u',v'\}$ the maximizer for $\max_{u,v\in e} (x'_u - x'_v)^2$. Then

$$Q_e(x) - Q_e(x') \ =\ (x_{u^*} - x_{v^*})^2 - (x'_{u'} - x'_{v'})^2 \ \le\ (x_{u^*} - x_{v^*})^2 - (x'_{u^*} - x'_{v^*})^2 \ \le\ \sqrt{ \sum_{\{u,v\}\subseteq e} \Big( (x_u - x_v)^2 - (x'_u - x'_v)^2 \Big)^2 }$$

which completes the proof.
It remains to estimate the expected sup

$$\mathop{\mathbb E}\ \sup_{x\,:\, x^\top L_G x \le 1}\ G_x$$

and the diameter of the set $\{x : x^\top L_G x \le 1\}$ in the metric defined by the Gaussian process, where, recall,

$$G_x \ =\ \sum_{e \in A_i} w_e \sum_{\{u,v\} \subseteq e} g_{e,\{u,v\}}\, (x_u - x_v)^2$$

With the usual change of variable $y := L_G^{1/2} x$ we have

$$\mathop{\mathbb E}\ \sup_{x\,:\, x^\top L_G x \le 1} G_x \ =\ \mathop{\mathbb E}\ \sup_{\|y\|\le 1}\ \sum_{e\in A_i} w_e \sum_{\{u,v\}\subseteq e} g_{e,\{u,v\}}\ y^\top \tilde L_{u,v}\, y \ \le\ \mathop{\mathbb E}\ \left\| \sum_{e\in A_i} w_e \sum_{\{u,v\}\subseteq e} g_{e,\{u,v\}}\, \tilde L_{u,v} \right\|$$

where, as before, we use the notation $L_{u,v} := ({\bf 1}_u - {\bf 1}_v)({\bf 1}_u - {\bf 1}_v)^\top$, $\tilde L_{u,v} := L_G^{-1/2}\, L_{u,v}\, L_G^{-1/2}$ and $\|\cdot\|$ for the spectral norm.

By matrix Chernoff bounds for Gaussian sums of matrices, and using the fact that each $\tilde L_{u,v}$ is a rank-one matrix with $\tilde L_{u,v}^2 = R_{u,v}\cdot \tilde L_{u,v}$, we have

$$\mathop{\mathbb E}\ \left\| \sum_{e\in A_i} w_e \sum_{\{u,v\}\subseteq e} g_{e,\{u,v\}}\, \tilde L_{u,v} \right\| \ \le\ O(\sqrt{\log n}) \cdot \left\| \sum_{e\in A_i} w_e^2 \sum_{\{u,v\}\subseteq e} \tilde L_{u,v}^2 \right\|^{1/2} \ \le\ O(\sqrt{\log n}) \cdot \sqrt{\max_{e\in A_i,\ \{u,v\}\subseteq e} w_e\, R_{u,v}} \cdot \left\| \sum_{e\in A_i} w_e \sum_{\{u,v\}\subseteq e} \tilde L_{u,v} \right\|^{1/2} \ \ \ \ \ (1)$$

Recall that if $e$ is a hyperedge that is active at round $i$, then $w_e \le 2^{i-1}$ and $p_e \le 2^{-i}$, and we also have $R_{u,v} \le \frac 1{r-1} \sum_{\{u',v'\}\subseteq e} R_{u',v'} \le \frac{p_e}{C\,(r-1)}$ for every $\{u,v\}\subseteq e$, so that we have

$$\max_{e\in A_i,\ \{u,v\}\subseteq e}\ w_e\, R_{u,v} \ \le\ \frac{2^{i-1}\cdot 2^{-i}}{C\,(r-1)} \ =\ \frac 1{2\,C\,(r-1)}$$

The last factor of (1) is the square root of the spectral norm of $\sum_{e\in A_i} w_e \sum_{\{u,v\}\subseteq e} \tilde L_{u,v}$, which is dominated by $L_G^{-1/2}\, L_{G_{i-1}}\, L_G^{-1/2}$ and so has spectral norm at most 2 by the assumption of the Lemma. Collecting all the pieces, we have proved that

$$\mathop{\mathbb E}\ \sup_{x\,:\, x^\top L_G x \le 1}\ G_x \ \le\ O\left( \sqrt{\frac{\log n}{C\cdot r}} \right)$$
We also need to bound the diameter of the set $\{x : x^\top L_G x \le 1\}$ in the metric defined by the Gaussian process, that is, the quantity $\sup_{x,x'} \sqrt{\mathop{\mathbb E}(G_x - G_{x'})^2}$, which is at most $2 \sup_x \sqrt{\mathop{\mathbb E}\, G_x^2}$. Under the usual change of basis, for every $y$ of length 1 we want to bound the square root of

$$\sum_{e\in A_i} w_e^2 \sum_{\{u,v\}\subseteq e} (y^\top \tilde L_{u,v}\, y)^2$$

where $w_e$ is either zero or $2^{i-1}$, and we can continue our chain of inequalities with

$$\sum_{e\in A_i} w_e^2 \sum_{\{u,v\}\subseteq e} (y^\top \tilde L_{u,v}\, y)^2 \ \le\ \max_{e\in A_i,\ \{u,v\}\subseteq e} w_e\, R_{u,v} \cdot \sum_{e\in A_i} w_e \sum_{\{u,v\}\subseteq e} y^\top \tilde L_{u,v}\, y \ \le\ \frac 1{2\,C\,(r-1)} \cdot 2 \ =\ \frac 1{C\,(r-1)}$$

where we used again the assumption of the Lemma that $L_{G_{i-1}} \preceq 2\, L_G$.
To prove the last part of the lemma, we have to prove that with high probability

$$\left\| L_G^{-1/2}\, \big( L_{G_i} - L_{G_{i-1}} \big)\, L_G^{-1/2} \right\| \ \le\ O\left( \sqrt{\frac{\log n}{C\cdot r}} \right)$$

We see that

$$L_{G_i} - L_{G_{i-1}} \ =\ \sum_{e\in A_i} \sigma_e\, w_e \sum_{\{u,v\}\subseteq e} L_{u,v}$$

and, as noted before, each $w_e$ is either zero or $2^{i-1}$, so that each term $w_e\, \tilde L_{u,v}$ has spectral norm $w_e R_{u,v} \le \frac 1{2C(r-1)}$, and the final claim follows from matrix Chernoff bounds.
Now we can put everything together. Applying our lemma inductively, we can say that with high probability

$$\sup_{x\,:\, x^\top L_G x \le 1}\ \big| L_{H_T}(x) - L_H(x) \big| \ \le\ T \cdot O\left( \sqrt{\frac{\log n}{C\cdot r}} \right)$$

and

$$L_{G_T} \ \preceq\ 2\, L_G$$

provided that we choose $C$ at least an absolute constant times $T^2 \log n$.

In particular, we can choose $C$ to be an absolute constant times $\epsilon^{-2}\, r^3\, T^2 \log n$ and have

$$\sup_{x\,:\, x^\top L_G x \le 1}\ \big| L_{H_T}(x) - L_H(x) \big| \ \le\ \frac{\epsilon}{\binom{r}{2}}$$

which is the same as

$$\forall x \in {\mathbb R}^V: \ \ \big| L_{H_T}(x) - L_H(x) \big| \ \le\ \frac{\epsilon}{\binom{r}{2}} \cdot x^\top L_G\, x \ \le\ \epsilon \cdot L_H(x)$$

So, with high probability, $H_T$ is a spectral sparsifier of $H$ with error $\epsilon$, and it has $O(C\cdot n) = O(\epsilon^{-2}\, r^3\, n\, T^2 \log n)$ hyperedges.
4. Some Additional Thoughts
The dependence on $r$ is definitely not optimal, particularly because when $r$ is of the order of $n$ we know from the work of Soma and Yoshida that we can do better. One place where we seem to lose is that, although we know

$$Q_e(x) \ \le\ \sum_{\{u,v\}\subseteq e} (x_u - x_v)^2 \ \le\ \binom{r}{2} \cdot Q_e(x)$$

we are only able to show that the process $\{X_x\}$ is $O(1)$-dominated by the Gaussian process $\{G_x\}$, rather than, as we could have hoped, $O(1/r)$-dominated. The difficulty is that the bound

$$\big| Q_e(x) - Q_e(x') \big| \ \le\ \sqrt{ \sum_{\{u,v\}\subseteq e} \Big( (x_u - x_v)^2 - (x'_u - x'_v)^2 \Big)^2 }$$

is sometimes tight. In order to do better, it seems necessary to do something a bit differently from what we do here.
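For instance (a small example of my own, not from the post): fix a hyperedge $e = \{v_1,\dots,v_r\}$, let $x'$ be the indicator vector of $v_1$, and let $x$ agree with $x'$ except that $x_{v_2} = -\delta$ for a small $\delta > 0$. Then

$$Q_e(x) - Q_e(x') \ =\ (1+\delta)^2 - 1 \ =\ 2\delta + \delta^2\,, \qquad \sqrt{ \sum_{\{u,v\}\subseteq e} \Big( (x_u - x_v)^2 - (x'_u - x'_v)^2 \Big)^2 } \ =\ \sqrt{ (2\delta+\delta^2)^2 + (r-2)\,\delta^4 }\,,$$

and the two sides agree up to lower-order terms in $\delta$, so no factor decreasing with $r$ can be gained in general.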
The effective resistance of an edge $\{u,v\}$ in a graph $G$ can be written as

$$R_{u,v} \ =\ \max_x\ \frac{(x_u - x_v)^2}{x^\top L_G\, x}$$

with the convention that $\frac 00 = 0$. So it seems reasonable that a good definition of effective resistance for a hyperedge $e$ in a hypergraph $H$ would be

$$R_e \ :=\ \max_x\ \frac{\max_{u,v\in e}\ (x_u - x_v)^2}{L_H(x)}$$

One can argue that these “effective resistances” add up to at most $O(r\cdot n)$, but perhaps they add up to $O(n)$? If we sample according to those “effective resistances,” can we apply generic chaining directly to the process $\sum_e Y_e\, Q_e(x)$, without having to rely on a Gaussian process defined in terms of matrices, for which we have Chernoff bounds?
Apologies if I missed this, but is it clear why the matrix Chernoff bound approach that works for ordinary graphs fails for hypergraphs? I see that at one point you use the fact that certain matrices in question are rank-1 in the case of ordinary graphs, and this would not be true in the case of hypergraphs, but maybe there’s something else as well?
Matrix Chernoff bounds allow you to bound the spectral norm of a random matrix.
After a change of basis, the condition that a randomly subsampled graph sparsifies another becomes the question of bounding the spectral norm of a random matrix, to which matrix Chernoff bounds apply (this is Spielman-Srivastava).
The condition that a randomly subsampled hypergraph sparsifies another is the question of how large the supremum of a certain quantity can be, and that supremum is not the spectral norm of a matrix. Talagrand’s comparison inequality allows us to say that the supremum of that quantity is upper bounded by the spectral norm of a certain random matrix, provided that a certain condition holds. We verify the conditions and then use matrix Chernoff bounds to upper bound the spectral norm of the random matrix.