ARV on Abelian Cayley Graphs

Continuing from the previous post, we are going to prove the following result: let {G} be a {d}-regular Cayley graph of an Abelian group, {\phi(G)} be the normalized edge expansion of {G}, {ARV(G)} be the value of the ARV semidefinite programming relaxation of sparsest cut on {G} (we will define it below), and {\lambda_2(G)} be the second smallest normalized Laplacian eigenvalue of {G}. Then we have

\displaystyle   \lambda_2 (G) \leq O(d) \cdot (ARV (G))^2 \ \ \ \ \ (1)

which, together with the fact that {ARV(G) \leq 2 \phi(G)} and {\phi(G) \leq \sqrt{2 \lambda_2}}, implies the Buser inequality

\displaystyle   \lambda_2 (G) \leq O(d) \cdot \phi^2 (G) \ \ \ \ \ (2)

and the approximation bound

\displaystyle   \phi(G) \leq O(\sqrt d) \cdot ARV(G) \ \ \ \ \ (3)

The proof of (1), due to Shayan Oveis Gharan and myself, is very similar to the proof by Bauer et al. of (2).

1. Ideas

For a positive integer parameter {t}, call {G^t} the multigraph whose adjacency matrix is {A^t}, where {A} is the adjacency matrix of {G}. So {G^t} is a {d^t}-regular graph, and each edge in {G^t} corresponds to a length-{t} walk in {G}. Our proof boils down to showing

\displaystyle   ARV(G^t) \leq O(\sqrt{dt}) \cdot ARV(G) \ \ \ \ \ (4)

which gives (1) after we note that, for odd {t} (odd so that {x \mapsto x^t} is monotone, and negative eigenvalues of {A/d} cannot become the largest non-trivial eigenvalue of {A^t/d^t}),

\displaystyle  \lambda_2 (G^t) = 1 - (1 - \lambda_2(G))^t

and

\displaystyle  ARV(G^t) \geq \lambda_2 (G^t)

and we combine the above inequalities, choosing {t} to be an odd integer of order {1/\lambda_2 (G)}:

\displaystyle  \Omega(1) \leq 1 - (1 - \lambda_2(G))^t = \lambda_2 (G^t) \leq ARV(G^t) \leq O(\sqrt{dt}) \cdot ARV(G)

The reader will see that our argument could also prove (roughly as done by Bauer et al.) that {\phi(G^t) \leq O(\sqrt {dt}) \cdot \phi(G)}, simply by reasoning about distributions of cuts instead of reasoning about ARV solutions, which would give (2) more directly. By reasoning about ARV solutions, however, we are also able to establish (3), which we think is independently interesting.

It remains to prove (4). I will provide a completely self-contained proof, including the definition of Cayley graphs and of the ARV relaxation, but, first, here is a summary for the reader already familiar with this material: we take an optimal solution of ARV for {G}, and bound how well it does for {G^t}. We need to understand how much bigger the fraction of edges cut by the solution in {G^t} can be compared to the fraction of edges cut in {G}; a random edge in {G^t} is obtained by sampling {t} generators at random and adding them together, which is roughly like sampling each generator {t/d} times, each time with a random sign. Because of cancellations, we expect that the sum of these {t} random generators can be obtained by summing roughly {\sqrt{t/d}} copies of each of the {d} generators, that is, roughly {\sqrt{td}} generators in total. So a random edge of {G^t} corresponds roughly to {\sqrt{td}} edges of {G}, and this is, at most, the factor by which the fraction of cut edges can grow.

2. Definitions

Now we present more details and definitions. Recall that if {\Gamma} is a group, for which we use additive notation, and {S} is a set or multiset of group elements, then the Cayley graph {Cay(\Gamma,S)} is the graph that has a vertex for every group element and an edge {(x,x+s)} for every group element {x\in \Gamma} and every element {s\in S}. We restrict ourselves to undirected graphs, so we will always assume that if {s} is an element of {S}, then {-s} is also an element of {S} (with the same multiplicity, if {S} is a multiset). Note that the resulting undirected graph is {|S|}-regular.
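
As a concrete illustration (my own toy example, not part of the argument), here is a minimal Python sketch that builds the adjacency matrix of {Cay({\mathbb Z}_n,S)} for a symmetric multiset {S} of generators; the resulting multigraph is {|S|}-regular.

```python
import numpy as np

def cayley_adjacency(n, S):
    """Adjacency matrix of Cay(Z_n, S), where S is a symmetric multiset of
    generators: if s is in S then -s is in S, with the same multiplicity."""
    A = np.zeros((n, n), dtype=int)
    for x in range(n):
        for s in S:
            A[x, (x + s) % n] += 1   # one (multi)edge for each generator
    return A

# Example: the 10-cycle is Cay(Z_10, {+1, -1}), a 2-regular graph.
A = cayley_adjacency(10, [1, -1])
assert (A.sum(axis=1) == 2).all() and (A == A.T).all()
```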

Several families of graphs can be seen to be Cayley graphs, including cycles, cliques, balanced complete bipartite graphs, hypercubes, toruses, and so on. All the above examples are actually Cayley graphs of Abelian groups. Several interesting families of graphs, for example several families of expanders, are Cayley graphs of non-Abelian groups, but the result of this post will apply only to Abelian groups.

To define the ARV relaxation, let us take it slowly and start from the definition of the sparsest cut problem. The edge expansion problem {\phi(G)} is closely related to the sparsest cut problem, which can be defined as

\displaystyle  \sigma(G) := \min_{x \in \{0,1\}^V} \ \ \frac{x^T L_G x}{x^T L_K x }

where {L_G = I - A/d} is the normalized Laplacian of {G} and {L_K = I - J/n} is the normalized Laplacian of the clique on {n} vertices. We wrote it this way to emphasize the similarity with the computation of the second smallest normalized Laplacian eigenvalue, which can be written as

\displaystyle  \lambda_2 (G) = \min_{x \in {\mathbb R}^V : \ x \perp {\bf 1}} \ \ \frac {x^T L_G x}{x^Tx } = \min_{x\in {\mathbb R}^V} \frac {x^T L_G x}{x^T L_K x}

where the second equality is perhaps not obvious but is easily proved (all the solutions {x\perp {\bf 1}} have the same cost in both expressions, and the last expression is shift invariant, so there is no loss in optimizing over all of {{\mathbb R}^V} rather than just over the space {\perp {\bf 1}}). We see that the computation of {\lambda_2} is just a relaxation of the sparsest cut problem, and so we have

\displaystyle  \lambda_2(G) \leq \sigma(G)

We can write the sparsest cut problem in a less algebraic version as

\displaystyle  \sigma(G) = \min_{S\subseteq V} \ \ \frac{ cut(S)}{\frac dn \cdot |S| \cdot |V-S| }

and recall that

\displaystyle  \phi(G) = \min_{S\subseteq V: |S| \leq \frac {|V|} 2} \ \ \frac{ cut(S)}{d |S|}

The cost function for {\sigma(G)} does not change if we switch {S} with {V-S}, so {\sigma(G)} could be defined equivalently as an optimization problem over subsets {S\subseteq V: |S| \leq |V|/2}, and at this point {\sigma(G)} and {\phi(G)} are the same problem except for an extra factor of {|V-S|/n} in the denominator of {\sigma(G)}. Such a factor is always between {1/2} and {1}, so we have:

\displaystyle  \phi(G) \leq \sigma(G)\leq 2 \phi(G)

(Note that all these definitions have given us a particularly convoluted proof that {\lambda_2(G) \leq 2 \phi(G)}.)
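
These inequalities are easy to check numerically on small examples. The following Python sketch (my own sanity check, on the 8-cycle, which is an Abelian Cayley graph) computes {\lambda_2}, {\sigma} and {\phi} by brute force and verifies that {\lambda_2 \leq \sigma} and {\phi \leq \sigma \leq 2 \phi}.

```python
import itertools
import numpy as np

# Small example: the 8-cycle, a 2-regular Cayley graph of Z_8.
n, d = 8, 2
A = np.zeros((n, n))
for x in range(n):
    A[x, (x + 1) % n] += 1
    A[x, (x - 1) % n] += 1

L_G = np.eye(n) - A / d                     # normalized Laplacian of G
L_K = np.eye(n) - np.ones((n, n)) / n       # normalized Laplacian of the clique
lam2 = np.sort(np.linalg.eigvalsh(L_G))[1]  # second smallest eigenvalue

def cut(S):
    return sum(A[u, v] for u in S for v in range(n) if v not in S)

# Brute force over all nontrivial subsets of the vertices.
sigma = min(cut(S) / ((d / n) * len(S) * (n - len(S)))
            for k in range(1, n) for S in itertools.combinations(range(n), k))
phi = min(cut(S) / (d * len(S))
          for k in range(1, n // 2 + 1) for S in itertools.combinations(range(n), k))

print(lam2, sigma, phi)
assert lam2 <= sigma + 1e-9 and phi <= sigma + 1e-9 and sigma <= 2 * phi + 1e-9
```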

Yet another way to characterize {\lambda_2} is as

\displaystyle  \lambda_2 (G) = \min_{x\in {\mathbb R}^V} \frac {x^T L_G x}{x^T L_K x} = \min_{X \succeq {\bf 0}} \frac{L_G \bullet X}{L_K\bullet X}

where {A\bullet B = \sum_{i,j} A_{i,j} B_{i,j}} is the Frobenius inner product between matrices. This is also not obvious, but it is not difficult to prove: the main point is that if we write a PSD matrix {X} as {\sum_i x_ix_i^T}, then

\displaystyle  \frac{L_G \bullet X}{L_K\bullet X} = \frac{\sum_i x_i^T L_G x_i}{\sum_i x_i^T L_Kx_i} \geq \min_i \frac{ x_i^T L_G x_i}{x_i^T L_Kx_i}

and so there is no loss in passing from an optimization over all PSD matrices to an optimization over rank-1 PSD matrices.

We can rewrite {\lambda_2 = \min_{X \succeq {\bf 0}} \frac{L_G \bullet X}{L_K\bullet X} } in terms of the Cholesky decomposition of {X} as

\displaystyle  \begin{array}{llr} \min & \frac {\sum_{(u,v)\in E} || x_u - x_v||^2 }{\frac dn \sum_{u < v} ||x_u - x_v ||^2 } \\ \mbox{s.t.}\\ & x_v \in {\mathbb R}^m & \ \ \forall v\in V\\ & m\geq 1 \end{array}

where the correspondence between PSD matrices {X} and vectors {\{ x_v \}_{v\in V}} is that we have {X_{u,v} = \langle x_u,x_v \rangle} (that is, {\{ x_v \}_{v\in V}} is the Cholesky decomposition of {X}, and {X} is the Gram matrix of the {\{ x_v \}_{v\in V}}). An integral solution of the sparsest cut problem corresponds to choosing rank {m=1} and a solution in which each 1-dimensional {x_v} is either 1 or 0, depending on whether {v} is in the set {S} or not. The ARV relaxation is

\displaystyle  \begin{array}{llr} \min & \frac {\sum_{(u,v)\in E} || x_u - x_v||^2 }{\frac dn \sum_{u < v} ||x_u - x_v ||^2 } \\ \mbox{s.t.}\\ & ||x_u - x_v||^2 \leq ||x_u - x_z||^2 + ||x_z - x_v||^2 &\ \ \forall u,v,z \in V\\ & x_v \in {\mathbb R}^m & \forall v\in V\\ & m\geq 1 \end{array}

which is a relaxation of sparsest cut because the “triangle inequality” constraints that we introduced are satisfied by 1-dimensional 0/1 solutions. Thus we have

\displaystyle  \lambda_2 (G) \leq ARV(G) \leq \sigma(G) \leq 2 \phi(G)
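
For concreteness, here is a sketch of how one could compute {ARV(G)} for a very small graph with an off-the-shelf solver. This is only an illustration, and it assumes the cvxpy package with its default semidefinite solver; the ratio is linearized in the standard way, by normalizing the denominator ({L_K \bullet X = 1}) and minimizing the numerator ({L_G \bullet X}). With {\Theta(n^3)} triangle inequalities this is practical only for tiny {n}.

```python
import numpy as np
import cvxpy as cp

def arv_value(A, d):
    """ARV relaxation of sparsest cut for a d-regular graph with adjacency A.
    Minimizes L_G . X over PSD X with L_K . X = 1 and the squared-distance
    triangle inequalities, using ||x_u - x_v||^2 = X_uu + X_vv - 2 X_uv."""
    n = A.shape[0]
    L_G = np.eye(n) - A / d
    L_K = np.eye(n) - np.ones((n, n)) / n
    X = cp.Variable((n, n), PSD=True)

    def dist2(u, v):
        return X[u, u] + X[v, v] - 2 * X[u, v]

    constraints = [cp.trace(L_K @ X) == 1]
    for u in range(n):
        for v in range(n):
            for z in range(n):
                constraints.append(dist2(u, v) <= dist2(u, z) + dist2(z, v))
    prob = cp.Problem(cp.Minimize(cp.trace(L_G @ X)), constraints)
    prob.solve()
    return prob.value

# Example: the 6-cycle as Cay(Z_6, {+1, -1}).
n = 6
A = np.zeros((n, n))
for x in range(n):
    A[x, (x + 1) % n] += 1
    A[x, (x - 1) % n] += 1
print(arv_value(A, 2))
```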

3. The Argument

Let us take any solution for ARV, and let us symmetrize it so that the symmetrized solution {x} satisfies

\displaystyle  || x_u - x_{u + s} ||^2 = || x_v - x_{v+s} ||^2 \ \ \forall s\in S \ \forall u,v\in V

That is, make sure that the contribution of each edge to the numerator of the cost function depends only on the generator that defines the edge, and not on the pair of endpoints.

It is easier to see that this symmetrization is possible if we view our solution as a PSD matrix {X}. In this case, for every group element {g}, we can let {X_g} be the solution (with the same cost) obtained by permuting rows and columns according to the mapping {v \rightarrow g+ v}; then we can consider the solution {\frac 1n \sum_{g\in V} X_g}, which satisfies the required symmetry condition.
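
In code, for the cyclic group {{\mathbb Z}_n}, this averaging looks as follows (a minimal numpy sketch of the symmetrization just described; the random PSD matrix is not an actual ARV solution, it is only there to exercise the sanity check).

```python
import numpy as np

def symmetrize(X):
    """Average a PSD matrix X over all translations v -> g + v of Z_n.
    The average is still PSD, preserves linear constraints such as the
    triangle inequalities and, for a Cayley graph of Z_n, the cost;
    moreover X[u, (u+s) % n] no longer depends on u."""
    n = X.shape[0]
    X_sym = np.zeros_like(X, dtype=float)
    for g in range(n):
        P = np.zeros((n, n))
        for v in range(n):
            P[(v + g) % n, v] = 1      # permutation matrix of v -> v + g
        X_sym += P @ X @ P.T
    return X_sym / n

# Sanity check: after symmetrization, ||x_u - x_{u+s}||^2 depends only on s.
rng = np.random.default_rng(0)
B = rng.normal(size=(7, 3))
X = symmetrize(B @ B.T)
d2 = lambda u, v: X[u, u] + X[v, v] - 2 * X[u, v]
assert np.allclose([d2(u, (u + 2) % 7) for u in range(7)], d2(0, 2))
```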

Because of this condition, the cost function applied to {\{ x_v \}_{v\in V}} in {G} is

\displaystyle  \frac { \frac n2 \cdot \sum_{s\in S} || x_s - x_0 ||^2}{ \frac dn \sum_{u < v} ||x_u - x_v ||^2 }

and the cost function applied to {\{ x_v \}_{v\in V}} in {G^t} is

\displaystyle  \frac { \frac n2 \cdot \sum_{(s_1,\ldots,s_t)\in S^t} || x_{s_1 + \cdots + s_t} - x_0 ||^2}{ \frac {d^t} n \sum_{u < v} ||x_u - x_v ||^2 }

meaning that our goal is now simply to prove

\displaystyle  \frac 1{d^t} \sum_{(s_1,\ldots,s_t)\in S^t} || x_{s_1 + \cdots + s_t} - x_0 ||^2 \leq O(\sqrt {dt}) \cdot \frac 1d \sum_{s\in S} || x_s - x_0 ||^2

or, if we take a probabilistic view,

\displaystyle  \mathop{\mathbb E}_{(s_1,\ldots,s_t) \sim S^t} || x_{s_1 + \cdots + s_t} - x_0 ||^2 \leq O(\sqrt {dt}) \cdot \mathop{\mathbb E}_{s\sim S} || x_s - x_0 ||^2

If we let {c_s} be the number of times that generator {s} appears in the sum {s_1 + \ldots + s_t}, counting cancellations (so that, if {s} appears 4 times and {-s} appears 6 times, we let {c_s = 0} and {c_{-s} = 2}), we have

\displaystyle  s_1 + \ldots + s_ t = \sum_{s\in S} c_s \cdot s

where multiplying an integer by a generator means adding the generator to itself that many times. Using the (squared-distance) triangle inequality repeatedly, along a path from {x_0} to {x_{s_1+\ldots+s_t}} that moves by one generator at a time, together with the symmetrization, we have

\displaystyle  || x_{s_1 + \ldots + s_t} - x_0 ||^2 \leq \sum_{s\in S} c_s \cdot || x_s - x_0 ||^2

The next observation is that, for every {s},

\displaystyle  \mathop{\mathbb E} c_s \leq O(\sqrt {t/d})

where the expectation is over the choice of {s_1,\ldots,s_t}. This is because {c_s} can be seen as {\max \{ 0 , X_1 + \ldots + X_t \}}, where we define {X_i = +1} if {s_i = s}, {X_i = -1} if {s_i = -s} and {X_i = 0} otherwise. We have

\displaystyle  \mathop{\mathbb E} c_s = \mathop{\mathbb E} \max \{ 0 , X_1 + \ldots + X_t \} \leq \mathop{\mathbb E} | X_1 + \ldots + X_t | \leq \sqrt {\mathop{\mathbb E} (X_1 + \ldots + X_t)^2} = \sqrt{2t/d}

where in the last step we use that the {X_i} are independent with mean zero and {\mathop{\mathbb E} X_i^2 = 2/d}.
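
Here is a quick Monte Carlo sanity check of this bound (my own experiment, with arbitrarily chosen {d} and {t}): since only the signs matter, we can sample the {X_i} directly and compare the empirical value of {\mathop{\mathbb E} c_s} with {\sqrt{2t/d}}.

```python
import numpy as np

rng = np.random.default_rng(0)
d, t, trials = 10, 400, 5000
# X_i = +1 if s_i = s, X_i = -1 if s_i = -s, X_i = 0 otherwise,
# with probabilities 1/d, 1/d and 1 - 2/d respectively.
X = rng.choice([1, -1, 0], size=(trials, t), p=[1 / d, 1 / d, 1 - 2 / d])
c_s = np.maximum(0, X.sum(axis=1))     # net count of generator s, after cancellations
print(c_s.mean(), np.sqrt(2 * t / d))  # empirical E[c_s] versus the bound sqrt(2t/d)
```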

Combining everything, we have

\displaystyle  \mathop{\mathbb E}_{(s_1,\ldots,s_t) \sim S^t} || x_{s_1 + \cdots + s_t} - x_0 ||^2

\displaystyle  \leq \sum_{s\in S} \mathop{\mathbb E} c_s || x_s - x_0 ||^2

\displaystyle  \leq O\left( \sqrt {\frac td} \right) \cdot \sum_{s\in S} || x_s - x_0 ||^2

\displaystyle  = O(\sqrt{dt}) \mathop{\mathbb E}_{s\sim S} || x_s - x_0 ||^2

So every solution of ARV for {G} is also a solution for {G^t}, with a cost that increases by at most an {O(\sqrt {dt})} factor, which proves (4) as promised.

4. A Couple of Interesting Questions

We showed that ARV has integrality gap at most {O(\sqrt d)} for every {d}-regular Abelian Cayley graph, but we did not demonstrate a rounding algorithm able to achieve an {O(\sqrt d)} approximation ratio.

If we follow the argument, starting from an ARV solution of cost {\epsilon} and choosing {t} of the order of {1/(d\epsilon^2)}, we see that {G^t} has an ARV solution (the same one as before) of cost at most, say, {1/2}, and so {\lambda_2(G^t) \leq 1/2}, implying that {\lambda_2(G) \leq O(1/t)}; hence Fiedler's algorithm, applied to an eigenvector of the second Laplacian eigenvalue, finds a cut of sparsity at most {O(\sqrt{1/t}) = O(\sqrt d \cdot \epsilon )}.

We can also see that if {X} is the matrix corresponding to an ARV solution of value {\epsilon} for {G}, then one of the eigenvectors {x} of {X} must be a test vector of Rayleigh quotient at most {1/2} for the Laplacian of {G^t}, where {t} is of order {1/(d\epsilon^2)}. However, it is not clear how to get, from such a vector, a test vector of Rayleigh quotient at most {O(\sqrt {1/t} )} for the Laplacian of {G}, though one such vector should be {A^k x} for a properly chosen {k} in the range {1,\ldots, t}.

If this actually works, then the following, or something like it, would be a rounding algorithm: given a PSD solution {X} of ARV of cost {\epsilon}, consider PSD matrices of the form {A^k X A^k}, which do not necessarily satisfy the triangle inequalities any more, for {k = 1,\ldots, 1/(d\epsilon^2)}, and try to round them using a random hyperplane. Would this make sense for other classes of graphs?
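
To make the suggestion concrete, here is a purely illustrative Python sketch of this hypothetical procedure (random hyperplane rounding applied to the matrices {A^k X A^k}); the function names and parameters are my own, and nothing here comes with a guarantee.

```python
import numpy as np

def sparsity(A, d, S):
    """Sparsest-cut cost of the cut S in a d-regular graph with adjacency A."""
    n = A.shape[0]
    mask = np.zeros(n, dtype=bool)
    mask[list(S)] = True
    k = mask.sum()
    if k in (0, n):
        return np.inf
    return A[mask][:, ~mask].sum() / ((d / n) * k * (n - k))

def round_by_hyperplanes(A, d, X, k_max, trials=50, seed=0):
    """Hyperplane-round the (not necessarily feasible) matrices A^k X A^k."""
    rng = np.random.default_rng(seed)
    best = np.inf
    M = X.copy()
    for k in range(k_max + 1):
        w, V = np.linalg.eigh(M)                 # M = A^k X A^k
        vecs = V * np.sqrt(np.maximum(w, 0))     # Gram vectors, one row per vertex
        for _ in range(trials):
            g = rng.normal(size=vecs.shape[1])   # random hyperplane
            S = set(np.nonzero(vecs @ g > 0)[0])
            best = min(best, sparsity(A, d, S))
        M = A @ M @ A                            # move on to A^{k+1} X A^{k+1}
    return best
```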

It is plausible that sparsest cut on Abelian Cayley graphs actually admits a constant-factor approximation in polynomial or quasi-polynomial time, and maybe even that ARV itself achieves a constant-factor approximation.

2 thoughts on “ARV on Abelian Cayley Graphs”

  1. Hi Luca,

    Thanks for the post! Do you think hierarchies might help with removing the dependence on $\sqrt{d}$?

  2. Potentially, yes. At this point, even a sub-exponential time constant-factor approximation for Abelian Cayley graphs would be interesting, possibly by using hierarchies with n^{O(1)} rounds
