In which we dwell at great length on the “easy” part of Cheeger’s inequality.

The edge expansion of a graph G=(V,E), defined as

\displaystyle  h(G):= \min_{S: |S| \leq \frac {|V|}{2} } \frac{edges(S,V-S)}{|S|}

is a measure of how “well connected” a graph is. A graph has edge expansion at least h if and only if, in order to disconnect a set of k \leq |V|/2 vertices from the rest of the graph, one must remove at least hk edges. In other words, the removal of t edges from a graph of edge expansion \geq h must leave a connected component of size \geq n - t/h, where n is the number of vertices.
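(To make the definition concrete, here is a minimal brute-force computation in Python/numpy; it is not part of the argument that follows, the helper name edge_expansion and the 4-cycle example are arbitrary choices, and the enumeration is exponential in the number of vertices, so it is only meant for toy graphs.)

    import itertools
    import numpy as np

    def edge_expansion(A):
        # Brute-force h(G) from a 0/1 symmetric adjacency matrix:
        # enumerate all sets S with 1 <= |S| <= |V|/2.
        n = A.shape[0]
        best = float("inf")
        for k in range(1, n // 2 + 1):
            for S in itertools.combinations(range(n), k):
                T = [v for v in range(n) if v not in S]
                cut = A[np.ix_(list(S), T)].sum()   # edges(S, V - S)
                best = min(best, cut / len(S))
        return best

    # Example: the 4-cycle, a 2-regular graph with h = 1
    # (cut it into two paths of two vertices each).
    C4 = np.array([[0, 1, 0, 1],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 1, 0]])
    print(edge_expansion(C4))   # 1.0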

As readers of in theory know, graphs of large edge expansion (expander graphs) have many applications, and in this post we are going to return to the question of how one certifies that a given graph has large edge expansion.

For simplicity, we shall only talk about regular graphs. In a previous post we mentioned that if A is the adjacency matrix of a graph, there are numbers \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n (the eigenvalues of A), not necessarily distinct, and an orthonormal basis x_1,\cdots,x_n of {\mathbf R}^n, such that for all i

Ax_i = \lambda_i x_i

And if the graph is d-regular, then \lambda_1=d, x_1 = \frac {1}{\sqrt n} (1,\cdots,1), and for all i we have |\lambda_i | \leq d. Finally, the numbers \lambda_i and the vectors x_i are computable in polynomial time. (This is all the linear algebra we will need.)
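(As a small aside, here is what these facts look like numerically, using numpy's symmetric eigensolver on the 4-cycle; this is only a sanity-check sketch, not something we will need later.)

    import numpy as np

    d = 2
    C4 = np.array([[0, 1, 0, 1],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 1, 0]], dtype=float)

    # numpy.linalg.eigh handles symmetric matrices and returns eigenvalues in
    # ascending order with orthonormal eigenvectors as columns; flip them to
    # match the convention lambda_1 >= ... >= lambda_n used above.
    lam, X = np.linalg.eigh(C4)
    lam, X = lam[::-1], X[:, ::-1]

    print(lam)                                            # approx [2, 0, 0, -2]
    print(np.isclose(lam[0], d))                          # lambda_1 = d
    print(np.all(np.abs(lam) <= d + 1e-9))                # |lambda_i| <= d
    print(np.allclose(C4 @ X[:, 1], lam[1] * X[:, 1]))    # A x_2 = lambda_2 x_2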

Given this setup, we have the lower bound

h(G) \geq \frac 12 \cdot (d-\lambda_2)  \ \ \   (1)

That is, the difference between largest and second largest eigenvalue gives a lower bound to the edge expansion of the graph.

Although (1) is very easy to prove, it is a remarkable result, in the sense that it gives a polynomial-time computable certificate of a “coNP” statement, namely, that for all cuts, at least a certain number of edges cross the cut. Since computing the edge expansion is an NP-hard problem, the inequality cannot always be tight (unless P = NP). Eventually we shall compare it with other methods to lower bound the edge expansion, and see that they are all special cases of the same general approach. To begin with, in this post we shall show that d-\lambda_2 is an efficiently computable relaxation of the edge expansion problem. (Or, rather, of the closely related sparsest cut problem.)
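(Here is inequality (1) checked on one small example, the complete bipartite graph K_{3,3}, which is 3-regular; the snippet reuses the brute-force edge_expansion helper sketched above, and is of course only an illustration, not a proof of anything.)

    import numpy as np

    # K_{3,3}: vertices 0,1,2 on one side, 3,4,5 on the other; 3-regular.
    A = np.zeros((6, 6))
    A[:3, 3:] = 1
    A[3:, :3] = 1
    d = 3

    lam = np.sort(np.linalg.eigvalsh(A))[::-1]      # eigenvalues, descending
    spectral_bound = (d - lam[1]) / 2               # here lambda_2 = 0, bound = 1.5

    # edge_expansion is the brute-force helper from the earlier sketch;
    # for K_{3,3} it returns 5/3, attained for example by S = {0, 1, 3}.
    print(spectral_bound, edge_expansion(A))        # 1.5 <= 1.666...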

We define the sparsest cut of a graph as

\displaystyle  \phi(G):= \min_{S} \frac{edges(S,V-S)} {\frac {1}{n} \cdot |S| \cdot |V-S|}

Clearly we have

\phi \geq h \geq \frac 12 \cdot \phi

so we need to prove \phi \geq d-\lambda_2. We will do so by showing that d-\lambda_2 is a relaxation of \phi, that is, the computation of d-\lambda_2 can be seen as an optimization problem that is identical to \phi, except that we optimize over a bigger range of feasible solutions. To get started, we need to think about eigenvalues as solutions to optimization problems. This comes from characterizing eigenvalues via Rayleigh quotients. For our purposes (in which A is the adjacency matrix of a d-regular graph) we shall use the following simple identity

\displaystyle \lambda_2 = \max_{x: x \perp (1,\cdots,1)} \frac {x^TAx} {x^Tx}\ \ \ (2)

To prove the identity, take any vector x orthogonal to (1,\cdots,1) (and hence to x_1), and write it in the orthonormal basis x_1,\cdots,x_n as

x = \alpha_2 x_2 + \cdots + \alpha_n x_n

then the quotient in (2) is

\displaystyle \frac{\sum_{i=2}^n \lambda_i \alpha_i^2}{\sum_{i=2}^n \alpha_i^2} \leq \left( \max_{i=2,\ldots,n} \lambda_i \right) \cdot  \frac{\sum_{i=2}^n \alpha_i^2}{\sum_{i=2}^n \alpha_i^2} = \lambda_2

and the quotient is equal to \lambda_2 when x=x_2.
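(A quick numerical sanity check of identity (2), which is not part of the proof: for random vectors orthogonal to the all-ones vector the Rayleigh quotient never exceeds \lambda_2, and a second eigenvector attains it. The choice of K_{3,3} as the example graph is arbitrary.)

    import numpy as np

    A = np.zeros((6, 6)); A[:3, 3:] = 1; A[3:, :3] = 1    # K_{3,3}, 3-regular
    lam, X = np.linalg.eigh(A)
    lam, X = lam[::-1], X[:, ::-1]                         # descending order

    rng = np.random.default_rng(0)
    ones = np.ones(6)
    for _ in range(1000):
        x = rng.standard_normal(6)
        x -= (x @ ones) / 6 * ones                         # project onto 1-perp
        assert x @ A @ x / (x @ x) <= lam[1] + 1e-9        # quotient <= lambda_2

    x2 = X[:, 1]
    print(np.isclose(x2 @ A @ x2 / (x2 @ x2), lam[1]))     # attained at x_2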

The next step is to consider, for a vector x\in {\mathbb R}^V, the quadratic form

\sum_{i,j} A(i,j) (x(i) - x(j))^2 \ \ \ (3)

which (up to normalization) measures the expected value of the quantity (x(i)-x(j))^2 for a random edge (i,j), that is, how much the value of x changes along edges. Since A is the adjacency matrix of a d-regular graph, we have

\sum_{i,j} A(i,j) (x(i) - x(j))^2 = 2d x^Tx - 2 x^TAx \ \ \ (4)

and so, combined with equation (2), we have

\displaystyle d-\lambda_2 = \min_{x \perp (1,\cdots,1)} \frac {\sum_{i,j} A(i,j) (x(i) - x(j))^2}{2x^Tx} \ \ \ (5)
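(Again as a sanity check, and not as part of the argument, one can verify identity (4) on a random vector and verify that a second eigenvector achieves the value d-\lambda_2 in (5); the example graph K_{3,3} is once more an arbitrary choice.)

    import numpy as np

    A = np.zeros((6, 6)); A[:3, 3:] = 1; A[3:, :3] = 1    # K_{3,3}, d = 3
    n, d = 6, 3
    rng = np.random.default_rng(1)
    x = rng.standard_normal(n)

    # Identity (4): sum_{i,j} A(i,j)(x_i - x_j)^2 = 2d x^T x - 2 x^T A x.
    lhs = sum(A[i, j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    rhs = 2 * d * (x @ x) - 2 * (x @ A @ x)
    print(np.isclose(lhs, rhs))

    # Equation (5): at a second eigenvector the quotient equals d - lambda_2.
    lam, X = np.linalg.eigh(A)
    lam, X = lam[::-1], X[:, ::-1]
    x2 = X[:, 1]
    num = sum(A[i, j] * (x2[i] - x2[j]) ** 2 for i in range(n) for j in range(n))
    print(np.isclose(num / (2 * (x2 @ x2)), d - lam[1]))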

We are almost done with our manipulations. In the same spirit as Equation (4) we have

\displaystyle  \sum_{i,j}  (x(i) - x(j))^2 = 2n x^Tx - 2 x^TJx \ \ \ (6)

where J is the matrix with a 1 in every entry. If x \perp (1,\cdots,1), then
Jx is the all-zero vector, and so we have

\displaystyle d-\lambda_2 = \min_{x \perp (1,\cdots,1)} \frac {\sum_{i,j} A(i,j) (x(i) - x(j))^2}{\frac 1n \cdot \sum_{i,j} (x(i)-x(j))^2} \ \ \ (7)

Finally, we observe that the quotient in (7) does not change if one adds the same constant to each coordinate of x, and for every vector x there is a constant c (the average of the entries of x) such that x- (c,\cdots,c) is orthogonal to (1,\cdots,1). This means that the requirement to optimize over vectors orthogonal to (1,\cdots,1) can be removed from (7) with no loss of generality (we only need to avoid constant vectors, for which the quotient is undefined). Hence,

\displaystyle d-\lambda_2 = \min_{x \in {\mathbb R}^V} \frac {\sum_{i,j} A(i,j) (x(i) - x(j))^2}{\frac 1n \cdot \sum_{i,j} (x(i)-x(j))^2} \ \ \ (8)
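(One more optional numerical check, on K_{3,3} as before: the helper quotient below is just the ratio in (8); it does not change when a constant is added to every coordinate, and at a second eigenvector it equals d-\lambda_2.)

    import numpy as np

    A = np.zeros((6, 6)); A[:3, 3:] = 1; A[3:, :3] = 1    # K_{3,3}, d = 3
    n, d = 6, 3

    def quotient(x):
        # The ratio appearing in (8), with the 1/n normalization.
        num = sum(A[i, j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
        den = sum((x[i] - x[j]) ** 2 for i in range(n) for j in range(n)) / n
        return num / den

    lam, X = np.linalg.eigh(A)
    lam, X = lam[::-1], X[:, ::-1]
    x2 = X[:, 1]

    print(np.isclose(quotient(x2), d - lam[1]))           # equals d - lambda_2
    print(np.isclose(quotient(x2 + 7.0), quotient(x2)))   # shift invariance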

Compare now Equation (8) to the following way to look at sparsest cut, which is clearly equivalent to the standard definition

\displaystyle \phi = \min_{x\in \{ 0,1\}^V} \frac{\sum_{i,j} A(i,j)  | x(i) - x(j) | }{\frac 1n  \sum_{i,j}  | x(i) - x(j) | } = \min_{x\in \{ 0,1\}^V} \frac{\sum_{i,j} A(i,j)  | x(i) - x(j) |^2 }{\frac 1n  \sum_{i,j}  | x(i) - x(j) |^2 }\ \ \ (9)

where the last equality comes from the fact that the absolute value of the difference of two boolean values is a boolean value, and hence equal to its square.

So we have found the reason why \phi \geq d-\lambda_2: both \phi and d-\lambda_2 are optima of minimization problems with the same cost function, but in the case of d-\lambda_2 we are allowed to optimize over all of {\mathbb R}^V, while in the case of \phi we restrict ourselves to \{0,1\}^V.
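(To see the gap between the two problems on a concrete graph: on the 8-cycle a brute-force computation, via the illustrative helper sparsest_cut below, gives \phi = 1, while d-\lambda_2 = 2-\sqrt 2 \approx 0.59, so here the relaxation is a strict lower bound.)

    import itertools
    import numpy as np

    # The 8-cycle: a 2-regular graph.
    n, d = 8, 2
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

    def sparsest_cut(A):
        # Brute-force phi(G) over all nonempty proper subsets S;
        # exponential in |V|, toy graphs only.
        n = A.shape[0]
        best = float("inf")
        for k in range(1, n):
            for S in itertools.combinations(range(n), k):
                T = [v for v in range(n) if v not in S]
                cut = A[np.ix_(list(S), T)].sum()
                best = min(best, cut / (len(S) * len(T) / n))
        return best

    lam = np.sort(np.linalg.eigvalsh(A))[::-1]
    print(sparsest_cut(A), d - lam[1])    # 1.0 >= 2 - sqrt(2) ~ 0.586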

In computer science, we very often see intractable combinatorial minimization problems that can be relaxed to continuous optimization problems which can then be solved optimally, and the optimal solution of the relaxation gives a lower bound to the minimum of the combinatorial problem. When the continuous relaxation is a linear program, we have a theory of duality that gives us a continuous maximization dual problem such that the cost of any feasible solution to the dual gives a lower bound to the optimum of the primal linear program, and hence a lower bound to the minimum of the combinatorial problem.

Although it is not clear from the formulations given so far, it is possible to cast the computation of d-\lambda_2 as the optimum of a semidefinite programming relaxation of sparsest cut; such a relaxation admits a semidefinite dual, and so any solution that is feasible for that semidefinite dual provides a lower bound to edge expansion. (And this fact can be checked very simply, with no reference to eigenvalues and eigenvectors, although there is in general no easier way to construct semidefinite dual feasible solutions than to use a system of eigenvectors.) This will be the subject of the next post.
