In which we analyze a nearly-linear time algorithm for finding an approximate eigenvector for the second eigenvalue of a graph adjacency matrix, to be used in the spectral partitioning algorithm.

In past lectures, we showed that, if {G=(V,E)} is a {d}-regular graph, and {M} is its normalized adjacency matrix with eigenvalues {1=\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n}, given an eigenvector of {\lambda_2}, the algorithm SpectralPartition finds, in nearly-linear time {O(|E| + |V|\log |V|)}, a cut {(S,V-S)} such that {h(S) \leq 2\sqrt{h(G)}}.

More generally, if, instead of being given an eigenvector {{\bf x}} such that {M{\bf x} = \lambda_2 {\bf x}}, we are given a vector {{\bf x} \perp {\bf 1}} such that {{\bf x}^T M {\bf x} \geq (\lambda_2 - \epsilon) {\bf x}^T{\bf x}}, then the algorithm finds a cut such that {h(S) \leq \sqrt{4h(G) + 2\epsilon}}. In this lecture we describe and analyze an algorithm that computes such a vector using {O((|V|+|E|)\cdot \frac 1\epsilon \cdot \log \frac {|V|}{\epsilon})} arithmetic operations.

A symmetric matrix is positive semi-definite (abbreviated PSD) if all its eigenvalues are nonnegative. We begin by describing an algorithm that approximates the largest eigenvalue of a given symmetric PSD matrix. This might not seem to help very much because the adjacency matrix of a graph is not PSD, and because we want to compute the second largest, not the largest, eigenvalue. We will see, however, that the algorithm is easily modified to approximate the second eigenvalue of a PSD matrix (if an eigenvector of the first eigenvalue is known), and that the adjacency matrix of a graph can easily be modified to be PSD.

1. The Power Method to Approximate the Largest Eigenvalue

The algorithm works as follows.

Algorithm Power

  • Input: PSD symmetric matrix {M\in {\mathbb R}^{n\times n}}, positive integer {t}
  • Pick uniformly at random {{\bf x}_0 \sim \{-1,1\}^n}
  • for {i:=1} to {t}

    • {{\bf x}_i := M{\bf x}_{i-1}}
  • return {{\bf x}_t}

That is, the algorithm simply picks uniformly at random a vector {{\bf x}} with {\pm 1} coordinates, and outputs {M^t{\bf x}}.

Note that the algorithm performs {O(t\cdot (n+m))} arithmetic operations, where {m} is the number of non-zero entries of the matrix {M}.
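As a concrete illustration, here is a minimal Python/NumPy sketch of Algorithm Power; the function names are illustrative choices, and the per-iteration rescaling is only a floating-point safeguard (it is not in the pseudocode and does not change the Rayleigh quotient of the output).

```python
import numpy as np

def power_method(M, t, rng=None):
    """Algorithm Power: start from a uniformly random {-1,+1} vector
    and multiply by the PSD matrix M for t iterations."""
    rng = np.random.default_rng() if rng is None else rng
    n = M.shape[0]
    x = rng.choice([-1.0, 1.0], size=n)      # x_0 uniform in {-1,1}^n
    for _ in range(t):
        x = M @ x                            # x_i := M x_{i-1}
        x /= np.linalg.norm(x)               # rescaling: avoids overflow,
                                             # keeps the same direction
    return x

def rayleigh_quotient(M, x):
    """x^T M x / x^T x, the quantity bounded in Theorem 1."""
    return (x @ M @ x) / (x @ x)
```

With a sparse representation of {M} (e.g. scipy.sparse), each multiplication costs {O(n+m)} arithmetic operations, matching the operation count above.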

Theorem 1 For every PSD matrix {M}, positive integer {t} and parameter {\epsilon>0}, with probability {\geq 3/16} over the choice of {{\bf x}_0}, the algorithm Power outputs a vector {{\bf x}_t} such that

\displaystyle  \frac{ {\bf x}_t^T M {\bf x}_t }{ {\bf x}_t^T{\bf x}_t}\geq \lambda_1 \cdot (1-\epsilon) \cdot \frac 1 {1+4n (1-\epsilon)^{2t}}

where {\lambda_1} is the largest eigenvalue of {M}.

Note that, in particular, we can have {t=O\left( \frac 1\epsilon \log \frac n\epsilon \right)} and {\frac{{\bf x}_t^TM{\bf x}_t}{{\bf x}_t^T{\bf x}_t} \geq (1-O(\epsilon)) \cdot \lambda_1}.
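Indeed, if we take {t \geq \frac 1{2\epsilon} \ln \frac {4n}{\epsilon}}, then, using {1-\epsilon \leq e^{-\epsilon}},

\displaystyle  4n (1-\epsilon)^{2t} \leq 4n \cdot e^{-2\epsilon t} \leq 4n \cdot \frac \epsilon {4n} = \epsilon

and so {\frac{{\bf x}_t^TM{\bf x}_t}{{\bf x}_t^T{\bf x}_t} \geq \lambda_1 \cdot \frac{1-\epsilon}{1+\epsilon} \geq \lambda_1 \cdot (1-2\epsilon)}.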

Let {\lambda_1 \geq \cdots \geq \lambda_n} be the eigenvalues of {M}, with multiplicities, and {{\bf v}_1,\ldots,{\bf v}_n} be a system of orthonormal eigenvectors such that {M{\bf v}_i = \lambda_i {\bf v}_i}. Theorem 1 is implied by the following two lemmas.

Lemma 2 Let {{\bf v} \in {\mathbb R}^n} be a vector such that {||{\bf v}||=1}. Sample uniformly {{\bf x} \sim \{-1,1\}^n}. Then

\displaystyle  \mathop{\mathbb P} \left[ |\langle {\bf x},{\bf v}\rangle| \geq \frac 12 \right] \geq \frac 3{16}

Lemma 3 Let {{\bf x} \in {\mathbb R}^n} be a vector such that {|\langle {\bf x} , {\bf v}_1\rangle| \geq \frac 12}. Then, for every positive integer {t} and positive {\epsilon >0}, if we define {{\bf y} := M^t {\bf x}}, we have

\displaystyle  \frac{{\bf y}^T M {\bf y}}{{\bf y}^T{\bf y}} \geq \lambda_1 \cdot (1-\epsilon) \cdot \frac 1 {1+4 ||{\bf x}||^2 (1-\epsilon)^{2t}}

It remains to prove the two lemmas.

Proof: (Of Lemma 2) Let {{\bf v} = (v_1,\ldots,v_n)}. The inner product {\langle {\bf x}, {\bf v}\rangle} is the random variable

\displaystyle  S:= \sum_i x_i v_i

Let us compute the first, second, and fourth moment of {S}.

\displaystyle  \mathop{\mathbb E} S = 0

\displaystyle  \mathop{\mathbb E} S^2 = \sum_i v_i^2 = 1

\displaystyle  \mathop{\mathbb E} S^4 = 3 \left(\sum_i v_i^2 \right)^2 - 2 \sum_i v_i^4 \leq 3

Recall that the Paley-Zygmund inequality states that if {Z} is a non-negative random variable with finite variance, then, for every {0\leq \delta \leq 1}, we have

\displaystyle  \mathop{\mathbb P} [ Z \geq \delta \mathop{\mathbb E} Z ] \geq (1-\delta)^2 \cdot \frac {(\mathop{\mathbb E} Z)^2}{\mathop{\mathbb E} Z^2} \ \ \ \ \ (1)

which follows by noting that

\displaystyle  \mathop{\mathbb E} Z = \mathop{\mathbb E} [ Z\cdot 1_{Z < \delta \mathop{\mathbb E} Z} ] + \mathop{\mathbb E} [ Z \cdot 1_{Z \geq \delta \mathop{\mathbb E} Z} ] \ ,

that

\displaystyle  \mathop{\mathbb E} [ Z\cdot 1_{Z < \delta \mathop{\mathbb E} Z} ] \leq \delta \mathop{\mathbb E} Z \ ,

and that, by the Cauchy-Schwarz inequality,

\displaystyle  \mathop{\mathbb E} [ Z \cdot 1_{Z \geq \delta \mathop{\mathbb E} Z}] \leq \sqrt {\mathop{\mathbb E} Z^2}\cdot \sqrt{\mathop{\mathbb E} 1_{Z \geq \delta \mathop{\mathbb E} Z}}

\displaystyle  = \sqrt {\mathop{\mathbb E} Z^2}\sqrt{\mathop{\mathbb P} [ Z \geq \delta \mathop{\mathbb E} Z] }

We apply the Paley-Zygmund inequality to the case {Z= S^2} and {\delta = 1/4}, and we derive

\displaystyle  \mathop{\mathbb P} \left[ S^2 \geq \frac 14 \right] \geq \left( \frac 34 \right)^2 \cdot \frac 13 = \frac 3{16}

\Box

Remark 1 The proof of Lemma 2 works even if {{\bf x} \sim \{-1,1\}^n} is selected according to a 4-wise independent distribution. This means that the algorithm can be derandomized in polynomial time.
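As a quick sanity check, one can estimate the probability in Lemma 2 by simulation; the sketch below, with an arbitrary choice of unit vector {{\bf v}} and of the parameters, is purely illustrative.

```python
import numpy as np

def estimate_lemma2_probability(n=100, trials=20000, seed=0):
    """Estimate P[ |<x, v>| >= 1/2 ] for a fixed unit vector v and
    x uniform in {-1,1}^n; Lemma 2 guarantees this is at least 3/16."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)                         # ||v|| = 1
    x = rng.choice([-1.0, 1.0], size=(trials, n))  # rows are samples of x
    return np.mean(np.abs(x @ v) >= 0.5)

print(estimate_lemma2_probability())   # typically well above 3/16 = 0.1875
```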

Proof: (Of Lemma 3) Let us write {{\bf x}} as a linear combination of the eigenvectors

\displaystyle  {\bf x} = a_1 {\bf v}_1 + \cdots + a_n {\bf v}_n

where the coefficients can be computed as {a_i = \langle {\bf x}, {\bf v}_i\rangle}. Note that, by assumption, {|a_1| \geq \frac 12}, and that, by orthonormality of the eigenvectors, {||{\bf x}||^2 = \sum_i a_i^2}.

We have

\displaystyle  {\bf y} = a_1\lambda_1^t {\bf v}_1 + \cdots + a_n \lambda_n^t{\bf v}_n

and so

\displaystyle  {\bf y}^T M {\bf y} = \sum_i a_i^2 \lambda_i^{2t+1}

and

\displaystyle  {\bf y}^T{\bf y} = \sum_i a_i^2 \lambda_i^{2t}

We need to prove a lower bound to the ratio of the above two quantities. We will compute a lower bound to the numerator and an upper bound to the denominator in terms of the same parameter.

Let {k} be the number of eigenvalues larger than {\lambda_1 \cdot (1-\epsilon)}. Then, recalling that the eigenvalues are sorted in non-increasing order, we have

\displaystyle  {\bf y}^TM{\bf y} \geq \sum_{i=1}^k a_i^2 \lambda_i^{2t+1} \geq \lambda_1(1-\epsilon) \sum_{i=1}^k a_i^2 \lambda_i^{2t}

We also see that

\displaystyle  \sum_{i=k+1}^n a_i^2\lambda_i^{2t}

\displaystyle  \leq \lambda_1^{2t} \cdot (1-\epsilon)^{2t} \sum_{i=k+1}^n a_i^2

\displaystyle  \leq \lambda_1^{2t} \cdot (1-\epsilon)^{2t} \cdot ||{\bf x}||^2

\displaystyle  \leq 4 a_1^2 \lambda_1^{2t} (1-\epsilon)^{2t} ||{\bf x}||^2

\displaystyle  \leq 4 ||{\bf x}||^2 (1-\epsilon)^{2t} \sum_{i=1}^k a_i^2 \lambda_i^{2t}

So we have

\displaystyle  {\bf y}^T{\bf y} \leq (1 +4 ||{\bf x}||^2 (1-\epsilon)^{2t}) \cdot \sum_{i=1}^k a_i^2 \lambda_i^{2t}

giving

\displaystyle  \frac{{\bf y}^T M {\bf y}}{{\bf y}^T{\bf y}} \geq \lambda_1 \cdot (1-\epsilon) \cdot \frac 1 {1+4 ||{\bf x}||^2 (1-\epsilon)^{2t}}

\Box

Remark 2 Where did we use the assumption that {M} is positive semidefinite? What happens if we apply this algorithm to the adjacency matrix of a bipartite graph?

2. Approximating the Second Eigenvalue

If {M} is a PSD matrix, and if we know a unit-length eigenvector {{\bf v}_1} of the largest eigenvalue of {M}, we can approximately find the second eigenvalue with the following adaptation of the algorithm from the previous section; a short code sketch follows the pseudocode.

Algorithm Power2

  • Input: PSD symmetric matrix {M\in {\mathbb R}^{n\times n}}, positive integer {t}, vector {{\bf v}_1}
  • Pick uniformly at random {{\bf x} \sim \{-1,1\}^n}
  • {{\bf x}_0:= {\bf x} - \langle {\bf v}_1,{\bf x} \rangle\cdot {\bf v}_1}
  • for {i:=1} to {t}

    • {{\bf x}_i := M{\bf x}_{i-1}}
  • return {{\bf x}_t}
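Here is a minimal Python sketch of Algorithm Power2, in the same style as the sketch of Algorithm Power above; the single initial projection is what the pseudocode prescribes, and the per-iteration rescaling is, again, only a floating-point safeguard.

```python
import numpy as np

def power_method_2(M, t, v1, rng=None):
    """Algorithm Power2: project a random {-1,+1} vector orthogonally to v1
    (a unit eigenvector of the largest eigenvalue of the PSD matrix M),
    then multiply by M for t iterations."""
    rng = np.random.default_rng() if rng is None else rng
    n = M.shape[0]
    x = rng.choice([-1.0, 1.0], size=n)
    xt = x - (v1 @ x) * v1                 # x_0 := x - <v1, x> v1
    for _ in range(t):
        xt = M @ xt                        # x_i := M x_{i-1}
        xt /= np.linalg.norm(xt)           # rescaling: avoids overflow,
                                           # does not change the direction
    return xt
```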

If {{\bf v}_1,\ldots,{\bf v}_n} is an orthonormal basis of eigenvectors for the eigenvalues {\lambda_1 \geq \cdots \geq \lambda_n} of {M}, then, at the beginning, we pick a random vector

\displaystyle  {\bf x} = a_1 {\bf v}_1 + a_2 {\bf v}_2 + \cdots + a_n {\bf v}_n

that, with probability at least {3/16}, satisfies {|a_2| \geq 1/2}. (Cf. Lemma 2.) Then we compute {{\bf x}_0}, which is the projection of {{\bf x}} on the subspace orthogonal to {{\bf v}_1}, that is

\displaystyle  {\bf x}_0 = a_2 {\bf v}_2 + \cdots + a_n {\bf v}_n

Note that {||{\bf x}||^2 = n} and that {||{\bf x}_0||^2 \leq n}.

The output is the vector {{\bf x}_t}

\displaystyle  {\bf x}_t = a_2 \lambda_2^t {\bf v}_2 + \cdots + a_n \lambda_n^t {\bf v}_n

If we apply Lemma 3 to the subspace orthogonal to {{\bf v}_1}, we see that when {|a_2|\geq 1/2} we have that, for every {0< \epsilon< 1},

\displaystyle  \frac{ {\bf x}_t^T M {\bf x}_t}{{\bf x}_t^T{\bf x}_t} \geq \lambda_2 \cdot (1-\epsilon) \cdot \frac 1{1+4 n (1-\epsilon)^{2t}}

We have thus established the following analysis.

Theorem 4 For every PSD matrix {M}, positive integer {t} and parameter {\epsilon>0}, if {{\bf v}_1} is a length-1 eigenvector of the largest eigenvalue of {M}, then with probability {\geq 3/16} over the choice of {{\bf x}_0}, the algorithm Power2 outputs a vector {{\bf x}_t \perp {\bf v}_1} such that

\displaystyle  \frac {{\bf x}_t^T M {\bf x}_t}{ {\bf x}_t^T{\bf x}_t} \geq \lambda_2 \cdot (1-\epsilon) \cdot \frac 1 {1+4n (1-\epsilon)^{2t}}

where {\lambda_2} is the second largest eigenvalue of {M}, counting multiplicities.

Finally, we come to the case in which {M} is the normalized adjacency matrix of a regular graph.

We know that {M} has eigenvalues {1=\lambda_1 \geq \cdots \geq \lambda_n \geq -1} and that {\frac 1{\sqrt n} \cdot {\bf 1}} is an eigenvector of {\lambda_1}.

Consider now the matrix {M+I}. Every eigenvector of {M} with eigenvalue {\lambda} is clearly also an eigenvector of {M+I} with eigenvalue {1+\lambda}, and vice versa, thus {M+I} has eigenvalues {2=1+\lambda_1 \geq 1+ \lambda_2 \geq \cdots \geq 1+\lambda_n \geq 0} and it is PSD.

This means that we can run algorithm Power2 on the matrix {I+M} using the vector {{\bf v}_1 = \frac 1 {\sqrt n} {\bf 1}} and a parameter {t = O\left(\epsilon^{-1} \log \frac n\epsilon\right)}. The algorithm finds, with probability {\geq 3/16}, a vector {{\bf x}_t \perp {\bf 1}} such that

\displaystyle  \frac{{\bf x}_t^T (M+I) {\bf x}_t}{{\bf x}_t^T{\bf x}_t} \geq (1+\lambda_2)\cdot (1-2\epsilon)

which is equivalent to

\displaystyle  \frac{{\bf x}_t^T M {\bf x}_t}{{\bf x}_t^T{\bf x}_t} \geq \lambda_2 - 2\epsilon -2\epsilon\lambda_2 \geq \lambda_2 - 4\epsilon
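To make the last step concrete, here is a hedged sketch of the whole pipeline for a {d}-regular graph given by its adjacency matrix, using the power_method_2 helper sketched above; the dense construction of {I+M}, the names, and the specific choice of {t} are illustrative, and in practice a sparse representation (or computing {(I+M){\bf x}} as {{\bf x} + M{\bf x}}) would be used to stay within the nearly-linear operation count.

```python
import numpy as np

def approximate_second_eigenvector(A, d, eps, rng=None):
    """Given the adjacency matrix A of a d-regular graph, run Power2 on
    I + M (where M = A/d), which is PSD, against v1 = (1/sqrt(n)) * 1,
    and return a vector x_t orthogonal to the all-ones vector satisfying,
    with probability >= 3/16, x_t^T M x_t / x_t^T x_t >= lambda_2 - 4*eps."""
    n = A.shape[0]
    M = A / d                                    # normalized adjacency matrix
    shifted = np.eye(n) + M                      # I + M has eigenvalues 1 + lambda_i >= 0
    v1 = np.ones(n) / np.sqrt(n)                 # unit eigenvector of lambda_1 = 1
    t = int(np.ceil(np.log(4 * n / eps) / eps))  # one concrete t = O(eps^{-1} log(n/eps))
    return power_method_2(shifted, t, v1, rng=rng)
```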
