# CS294 Lecture 8: Spectral Algorithms Wrap-up

In which we talk about even more generalizations of Cheeger’s inequalities, and we analyze the power method to find approximate eigenvectors, thus completing the description of a polynomial-time approximation algorithm for sparsest cut.

1. Irregular Graphs

For simplicity, we proved our results on ${\lambda_2}$ and ${\lambda_k}$ for regular graphs. Those results extend, essentially with the same proofs, to the case of irregular undirected graphs. In an irregular graph ${G=(V,E)}$, the notion that generalizes edge expansion is called conductance. If ${d_v}$ is the degree of vertex ${v}$, then the conductance of set ${S}$ of vertices is

$\displaystyle \phi(S) := \frac {E(S, V-S)}{\sum_{v\in S} d_v }$

We will call the sum of the degrees of a set of vertices the volume of the set, and denote it ${vol(S) := \sum_{v\in S} d_v}$. The conductance of the graph ${G}$ is

$\displaystyle \phi(G) := \min_{S: vol(S) \leq \frac 12 vol(V)} \ \ \phi(S)$

Higher-order conductance is defined as higher-order expansion, but with conductance replacing expansion in the definition.
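As a concrete illustration, here is a minimal Python sketch of the definitions above (the edge-list representation and the function name are ours, just for illustration): it computes the volume and conductance of a vertex set in an irregular graph.

```python
def conductance(edges, S, V):
    """phi(S) = |E(S, V-S)| / vol(S), where vol(S) is the sum of degrees in S."""
    S = set(S)
    deg = {v: 0 for v in V}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    cut = sum(1 for u, v in edges if (u in S) != (v in S))  # edges crossing the cut
    vol = sum(deg[v] for v in S)                            # volume of S
    return cut / vol

# A 4-cycle plus a pendant vertex: an irregular graph.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4)]
V = range(5)
print(conductance(edges, {0, 1}, V))  # 3 crossing edges / volume 5 = 0.6
```

For a regular graph, this agrees with edge expansion up to the factor of the degree.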

The Cheeger inequalities

$\displaystyle \frac {\lambda_2}{2} \leq \phi(G) \leq \sqrt {2 \lambda_2}$

still hold, with the same proof. With some abuse of notation, we will call the following quantity the Rayleigh quotient of ${{\bf x}}$

$\displaystyle R_L({\bf x}) := \frac { \sum_{ \{ u,v \} \in E} (x_u - x_v)^2 } { \sum_{v\in V} d_v x_v^2}$

even if, technically, it is the Rayleigh quotient of ${D^{1/2} {\bf x}}$, where ${D}$ is the diagonal matrix of degrees.

We can also adapt the proof of the higher-order Cheeger inequality to show

$\displaystyle \frac{\lambda_k}{2} \leq \phi_k (G) \leq O(k^{3.5}) \cdot \sqrt{ \lambda_k }$

2. More Cheeger-type Bounds

We proved that if ${(S_F, V-S_F)}$ is the cut found by Fiedler’s algorithm using the eigenvector of ${\lambda_2}$, then

$\displaystyle \phi(S_F, V-S_F) \leq 2 \sqrt {\phi (G)}$

which is a good bound, although it usually underestimates the quality of the solutions found in practice. (There are graphs, however, for which the above inequality is tight within a constant factor.)

One case in which we can improve the analysis is when there are not too many eigenvalues close to ${\lambda_2}$.

Theorem 1 There is a constant ${c}$ such that, if ${(S_F, V-S_F)}$ is the cut obtained by Fiedler’s algorithm using an eigenvector for ${\lambda_2}$, then, for every ${k\geq 2}$,

$\displaystyle \phi (S_F,V-S_F) \leq c \cdot k \cdot \frac{\lambda_2}{\sqrt {\lambda_k}}$

So we have

$\displaystyle \phi (S_F,V-S_F) \leq 2 c \cdot \phi(G) \cdot \min_{k \geq 2} \frac {k}{\sqrt {\lambda_k}}$

which is a better bound for families of graphs in which, for some ${k}$, ${\lambda_k \gg k^2 \lambda_2}$.

We will not have time to prove Theorem 1, but we will state the two main pieces of its proof.

Lemma 2 Let ${{\bf x} \in {\mathbb R}^V_{\geq 0}}$ be a non-negative vector. Then, for every ${k}$, there is a non-negative vector ${{\bf y} \in {\mathbb R}^V_{\geq 0}}$ whose entries take at most ${2k}$ distinct values and such that

$\displaystyle || {\bf x} - {\bf y} ||^2 \leq \frac{R_L({\bf x})}{\lambda_k } ||{\bf x}||^2$

That is, if ${R_L({\bf x}) \ll \lambda_k}$, then there are ${2k}$ values such that most entries of ${{\bf x}}$ are close to one of those ${2k}$ values.

Lemma 3 There is a constant ${c'}$ such that, for all non-negative vectors ${{\bf x} \in {\mathbb R}^V_{\geq 0}}$ and ${{\bf y} \in {\mathbb R}^V_{\geq 0}}$, if the entries of ${{\bf y}}$ take only ${k}$ distinct values, then there is a threshold ${t>0}$ such that

$\displaystyle \phi( \{ v : x_v \geq t \} ) \leq c' \cdot k \cdot \left(R_L({\bf x}) + \sqrt{R_L({\bf x})} \cdot \frac{||{\bf x} - {\bf y} ||}{||{\bf x}||} \right)$

The above lemma should be compared to the fact, which was a major piece in the proof of Cheeger’s inequality, that if ${{\bf x} \in {\mathbb R}^V_{\geq 0}}$ is an arbitrary non-negative vector, then there is a threshold ${t>0}$ such that

$\displaystyle \phi( \{ v : x_v \geq t \} ) \leq \sqrt{ 2 R_L({\bf x})}$
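The threshold sets ${\{ v : x_v \geq t \}}$ in these statements are exactly the "sweep cuts" tried by Fiedler's algorithm. A minimal sketch (function and variable names are ours, not from the notes) that scans all thresholds and keeps the set of smallest conductance:

```python
import numpy as np

def sweep_cut(edges, deg, x):
    """Among the threshold sets {v : x_v >= t}, return the one of smallest
    conductance, as in the rounding step of Cheeger's inequality."""
    order = np.argsort(-x)              # vertices sorted by decreasing x_v
    best_phi, best_S = float("inf"), None
    for i in range(1, len(order)):      # each prefix is a threshold set
        S = set(order[:i].tolist())
        cut = sum(1 for u, v in edges if (u in S) != (v in S))
        vol = sum(deg[v] for v in S)
        phi = cut / vol
        if phi < best_phi:
            best_phi, best_S = phi, S
    return best_phi, best_S

# Two triangles joined by a single edge; x separates them.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
deg = {0: 2, 1: 2, 2: 3, 3: 3, 4: 2, 5: 2}
phi, S = sweep_cut(edges, deg, np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0]))
print(phi, S)  # 1/7 and {0, 1, 2}: one crossing edge over volume 7
```

Sorting once and sweeping prefixes considers every distinct threshold, which is why the algorithm runs in near-linear time when the cut and volume counts are updated incrementally (here they are recomputed for clarity).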

One obtains Theorem 1 as follows. Start from an eigenvector ${{\bf x}}$ of ${\lambda_2}$ and, using the first step in the proof of Cheeger’s inequality, obtain a vector ${{\bf x}' \in {\mathbb R}^V_{\geq 0}}$ with non-negative entries such that ${R_L({\bf x}' ) \leq R_L({\bf x}) = \lambda_2}$ and such that the support of ${{\bf x}'}$ contains at most ${|V|/2}$ vertices.

Use Lemma 2 to find a vector ${{\bf y}}$ with non-negative entries and with at most ${2k}$ distinct values among its entries such that ${||{\bf x}' - {\bf y} ||^2 \leq \frac {\lambda_2}{\lambda_k} ||{\bf x}'||^2}$. Then use Lemma 3 and the fact that ${\lambda_k \leq 2}$ to conclude that there exists a ${t>0}$ such that

$\displaystyle \phi( \{ v : x'_v \geq t \} ) \leq O(k) \cdot \frac{\lambda_2}{\sqrt { \lambda_k} }$

The set ${\{ v: x'_v \geq t \}}$ contains at most ${|V|/2}$ vertices, and it is one of the cuts considered by Fiedler’s algorithm on input ${{\bf x}}$.

Another property of graphs in which ${\lambda_k}$ is large for small ${k}$ is that they contain large expanders as induced subgraphs.

Theorem 4 There is a constant ${c}$ such that, for every graph ${G}$ and every ${k}$, there exists a partition of the vertices into ${\ell \leq k}$ sets ${(S_1,\ldots,S_\ell)}$ such that, if we call ${G_i}$ the subgraph induced by the vertex set ${S_i}$, we have

$\displaystyle \phi_{G_i} \geq c \frac {\lambda_k }{ k^2}$

Theorem 5 If ${\phi_{k+1} > (1+\epsilon) \phi_k}$, then there is a partition of the vertices into ${k}$ subsets ${(S_1,\ldots,S_k)}$ such that

$\displaystyle \forall i \in \{ 1,\ldots,k\} : \ \ \ \phi_{G_i} \geq \Omega\left( \frac \epsilon k \right ) \cdot \phi_{k+1} , \ \ \ \phi(S_i) \leq k \phi_k$

3. The Power Method

Earlier in this class, we showed that, if ${G=(V,E)}$ is a ${d}$-regular graph, and ${L}$ is its normalized Laplacian matrix with eigenvalues ${0=\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n}$, given an eigenvector of ${\lambda_2}$, Fiedler’s algorithm finds, in nearly-linear time ${O(|E| + |V|\log |V|)}$, a cut ${(S,V-S)}$ such that ${\phi(S) \leq 2\sqrt{\phi(G)}}$.

More generally, if, instead of being given an eigenvector ${{\bf x}}$ such that ${L{\bf x} = \lambda_2 {\bf x}}$, we are given a vector ${{\bf x} \perp {\bf 1}}$ such that ${{\bf x}^T L {\bf x} \leq (\lambda_2 + \epsilon) {\bf x}^T{\bf x}}$, then the algorithm finds a cut such that ${\phi(S) \leq \sqrt{4\phi(G) + 2\epsilon}}$. We will now see how to compute such a vector using ${O((|V|+|E|)\cdot \frac 1\epsilon \cdot \log \frac {|V|}{\epsilon})}$ arithmetic operations.

A symmetric matrix is positive semi-definite (abbreviated PSD) if all its eigenvalues are nonnegative. We begin by describing an algorithm that approximates the largest eigenvalue of a given symmetric PSD matrix. This might not seem to help very much because we want to compute the second smallest, not the largest, eigenvalue. We will see, however, that the algorithm is easily modified to accomplish what we want.

3.1. The Power Method to Approximate the Largest Eigenvalue

The algorithm works as follows

Algorithm Power

Input: PSD matrix ${M}$, parameter ${k}$

• Pick uniformly at random ${{\bf x}_0 \sim \{-1,1\}^n}$
• for ${i:=1}$ to ${k}$
${\mbox{} \hspace{30pt} {\bf x}_i := M\cdot {\bf x}_{i-1}}$
• return ${{\bf x}_k}$

That is, the algorithm simply picks uniformly at random a vector ${{\bf x}}$ with ${\pm 1}$ coordinates, and outputs ${M^k{\bf x}}$.

Note that the algorithm performs ${O(k\cdot (n+m))}$ arithmetic operations, where ${m}$ is the number of non-zero entries of the matrix ${M}$.
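A minimal Python sketch of algorithm Power follows. The per-step normalization is our addition: it only rescales ${{\bf x}_i}$, so it does not change the Rayleigh quotient of the output, but it avoids floating-point overflow for large ${k}$.

```python
import numpy as np

def power(M, k, rng):
    """Algorithm Power: start from a random +-1 vector and apply M k times.
    Normalizing after each step is only a rescaling of x_i."""
    x = rng.choice([-1.0, 1.0], size=M.shape[0])
    for _ in range(k):
        x = M @ x
        x = x / np.linalg.norm(x)
    return x

# Sanity check on a random PSD matrix: the Rayleigh quotient of the
# output approaches the largest eigenvalue.
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
M = B @ B.T                                  # PSD by construction
x = power(M, k=200, rng=rng)
ratio = (x @ M @ x) / (x @ x) / np.linalg.eigvalsh(M).max()
print(ratio)                                 # close to 1
```

The comparison against `numpy.linalg.eigvalsh` is only for the sanity check; the point of the power method is precisely to avoid a full (cubic-time) eigendecomposition.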

Theorem 6 For every PSD matrix ${M}$, positive integer ${k}$ and parameter ${\epsilon>0}$, with probability ${\geq 3/16}$ over the choice of ${{\bf x}_0}$, the algorithm Power outputs a vector ${{\bf x}_k}$ such that

$\displaystyle \frac{ {\bf x}_k^T M {\bf x}_k }{ {\bf x}_k^T{\bf x}_k}\geq \lambda_1 \cdot (1-\epsilon) \cdot \frac 1 {1+4n (1-\epsilon)^{2k}}$

where ${\lambda_1}$ is the largest eigenvalue of ${M}$.

Note that, in particular, we can have ${k=O(\log n/\epsilon)}$ and ${\frac{{\bf x}_k^TM{\bf x}_k}{{\bf x}_k^T{\bf x}_k} \geq (1-O(\epsilon)) \cdot \lambda_1}$.

Let ${\lambda_1 \geq \cdots \geq \lambda_n}$ be the eigenvalues of ${M}$, with multiplicities, and ${{\bf v}_1,\ldots,{\bf v}_n}$ be a system of orthonormal eigenvectors such that ${M{\bf v}_i = \lambda_i {\bf v}_i}$. Theorem 6 is implied by the following two lemmas

Lemma 7 Let ${{\bf v} \in {\mathbb R}^n}$ be a vector such that ${||{\bf v}||=1}$. Sample uniformly ${{\bf x} \sim \{-1,1\}^n}$. Then

$\displaystyle \mathop{\mathbb P} \left[ |\langle {\bf x},{\bf v}\rangle| \geq \frac 12 \right] \geq \frac 3{16}$

Lemma 8 For every ${{\bf x} \in {\mathbb R}^n}$, for every positive integer ${k}$ and positive ${\epsilon >0}$, if we define ${{\bf y} := M^k {\bf x}}$, we have

$\displaystyle \frac{{\bf y}^T M {\bf y}}{{\bf y}^T{\bf y}} \geq \lambda_1 \cdot (1-\epsilon) \cdot \left( 1 + \frac {||{\bf x}||^2}{\langle {\bf x}, {\bf v}_1\rangle^2} (1-\epsilon)^{2k}\right ) ^{-1}$

It remains to prove the two lemmas.

Proof: (Of Lemma 7) Let ${{\bf v} = (v_1,\ldots,v_n)}$. The inner product ${\langle {\bf x}, {\bf v}\rangle}$ is the random variable

$\displaystyle S:= \sum_i x_i v_i$

Let us compute the first, second, and fourth moment of ${S}$.

$\displaystyle \mathop{\mathbb E} S = 0$

$\displaystyle \mathop{\mathbb E} S^2 = \sum_i v_i^2 = 1$

$\displaystyle \mathop{\mathbb E} S^4 = 3 \left(\sum_i v_i^2 \right)^2 - 2 \sum_i v_i^4 \leq 3$

Recall that the Paley-Zygmund inequality states that if ${Z}$ is a non-negative random variable with finite variance, then, for every ${0\leq \delta \leq 1}$, we have

$\displaystyle \mathop{\mathbb P} [ Z \geq \delta \mathop{\mathbb E} Z ] \geq (1-\delta)^2 \cdot \frac {(\mathop{\mathbb E} Z)^2}{\mathop{\mathbb E} Z^2} \ \ \ \ \ (1)$

which follows by noting that

$\displaystyle \mathop{\mathbb E} Z = \mathop{\mathbb E} [ Z\cdot 1_{Z < \delta \mathop{\mathbb E} Z} ] + \mathop{\mathbb E} [ Z \cdot 1_{Z \geq \delta \mathop{\mathbb E} Z}] \ ,$

that

$\displaystyle \mathop{\mathbb E} [ Z\cdot 1_{Z < \delta \mathop{\mathbb E} Z} ] \leq \delta \mathop{\mathbb E} Z \ ,$

and that

$\displaystyle \mathop{\mathbb E} [ Z \cdot 1_{Z \geq \delta \mathop{\mathbb E} Z}] \leq \sqrt {\mathop{\mathbb E} Z^2}\cdot \sqrt{\mathop{\mathbb E} 1_{Z \geq \delta \mathop{\mathbb E} Z}}$

$\displaystyle = \sqrt {\mathop{\mathbb E} Z^2}\sqrt{\mathop{\mathbb P} [ Z \geq \delta \mathop{\mathbb E} Z] }$

We apply the Paley-Zygmund inequality to the case ${Z= S^2}$ and ${\delta = 1/4}$, and we derive

$\displaystyle \mathop{\mathbb P} \left[ S^2 \geq \frac 14 \right] \geq \left( \frac 34 \right)^2 \cdot \frac 13 = \frac 3{16}$

$\Box$

Remark 1 The proof of Lemma 7 works even if ${{\bf x} \sim \{-1,1\}^n}$ is selected according to a 4-wise independent distribution. This means that the algorithm can be derandomized in polynomial time.
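A quick Monte Carlo check of Lemma 7 (the test vector, seed, and trial count are arbitrary choices of ours): sampling many ${\pm 1}$ vectors, the empirical frequency of ${|\langle {\bf x},{\bf v}\rangle| \geq 1/2}$ comfortably exceeds the guaranteed ${3/16}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 20, 100_000
v = rng.standard_normal(n)
v /= np.linalg.norm(v)                    # an arbitrary unit vector
X = rng.choice([-1.0, 1.0], size=(trials, n))
freq = np.mean(np.abs(X @ v) >= 0.5)
print(freq)                               # well above the guaranteed 3/16
```

The bound ${3/16}$ is loose for typical vectors: ${\langle {\bf x},{\bf v}\rangle}$ behaves roughly like a standard Gaussian, for which the event has probability about ${0.6}$.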

Proof: (Of Lemma 8) Let us write ${{\bf x}}$ as a linear combination of the eigenvectors

$\displaystyle {\bf x} = a_1 {\bf v}_1 + \cdots + a_n {\bf v}_n$

where the coefficients can be computed as ${a_i = \langle {\bf x}, {\bf v}_i\rangle}$. We have

$\displaystyle {\bf y} = a_1\lambda_1^k {\bf v}_1 + \cdots + a_n \lambda_n^k{\bf v}_n$

and so

$\displaystyle {\bf y}^T M {\bf y} = \sum_i a_i^2 \lambda_i^{2k+1}$

and

$\displaystyle {\bf y}^T{\bf y} = \sum_i a_i^2 \lambda_i^{2k}$

We need to prove a lower bound to the ratio of the above two quantities. We will compute a lower bound to the numerator and an upper bound to the denominator in terms of the same parameter.

Let ${\ell}$ be the number of eigenvalues larger than ${\lambda_1 \cdot (1-\epsilon)}$. Then, recalling that the eigenvalues are sorted in non-increasing order, we have

$\displaystyle {\bf y}^TM{\bf y} \geq \sum_{i=1}^\ell a_i^2 \lambda_i^{2k+1} \geq \lambda_1(1-\epsilon) \sum_{i=1}^\ell a_i^2 \lambda_i^{2k}$

We also see that

$\displaystyle \sum_{i=\ell+1}^n a_i^2\lambda_i^{2k}$

$\displaystyle \leq \lambda_1^{2k} \cdot (1-\epsilon)^{2k} \sum_{i=\ell+1}^n a_i^2$

$\displaystyle \leq \lambda_1^{2k} \cdot (1-\epsilon)^{2k} \cdot ||{\bf x}||^2$

$\displaystyle \leq a_1^2 \lambda_1^{2k} (1-\epsilon)^{2k} \cdot \frac {||{\bf x}||^2}{a_1^2}$

$\displaystyle \leq \frac { ||{\bf x}||^2}{a_1^2 } (1-\epsilon)^{2k} \sum_{i=1}^\ell a_i^2 \lambda_i^{2k}$

So we have

$\displaystyle {\bf y}^T{\bf y} \leq \left(1 +\frac { ||{\bf x}||^2}{a_1^2} (1-\epsilon)^{2k} \right) \cdot \sum_{i=1}^\ell a_i^2 \lambda_i^{2k}$

giving

$\displaystyle \frac{{\bf y}^T M {\bf y}}{{\bf y}^T{\bf y}} \geq \lambda_1 \cdot (1-\epsilon) \cdot \left( 1+\frac { ||{\bf x}||^2}{a_1^2} (1-\epsilon)^{2k} \right)^{-1}$

$\Box$

Remark 2 Where did we use the assumption that ${M}$ is positive semidefinite? What happens if we apply this algorithm to the adjacency matrix of a bipartite graph?

3.2. Approximating the Second Largest Eigenvalue

Suppose now that we are interested in finding the second largest eigenvalue of a given PSD matrix ${M}$. If ${M}$ has eigenvalues ${\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n}$, and we know the eigenvector ${{\bf v}_1}$ of ${\lambda_1}$, then ${M}$ is a PSD linear map from the space orthogonal to ${{\bf v}_1}$ to itself, and ${\lambda_2}$ is the largest eigenvalue of this linear map. We can then run the previous algorithm on this linear map.

Algorithm Power2

Input: PSD matrix ${M}$, vector ${{\bf v}_1}$, parameter ${k}$

• Pick uniformly at random ${{\bf x} \sim \{-1,1\}^n}$
• ${{\bf x}_0:= {\bf x} - {\bf v}_1 \cdot \langle {\bf x},{\bf v}_1 \rangle}$
• for ${i:=1}$ to ${k}$
${\mbox{} \hspace{30pt} {\bf x}_i := M\cdot {\bf x}_{i-1}}$
• return ${{\bf x}_k}$
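A sketch of Power2 in the same style. The re-projection inside the loop is not part of the algorithm as stated (projecting once suffices in exact arithmetic); we add it to guard against floating-point drift back toward ${{\bf v}_1}$, and the normalization prevents overflow. The diagonal test matrix is our choice, so the spectrum is known exactly.

```python
import numpy as np

def power2(M, v1, k, rng):
    """Algorithm Power2: project a random +-1 vector orthogonal to v1,
    then run k power iterations."""
    x = rng.choice([-1.0, 1.0], size=M.shape[0])
    x = x - v1 * (x @ v1)
    for _ in range(k):
        x = M @ x
        x = x - v1 * (x @ v1)    # re-projection: guards against round-off drift
        x = x / np.linalg.norm(x)
    return x

# Diagonal test matrix with known spectrum 1, 2, ..., 40.
M = np.diag(np.arange(1.0, 41.0))
v1 = np.zeros(40)
v1[-1] = 1.0                     # eigenvector of the top eigenvalue, 40
x = power2(M, v1, k=300, rng=np.random.default_rng(0))
rq = (x @ M @ x) / (x @ x)
print(rq)                        # close to the second eigenvalue, 39
```

Since the output is (numerically) orthogonal to ${{\bf v}_1}$, its Rayleigh quotient never exceeds ${\lambda_2}$; the theorem below says it is also not much smaller.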

If ${{\bf v}_1,\ldots,{\bf v}_n}$ is an orthonormal basis of eigenvectors for the eigenvalues ${\lambda_1 \geq \cdots \geq \lambda_n}$ of ${M}$, then, at the beginning, we pick a random vector

$\displaystyle {\bf x} = a_1 {\bf v}_1 + a_2 {\bf v}_2 + \cdots + a_n {\bf v}_n$

that, with probability at least ${3/16}$, satisfies ${|a_2| \geq 1/2}$. (Cf. Lemma 7.) Then we compute ${{\bf x}_0}$, which is the projection of ${{\bf x}}$ on the subspace orthogonal to ${{\bf v}_1}$, that is

$\displaystyle {\bf x}_0 = a_2 {\bf v}_2 + \cdots + a_n {\bf v}_n$

Note that ${||{\bf x}||^2 = n}$ and that ${||{\bf x}_0||^2 \leq n}$.

The output is the vector ${{\bf x}_k}$

$\displaystyle {\bf x}_k = a_2 \lambda_2^k {\bf v}_2 + \cdots + a_n \lambda_n ^k {\bf v}_n$

If we apply Lemma 8 to the subspace orthogonal to ${{\bf v}_1}$, we see that when ${|a_2|\geq 1/2}$ we have that, for every ${0< \epsilon< 1}$,

$\displaystyle \frac{ {\bf x}_k^T M {\bf x}_k}{{\bf x}_k^T{\bf x}_k} \geq \lambda_2 \cdot (1-\epsilon) \cdot \frac 1{1+4 n (1-\epsilon)^{2k}}$

We have thus established the following analysis.

Theorem 9 For every PSD matrix ${M}$, positive integer ${k}$ and parameter ${\epsilon>0}$, if ${{\bf v}_1}$ is a length-1 eigenvector of the largest eigenvalue of ${M}$, then with probability ${\geq 3/16}$ over the choice of ${{\bf x}_0}$, the algorithm Power2 outputs a vector ${{\bf x}_k \perp {\bf v}_1}$ such that

$\displaystyle \frac {{\bf x}_k^T M {\bf x}_k}{ {\bf x}_k^T{\bf x}_k} \geq \lambda_2 \cdot (1-\epsilon) \cdot \frac 1 {1+4n (1-\epsilon)^{2k}}$

where ${\lambda_2}$ is the second largest eigenvalue of ${M}$, counting multiplicities.

3.3. The Second Smallest Eigenvalue of the Laplacian

Finally, we come to the case in which we want to compute the second smallest eigenvalue of the normalized Laplacian matrix ${L= I - \frac 1d A}$ of a ${d}$-regular graph ${G=(V,E)}$, where ${A}$ is the adjacency matrix of ${G}$.

Consider the matrix ${M:= 2I - L = I+\frac 1{d} A}$. Then if ${0 = \lambda_1 \leq \ldots \leq \lambda_n \leq 2}$ are the eigenvalues of ${L}$, we have that

$\displaystyle 2 = 2-\lambda_1 \geq 2-\lambda_2 \geq \cdots \geq 2-\lambda_n \geq 0$

are the eigenvalues of ${M}$, and that ${M}$ is PSD. ${M}$ and ${L}$ have the same eigenvectors, and so ${{\bf v}_1 = \frac 1 {\sqrt n} (1,\ldots,1)}$ is a length-1 eigenvector of the largest eigenvalue of ${M}$.

By running algorithm Power2, we can find a vector ${{\bf x}}$ such that

$\displaystyle {\bf x}^T M {\bf x} \geq (1-\epsilon) \cdot (2-\lambda_2) \cdot {\bf x}^T {\bf x}$

and

$\displaystyle {\bf x}^T M {\bf x} = 2 {\bf x}^T{\bf x} - {\bf x}^T L {\bf x}$

so, rearranging, we have

$\displaystyle \frac{{\bf x}^T L {\bf x}}{{\bf x}^T {\bf x}} \leq \lambda_2 + 2\epsilon$
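Putting the pieces together on a small ${d}$-regular example, a cycle (our choice, since its spectrum is known in closed form): build ${L}$, run the Power2 iteration on ${M = 2I - L}$ with ${{\bf v}_1 = \frac 1{\sqrt n}(1,\ldots,1)}$, and compare the Rayleigh quotient of the output with the known ${\lambda_2 = 1 - \cos(2\pi/n)}$ of the cycle.

```python
import numpy as np

n = 20
A = np.zeros((n, n))
for i in range(n):                        # the cycle C_n, a 2-regular graph
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.eye(n) - A / 2                     # normalized Laplacian
M = 2 * np.eye(n) - L                     # PSD, same eigenvectors as L
v1 = np.ones(n) / np.sqrt(n)              # top eigenvector of M

rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=n)
x = x - v1 * (x @ v1)
for _ in range(500):
    x = M @ x
    x = x - v1 * (x @ v1)                 # guard against floating-point drift
    x = x / np.linalg.norm(x)

rq = x @ L @ x                            # Rayleigh quotient (x is unit length)
lam2 = 1 - np.cos(2 * np.pi / n)          # known lambda_2 of the cycle
print(rq, lam2)
```

Note that many iterations are needed here: the relevant eigenvalue gap of ${M}$ on the cycle is small, which is exactly the ${1/\epsilon}$ dependence discussed next.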

If we want to compute a vector whose Rayleigh quotient is, say, at most ${2\lambda_2}$, then the running time will be ${\tilde O( (|V| + |E|) / \lambda_2)}$, because we need to set ${\epsilon = \lambda_2/2}$; this is not nearly linear in the size of the graph if ${\lambda_2}$ is, say, ${O(1/|V|)}$.

For a running time that is nearly linear in ${n}$ for all values of ${\lambda_2}$, one can, instead, apply the power method to the pseudoinverse ${L^+}$ of ${L}$. (Assuming that the graph is connected, ${L^+ {\bf x}}$ is the unique vector ${{\bf y}}$ such that ${L {\bf y} = {\bf x}}$, if ${{\bf x} \perp (1,\ldots,1)}$, and ${L^+ {\bf x} = 0}$ if ${{\bf x}}$ is parallel to ${(1,\ldots,1)}$.) This is because ${L^+}$ has eigenvalues ${0,1/\lambda_2,\ldots, 1/\lambda_n}$, and so ${L^+}$ is PSD and ${1/\lambda_2}$ is its largest eigenvalue.

Although computing ${L^+}$ is not known to be doable in nearly linear time, there are nearly linear time algorithms that, given ${{\bf x}}$, solve in ${{\bf y}}$ the linear system ${L {\bf y} = {\bf x}}$, and this is the same as computing the product ${L^+ {\bf x}}$, which is enough to implement algorithm Power applied to ${L^+}$.

(Such algorithms will be discussed in the third part of the course. The algorithms will find an approximate solution ${{\bf y}}$ to the linear system ${L {\bf y} = {\bf x}}$, but this will be sufficient. In the following, we proceed as if the solution was exact.)

In time ${O((|V|+|E|) \cdot (\log |V|/\epsilon)^{O(1)})}$, we can find a vector ${{\bf y}}$ such that ${{\bf y} = (L^+)^k {\bf x}}$, where ${{\bf x}}$ is a random vector in ${\{ -1,1 \}^n}$, shifted to be orthogonal to ${(1,\ldots,1)}$, and ${k=O(\log |V|/\epsilon)}$. What is the Rayleigh quotient of such a vector with respect to ${L}$?

Let ${{\bf v}_1,\ldots,{\bf v}_n}$ be a basis of orthonormal eigenvectors for ${L}$ and ${L^+}$. If ${0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n}$ are the eigenvalues of ${L}$, then we have

$\displaystyle L {\bf v}_1 = L^+ {\bf v}_1 = {\bf 0}$

and, for ${i=2,\ldots,n}$, we have

$\displaystyle L{\bf v}_ i = \lambda_i {\bf v}_i \ \ \ L^+ {\bf v}_i = \frac 1 {\lambda_i} {\bf v}_i$

Write ${{\bf x} = a_2 {\bf v}_2 + \cdots + a_n {\bf v}_n}$, where ${\sum_i a_i^2 \leq n}$, and assume that, as happens with probability at least ${3/16}$, we have ${a_2^2 \geq \frac 14}$. Then

$\displaystyle {\bf y} = \sum_{i=2}^n a_i \frac 1 {\lambda_i^k} {\bf v}_i$

and the Rayleigh quotient of ${{\bf y} }$ with respect to ${L}$ is

$\displaystyle \frac {{\bf y} ^T L {\bf y} }{{\bf y}^T {\bf y}} = \frac { \sum_i a_i^2 \frac 1 {\lambda_i^{2k-1}}}{\sum_i a_i^2 \frac 1 {\lambda_i^{2k}}}$

and the analysis proceeds similarly to the analysis of the previous section. If we let ${\ell}$ be the index such that ${\lambda_\ell \leq (1+\epsilon) \cdot \lambda_2 \leq \lambda_{\ell+1}}$ then we can upper bound the numerator as

$\displaystyle \sum_i a_i^2 \frac 1 {\lambda_i^{2k-1}} \leq \sum_{i\leq \ell} a_i^2 \frac 1 {\lambda_i^{2k-1}} + \frac 1 {(1+\epsilon)^{2k-1} \lambda_2 ^{2k-1} } \sum_{i>\ell} a_i^2$

$\displaystyle \leq \sum_{i\leq \ell} a_i^2 \frac 1 {\lambda_i^{2k-1}} + \frac 1 {(1+\epsilon)^{2k-1} \lambda_2 ^{2k-1} } \cdot n$

$\displaystyle \leq \sum_{i\leq \ell} a_i^2 \frac 1 {\lambda_i^{2k-1}} + \frac 1 {(1+\epsilon)^{2k-1} \lambda_2 ^{2k-1} } \cdot 4n a_2^2$

$\displaystyle \leq \left( 1 + \frac {4n} {(1+\epsilon)^{2k-1} } \right) \cdot \sum_{i\leq \ell} a_i^2 \frac 1 {\lambda_i^{2k-1}}$

and we can lower bound the denominator as

$\displaystyle \sum_i a_i^2 \frac 1 {\lambda_i^{2k}} \geq \sum_{i \leq \ell} a_i^2 \frac 1 {\lambda_i^{2k}}$

$\displaystyle \geq \frac 1 {(1+\epsilon) \lambda_2} \cdot \sum_{i \leq \ell} a_i^2 \frac 1 {\lambda_i^{2k-1}}$

and the Rayleigh quotient is

$\displaystyle \frac {{\bf y} ^T L {\bf y} }{{\bf y}^T {\bf y}} \leq \lambda_2 \cdot (1+\epsilon) \cdot \left( 1 + \frac {4n} {(1+\epsilon)^{2k-1} } \right) \leq (1+2\epsilon) \cdot \lambda_2$

when ${k=O\left(\frac 1 \epsilon \log \frac n \epsilon \right)}$.
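A sketch of this inverse power method on the same cycle example as before, with `numpy.linalg.lstsq` (which returns the minimum-norm least-squares solution, i.e. exactly ${L^+ {\bf x}}$ here) standing in for the nearly linear-time Laplacian solvers mentioned above:

```python
import numpy as np

n = 20
A = np.zeros((n, n))
for i in range(n):                        # the cycle C_n again
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.eye(n) - A / 2                     # normalized Laplacian
v1 = np.ones(n) / np.sqrt(n)

rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=n)
x = x - v1 * (x @ v1)                     # shift orthogonal to all-ones
for _ in range(50):
    # One power-method step on L^+: solve L y = x. lstsq returns the
    # minimum-norm least-squares solution, which equals pinv(L) @ x.
    x = np.linalg.lstsq(L, x, rcond=None)[0]
    x = x - v1 * (x @ v1)
    x = x / np.linalg.norm(x)

rq = x @ L @ x                            # Rayleigh quotient (x is unit length)
lam2 = 1 - np.cos(2 * np.pi / n)
print(rq, lam2)
```

Far fewer iterations suffice than with the direct method on ${2I - L}$: the relevant gap for ${L^+}$ is the multiplicative gap ${\lambda_3/\lambda_2}$, which for the cycle is a constant factor rather than a factor close to ${1}$.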
