CS294 Lecture 3: Cheeger Inequalities

In which we generalize the notion of normalized Laplacian to irregular graphs, we extend the basic spectral graph theory results from last lecture to irregular graphs, and we prove the easy direction of Cheeger’s inequalities.

1. Irregular Graphs

Let {G=(V,E)} be an undirected graph, not necessarily regular. We will assume that every vertex has non-zero degree. We would like to define a normalized Laplacian matrix associated to {G} so that the properties we proved last time are true: that the multiplicity of 0 as an eigenvalue is equal to the number of connected components of {G}, that the largest eigenvalue is at most 2, and that it is 2 if and only if (a connected component of) the graph is bipartite.
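
The excerpt above stops before the definition, but a quick numerical sketch already illustrates the target properties. The code below assumes the standard choice {L = I - D^{-1/2} A D^{-1/2}}, where {D} is the diagonal degree matrix; this is one natural candidate for the irregular case (not necessarily the exact normalization the lecture settles on), and the helper name normalized_laplacian is mine.

import numpy as np

def normalized_laplacian(A):
    # assumes every vertex has non-zero degree, as in the lecture
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt

# an irregular graph: a triangle with a pendant vertex, plus a disjoint edge
A = np.array([[0, 1, 1, 1, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 1, 0]], dtype=float)
eigs = np.linalg.eigvalsh(normalized_laplacian(A))
print(np.round(eigs, 4))
# 0 appears with multiplicity 2 (two connected components), every eigenvalue
# is at most 2, and 2 is attained because the disjoint edge is bipartite.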

CS294 Lecture 2: Basics of Spectral Graph Theory

In which we introduce the Laplacian matrix and we prove our first results in spectral graph theory.

1. The Basics of Spectral Graph Theory

Given an undirected graph {G= (V,E)}, the approach of spectral graph theory is to associate a symmetric real-valued matrix to {G}, and to relate the eigenvalues of the matrix to combinatorial properties of {G}.

For the sake of this lecture, we will restrict ourselves to the case in which {G} is a {d}-regular graph, and we will then see how to extend our results to apply to irregular graphs as well.

The most natural matrix to associate to {G} is the adjacency matrix {A} such that {A_{i,j} = 1} if {\{ i,j\} \in E} and {A_{i,j} = 0} otherwise. In the second part of the course, in which we will study expander graphs, the adjacency matrix will indeed be the most convenient matrix to work with. For the sake of the algorithms that we will analyze in the first part of the course, however, a slight variation called the normalized Laplacian is more convenient.

There are a few ways to motivate the definition of the Laplacian. One way is the following: the variational characterization of the eigenvalues of real symmetric matrices tells us that we can think of the eigenvalues of {M} as optima of min-max optimization problems in which vectors {{\bf x}\in {\mathbb R}^V} are feasible solutions and the cost function is the Rayleigh quotient

\displaystyle  R_M(x) = \frac {{\bf x}^T M {\bf x}}{{\bf x}^T {\bf x}}

We know that every homogeneous polynomial of degree 2 can be realized as {{\bf x}^T M {\bf x}} for some matrix {M}, and, if we want to study cuts in a graph {G=(V,E)}, it makes sense to choose a matrix {M} such that

\displaystyle  {\bf x}^T M {\bf x} = \sum_{ \{ u,v\} \in E} (x_u - x_v)^2

because, if {{\bf x}\in \{ 0,1 \}^V} is a Boolean vector, representing a cut in the graph, then the right-hand-side expression above is counting the number of edges that cross the cut, and so optimization problems with the above cost functions will be relaxations of cut problems.
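
As a small illustration (a sketch on an arbitrarily chosen graph, not taken from the notes), the following checks that, for a Boolean vector representing a cut, the quadratic form {{\bf x}^T (dI - A) {\bf x}} counts exactly the edges crossing the cut.

import numpy as np

# the 4-cycle, a 2-regular graph on vertices 0, 1, 2, 3
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
d = 2
M = d * np.eye(4) - A                  # the matrix dI - A

x = np.array([1.0, 1.0, 0.0, 0.0])     # the cut ({0,1}, {2,3})
print(x @ M @ x)                       # 2.0: the two edges {1,2} and {3,0} cross the cut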

Some calculations show that the matrix having such a property is {dI - A}, which is called the Laplacian matrix of {G}. Indeed, we can verify that

\displaystyle  {\bf x}^T (dI - A) {\bf x} = \sum_{ \{ u,v\} \in E} (x_u - x_v)^2

because both expressions are easily seen to be equal to

\displaystyle  \sum_v d x_v^2 - 2 \sum_{\{ u,v \} \in E} x_u x_v

As we will see in a moment, the eigenvalues of {dI - A} are in the range {[0,2d]}, and it is not hard to see that their sum is {dn}, so it is convenient to divide the Laplacian matrix by {d} so that the range and the average values of the eigenvalues of the resulting matrix are independent of the degree. (This degree independence will make it possible to generalize results to the irregular case.)
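
Continuing the same toy example (again a sketch, not from the notes), one can verify the claimed range of the eigenvalues of {dI - A} and the value of their sum.

import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)   # the 4-cycle again: d = 2, n = 4
d, n = 2, 4
eigs = np.linalg.eigvalsh(d * np.eye(n) - A)
print(np.round(eigs, 4))   # [0. 2. 2. 4.]: all eigenvalues lie in [0, 2d]
print(eigs.sum())          # 8.0 = d*n, since the sum is the trace of dI - A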

We have thus reached the following definition.

Definition 1 (Normalized Laplacian) The normalized Laplacian matrix of an undirected {d}-regular graph {G=(V,E)} is {L:= I - \frac 1d A}.

We shall now prove the following relations between the eigenvalues of {L} and certain purely combinatorial properties of {G}.

Theorem 2 Let {G} be a {d}-regular undirected graph, let {A} be the adjacency matrix of {G}, and {L = I - \frac 1d \cdot A} be the normalized Laplacian matrix of {G}. Let {\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n} be the real eigenvalues of {L} with multiplicities, in nondecreasing order. Then

  1. {\lambda_1 = 0} and {\lambda_n \leq 2}.
  2. {\lambda_k=0} if and only if {G} has at least {k} connected components.
  3. {\lambda_n= 2} if and only if at least one of the connected components of {G} is bipartite.

Note that the first two properties imply that the multiplicity of 0 as an eigenvalue is precisely the number of connected components of {G}.
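
Before the proof, here is a quick numerical check of Theorem 2 on a few regular graphs (a sketch with arbitrarily chosen examples; the helper names cycle_adj and norm_lap are mine).

import numpy as np

def cycle_adj(n):
    # adjacency matrix of the n-cycle, a 2-regular graph
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A

def norm_lap(A, d):
    return np.eye(len(A)) - A / d

for name, A in [("C5 (odd cycle, not bipartite)", cycle_adj(5)),
                ("C6 (even cycle, bipartite)", cycle_adj(6))]:
    print(name, np.round(np.linalg.eigvalsh(norm_lap(A, 2)), 3))
# C5: a single 0 eigenvalue, largest eigenvalue strictly below 2;
# C6: largest eigenvalue exactly 2, as promised for bipartite graphs.

# two disjoint copies of C5: 0 now has multiplicity 2, one per component
A2 = np.block([[cycle_adj(5), np.zeros((5, 5))],
               [np.zeros((5, 5)), cycle_adj(5)]])
print(np.round(np.linalg.eigvalsh(norm_lap(A2, 2)), 3))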

Proof: By the characterization of the Rayleigh quotient of {L} that we established above, and from the variational characterization of eigenvalues, we have

\displaystyle  \lambda_1 = \min_{{\bf x} \in {\mathbb R}^n - \{ {\bf {0}} \} } \frac {\sum_{ \{ u,v\} \in E} (x_u - x_v)^2}{d \sum_v x_v^2}

and so {\lambda_1 \geq 0} because the Rayleigh quotient, being a ratio of sums of squares, is always non-negative.

If we take {{\bf 1} = (1,\ldots,1)} to be the all-one vector, we see that its Rayleigh quotient is {0}, and so {0} is the smallest eigenvalue of {L}, with {{\bf 1}} being one of the vectors in the eigenspace of the eigenvalue 0.

We also have the following formula for {\lambda_k}:

\displaystyle  \lambda_k = \min_{S\ k{\rm -dimensional \ subspace\ of \ } {\mathbb R}^n} \ \ \max_{{\bf x} \in S- \{ {\bf {0}} \} } \ \frac {\sum_{\{ u,v\} \in E} (x_u - x_v)^2}{d \sum_v x_v^2 }

So, if {\lambda_k=0}, there must exist a {k}-dimensional space {S} such that for every {{\bf x} \in S} and every {\{ u,v \} \in E}, we have {x_u = x_v}, and so {x_u=x_v} for every {u,v} which are in the same connected component. This means that each {{\bf x} \in S} must be constant within each connected component of {G}, and so the dimension of {S} can be at most the number of connected components of {G}, meaning that {G} has at least {k} connected components.

Conversely, if {G} has at least {k} connected components, we can let {S} be the space of vectors that are constant within each component, and {S} is a space of dimension at least {k} such that for every element {{\bf x}} of {S} we have

\displaystyle  \sum_{\{ u,v\} \in E} (x_u - x_v)^2= 0

meaning that {S} is a witness of the fact that {\lambda_k=0}.
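
As a sketch of the witness construction just described (on an arbitrarily chosen graph, not part of the notes), the indicator vectors of the connected components span a space on which the quadratic form vanishes.

import numpy as np

# two disjoint triangles: a 2-regular graph with two connected components
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]

def quad_form(x):
    # \sum_{\{u,v\} \in E} (x_u - x_v)^2
    return sum((x[u] - x[v]) ** 2 for u, v in edges)

s1 = np.array([1, 1, 1, 0, 0, 0], dtype=float)   # indicator of the first component
s2 = np.array([0, 0, 0, 1, 1, 1], dtype=float)   # indicator of the second component
for a, b in [(1, 0), (0, 1), (2, -3)]:
    print(quad_form(a * s1 + b * s2))   # always 0: the form vanishes on span(s1, s2), so lambda_2 = 0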

Finally, to study {\lambda_n}, we first note that we have the formula

\displaystyle  \lambda_n = \max_{{\bf x} \in {\mathbb R}^n -\{ {\bf {0}} \}} \frac{{\bf x}^T L {\bf x}}{{\bf x}^T {\bf x}}

which we can prove by using the variational characterization of the eigenvalues of {-L} and noting that {-\lambda_n} is the smallest eigenvalue of {-L}.

We also observe that for every vector {{\bf x}\in {\mathbb R}^n} we have

\displaystyle  2\, {\bf x}^T {\bf x} - {\bf x}^T L {\bf x} = \frac 1{d} \sum_{\{ u,v \} \in E} (x_u + x_v)^2

Since the right-hand side is non-negative, the Rayleigh quotient of every non-zero vector is at most 2, and so

\displaystyle  \lambda_n \leq 2

and if {\lambda_n = 2} then there must be a non-zero vector {{\bf x}} such that

\displaystyle  \sum_{\{ u,v \} \in E} (x_u + x_v)^2 = 0

which means that {x_u = -x_v} for every edge {\{ u,v \}\in E}.

Let us now define {A:= \{ v : x_v > 0 \}} and {B:= \{ v : x_v < 0 \}}. The set {A\cup B} is non-empty (otherwise we would have {{\bf x} = {\bf 0}}) and is either the entire graph, or else it is disconnected from the rest of the graph, because otherwise an edge with an endpoint in {A\cup B} and an endpoint in {V- (A \cup B)} would give a positive contribution to {\sum_{\{ u,v\} \in E} (x_u + x_v)^2}; furthermore, every edge incident on a vertex in {A} must have its other endpoint in {B}, and vice versa. Thus, {A\cup B} is a connected component, or a collection of connected components, of {G} which is bipartite, with the bipartition {A,B}. \Box
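
A quick numerical companion to this last part of the proof (a sketch on an arbitrary bipartite example): the identity relating {2\,{\bf x}^T{\bf x} - {\bf x}^T L {\bf x}} to {\sum_{\{u,v\}\in E} (x_u+x_v)^2}, and the {\pm 1} vector on the two sides of a bipartition achieving Rayleigh quotient 2.

import numpy as np

# the 4-cycle: 2-regular and bipartite, with sides {0, 2} and {1, 3}
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
d = 2
L = np.eye(4) - A / d
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
lhs = 2 * (x @ x) - x @ L @ x
rhs = sum((x[u] + x[v]) ** 2 for u, v in edges) / d
print(np.isclose(lhs, rhs))              # True: the identity used in the proof

x = np.array([1.0, -1.0, 1.0, -1.0])     # +1 on one side of the bipartition, -1 on the other
print((x @ L @ x) / (x @ x))             # 2.0, so lambda_n = 2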

CS294 Lecture 1: Introduction

In which we describe what this course is about.

1. Overview

This class is about applications of linear algebra to graph theory and to graph algorithms. In the finite-dimensional case, linear algebra deals with vectors and matrices, and with a number of useful concepts and algorithms, such as determinants, eigenvalues, eigenvectors, and solutions to systems of linear equations.

The application to graph theory and graph algorithms comes from associating, in a natural way, a matrix to a graph {G=(V,E)}, and then interpreting the above concepts and algorithms in graph-theoretic language. The most natural representation of a graph as a matrix is the {|V| \times |V|} adjacency matrix, and certain related matrices, such as the Laplacian and the normalized Laplacian, will be our main focus. We can think of {|V|}-dimensional Boolean vectors as representing a partition of the vertices, that is, a cut in the graph, and we can think of arbitrary vectors as fractional cuts. From this point of view, eigenvalues are the optima of continuous relaxations of certain cut problems, the corresponding eigenvectors are optimal solutions, and connections between spectrum and cut structures are given by rounding algorithms converting fractional solutions into integral ones. Flow problems are dual to cut problems, so one would expect linear algebraic techniques to be helpful in finding flows in networks: this is indeed the case, via the theory of electrical flows, which can be found as solutions to linear systems.

The course can be roughly subdivided into three parts: in the first part of the course we will study spectral graph algorithms, that is, graph algorithms that make use of eigenvalues and eigenvectors of the normalized Laplacian of the given graph. In the second part of the course we will look at constructions of expander graphs, and their applications. In the third part of the course, we will look at fast algorithms for solving systems of linear equations of the form {L {\bf x} = {\bf b}}, where {L} is the Laplacian of a graph, their applications to finding electrical flows, and the applications of electrical flows to solving the max flow problem.

CS294 Lecture 0: Linear Algebra Review

In which we review linear algebra prerequisites.

The following background from linear algebra will be sufficient for the sake of this course: to know what is an eigenvalue and an eigenvector, to know that real symmetric matrices have real eigenvalues and their real eigenvectors are orthogonal, and to know the variational characterization of eigenvalues.

1. Basic Definitions

If {x=a+ib} is a complex number, then we let {\bar x = a- ib} denote its conjugate. Note that a complex number {x} is real if and only if {x = \bar x}. If {M \in {\mathbb C}^{m \times n}} is a matrix, then {M^*} denotes the conjugate transpose of {M}, that is, {(M^*)_{i,j} = \overline{M_{j,i}}}. If the entries of {M} are real, then {M^* = M^T}, where {M^T} is the transpose of {M}, that is, the matrix such that {(M^T)_{i,j} = M_{j,i}}.

We say that a matrix {M} is Hermitian if {M = M^*}. In particular, real symmetric matrices are Hermitian.

If {{\bf v},{\bf w} \in {\mathbb C}^n} are two vectors, then their inner product is defined as

\displaystyle  \langle {\bf v},{\bf w} \rangle := {\bf v}^* {\bf w} = \sum_i \overline{v_i} \cdot w_i \ \ \ \ \ (1)

Notice that, by definition, we have {\langle {\bf v}, {\bf w} \rangle = (\langle {\bf w}, {\bf v} \rangle)^*} and {\langle {\bf v}, {\bf v} \rangle = || {\bf v} ||^2}. Note also that, for two matrices {A,B}, we have {(A \cdot B)^* = B^* \cdot A^*}, and that for every matrix {M} and every two vectors {{\bf x}}, {{\bf y}}, we have

\displaystyle  \langle M {\bf x} , {\bf y} \rangle = {\bf x}^* M^* {\bf y} = \langle {\bf x} , M^* {\bf y} \rangle
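
A small numerical check of these facts (a sketch; note that np.vdot conjugates its first argument, matching definition (1)):

import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

inner = np.vdot                                   # <v, w> = sum_i conj(v_i) w_i
M_star = M.conj().T                               # the conjugate transpose M^*
print(np.isclose(inner(M @ x, y), inner(x, M_star @ y)))        # True
print(np.isclose(inner(x, y), np.conj(inner(y, x))))            # True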

If {M\in {\mathbb C}^{n \times n}} is a square matrix, {\lambda \in {\mathbb C}} is a scalar, {{\bf v} \in {\mathbb C}^n - \{ {\bf {0}} \}} is a non-zero vector and we have

\displaystyle   M {\bf v} = \lambda {\bf v} \ \ \ \ \ (2)

then we say that {\lambda} is an eigenvalue of {M} and that {{\bf v}} is an eigenvector of {M} corresponding to the eigenvalue {\lambda}.

2. The Spectral Theorem

We want to prove

Theorem 1 (Spectral Theorem) Let {M\in {\mathbb R}^{n\times n}} be a symmetric matrix with real-valued entries. Then there are {n} real numbers (not necessarily distinct) {\lambda_1,\ldots,\lambda_n} and {n} orthonormal real vectors {{\bf x}_1,\ldots,{\bf x}_n}, {{\bf x}_i \in {\mathbb R}^n}, such that {{\bf x}_i} is an eigenvector of {\lambda_i}.

Assuming the fundamental theorem of algebra (that every polynomial has a complex root) and basic properties of the determinant, the cleanest proof of the spectral theorem is to proceed by induction on {n}, and to show that {M} must have a real eigenvalue {\lambda_1} with a real eigenvector {{\bf v}_1}, and to show that {M} maps vectors orthogonal to {{\bf v}_1} to vectors orthogonal to {{\bf v}_1}. Then one applies the inductive hypothesis to {M} restricted to the {(n-1)}-dimensional space of vectors orthogonal to {{\bf v}_1} and one recovers the remaining {(n-1)} eigenvalues and eigenvectors.

The cleanest way to formalize the above proof is to give all definitions and results in terms of linear operators {T: X \rightarrow X} where {X} is an arbitrary vector space over the reals. This way, however, we would be giving several definitions that we would never use in the future, so, instead, the inductive proof will use a somewhat inelegant change of basis to pass from {M} to an {(n-1) \times (n-1)} matrix {M'}.

We begin by showing that a real symmetric matrix has real eigenvalues and eigenvectors.

Theorem 2 If {M\in {\mathbb R}^{n\times n}} is symmetric, then there is a real eigenvalue {\lambda \in {\mathbb R}} and a real eigenvector {{\bf v} \in {\mathbb R}^n} such that {M {\bf v} = \lambda {\bf v}}.

We begin by noting that every matrix has a complex eigenvalue.

Lemma 3 For every matrix {M\in {\mathbb C}^{n \times n}}, there is an eigenvalue {\lambda \in {\mathbb C}} and an eigenvector {{\bf v} \in {\mathbb C}^n} such that {M {\bf v} = \lambda {\bf v}}.

Proof: Note that {\lambda} is an eigenvalue for {M} if and only if

\displaystyle  \exists {\bf x} \neq {\bf 0}. \ ( M - \lambda I ) {\bf x} = {\bf 0}

which is true if and only if the rows of {M-\lambda I} are not linearly independent, which is true if and only if

\displaystyle  \det (M - \lambda I ) = 0

Now note that the mapping {t \rightarrow \det (M - t I )} is a univariate polynomial of degree {n} in {t}, and so it must have a root {\lambda} by the fundamental theorem of algebra. \Box
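
A numerical illustration of the argument (a sketch; np.poly returns the coefficients of {\det(tI - M)}, which has the same roots as {\det(M - tI)}):

import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))

coeffs = np.poly(M)                           # characteristic polynomial of M, degree 4 in t
roots = np.sort_complex(np.roots(coeffs))
eigs = np.sort_complex(np.linalg.eigvals(M))
print(np.round(roots, 4))
print(np.round(eigs, 4))                      # same multiset: eigenvalues are the roots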

Next we show that if {M} is real and symmetric, then its eigenvalues are real.

Lemma 4 If {M} is Hermitian, then, for every {{\bf x}} and {{\bf y}},

\displaystyle  \langle M {\bf x} , {\bf y} \rangle = \langle {\bf x} , M {\bf y} \rangle

Proof:

\displaystyle  \langle M {\bf x} , {\bf y} \rangle = \langle {\bf x} , M^* {\bf y} \rangle = \langle {\bf x} , M {\bf y} \rangle

\Box

Lemma 5 If {M} is Hermitian, then all the eigenvalues of {M} are real.

Proof: Let {M} be an Hermitian matrix and let {\lambda} be a scalar and {{\bf x}} be a non-zero vector such that {M {\bf x} = \lambda {\bf x}}. We will show that {\lambda = \lambda^*}, which implies that {\lambda} is a real number.

We note that

\displaystyle  \langle M{\bf x}, {\bf x} \rangle = \langle \lambda {\bf x}, {\bf x} \rangle = \lambda^* ||x||^2

and

\displaystyle  \langle {\bf x}, M {\bf x} \rangle = \langle {\bf x}, \lambda {\bf x} \rangle = \lambda ||x||^2

and by the fact that {\langle M{\bf x}, {\bf x} \rangle = \langle {\bf x}, M{\bf x} \rangle} , we have {\lambda = \lambda^*}. \Box

In order to prove Theorem 2, it remains to argue that, for a real eigenvalue of a real symmetric matrix, we can find a real eigenvector.

Proof: Let {M \in {\mathbb R}^{n \times n}} be a real symmetric matrix. By Lemmas 3 and 5, {M} has a real eigenvalue {\lambda} and a (possibly complex-valued) eigenvector {{\bf z} = {\bf x} + i {\bf y}}, where {{\bf x}} and {{\bf y}} are real vectors. We have

\displaystyle  M {\bf x} + i M {\bf y} = \lambda {\bf x} + i \lambda {\bf y}

from which (recalling that the entries of {M} and the scalar {\lambda} are real) it follows that {M {\bf x} = \lambda {\bf x}} and that {M {\bf y} = \lambda {\bf y}}; since {{\bf x}} and {{\bf y}} cannot both be zero, it follows that {\lambda} has a real eigenvector. \Box

We are now ready to prove the spectral theorem.

Proof: We proceed by induction on {n}. The case {n=1} is trivial.

Assume that the statement is true for dimension {n-1}. Let {\lambda_1} be a real eigenvalue of {M} and {{\bf x}_1} be a real eigenvector of {\lambda_1}.

Now we claim that, for every vector {{\bf y}} that is orthogonal to {{\bf x}_1}, the vector {M{\bf y}} is also orthogonal to {{\bf x}_1}. Indeed, the inner product of {M{\bf y}} and {{\bf x}_1} is

\displaystyle  \langle {\bf x}_1, M{\bf y} \rangle = \langle M {\bf x}_1 , {\bf y} \rangle = \langle \lambda_1 {\bf x}_1,{\bf y} \rangle = 0

Let {X} be the {(n-1)}-dimensional subspace of {{\mathbb R}^n} that contains all the vectors orthogonal to {{\bf x}_1}. We want to apply the inductive hypothesis to {M} restricted to {X}; we cannot literally do that, because the theorem is not stated in terms of arbitrary linear operators over vector spaces, so we will need to do that by fixing an appropriate basis for {X}.

Let {B \in {\mathbb R}^{n \times (n-1)}} be a matrix that computes a bijective map from {{\mathbb R}^{n-1}} to {X}. (If {{\bf b}_1, \ldots,{\bf b}_{n-1}} is an orthonormal basis for {X}, then {B} is just the matrix whose columns are the vectors {{\bf b}_i}.) Let also {B' \in {\mathbb R}^{(n-1) \times n}} be the matrix such that, for every {{\bf y} \in X}, {BB' {\bf y} = {\bf y}}. (We can set {B' = B^T} where {B} is as described above.) We apply the inductive hypothesis to the matrix

\displaystyle  M' := B'MB \in {\mathbb R}^{(n-1) \times (n-1)}

and we find eigenvalues {\lambda_2,\ldots,\lambda_n} and orthonormal eigenvectors {{\bf y}_2,\ldots,{\bf y}_n} for {M'}.

For every {i=2,\ldots,n}, we have

\displaystyle  B'MB {\bf y}_i = \lambda_i {\bf y}_i

and so

\displaystyle  BB'MB{\bf y}_i = \lambda_i B {\bf y}_i

Since {B{\bf y}_i} is orthogonal to {{\bf x}_1}, it follows that {MB{\bf y}_i} is also orthogonal to {{\bf x}_1}, and so {BB'MB{\bf y}_i = MB{\bf y}_i}, so we have

\displaystyle  MB{\bf y}_i = \lambda_i B{\bf y}_i

and, defining {{\bf x}_i := B{\bf y}_i}, we have

\displaystyle  M {\bf x}_i = \lambda_i {\bf x}_i

Finally, we observe that the vectors {{\bf x}_i} are orthogonal. By construction, {{\bf x}_1} is orthogonal to {{\bf x}_2,\ldots,{\bf x}_n}, and, for every {2 \leq i < j \leq n}, we have that

\displaystyle  \langle {\bf x}_i , {\bf x}_j \rangle = \langle B {\bf y}_i , B {\bf y}_j \rangle = \langle {\bf y}_i , B^T B {\bf y} _j \rangle = \langle {\bf y}_i , {\bf y}_j \rangle = 0

We have not verified that the vectors {{\bf x}_i} have norm 1 (which is true), but we can scale them to have norm 1 if not. \Box
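
The inductive argument translates almost line by line into the following recursive procedure (a sketch: the single eigenpair at each step is found numerically with np.linalg.eigh rather than via the characteristic polynomial, and the helper name spectral_decompose is mine, not the notes').

import numpy as np

def spectral_decompose(M):
    """Eigenvalues and orthonormal eigenvectors of a real symmetric M, following
    the inductive proof: find one real eigenpair, restrict to the orthogonal
    complement via a basis matrix B (so B' = B^T), and recurse on B^T M B."""
    n = len(M)
    if n == 1:
        return [M[0, 0]], [np.array([1.0])]
    w, V = np.linalg.eigh(M)                 # one eigenpair (here: all, numerically)
    lam1, x1 = w[0], V[:, 0]
    # B: orthonormal basis of the space orthogonal to x1, as columns
    Q, _ = np.linalg.qr(np.column_stack([x1, np.eye(n)]))
    B = Q[:, 1:]
    Mprime = B.T @ M @ B                     # the (n-1) x (n-1) matrix M' = B' M B
    sub_vals, sub_vecs = spectral_decompose(Mprime)
    return [lam1] + sub_vals, [x1] + [B @ y for y in sub_vecs]   # x_i = B y_i

rng = np.random.default_rng(3)
S = rng.standard_normal((5, 5))
S = (S + S.T) / 2
vals, vecs = spectral_decompose(S)
X = np.column_stack(vecs)
print(np.allclose(X.T @ X, np.eye(5)))       # True: orthonormal eigenvectors
print(np.allclose(S @ X, X @ np.diag(vals))) # True: M x_i = lambda_i x_i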

3. Variational Characterization of Eigenvalues

We conclude these notes with the variational characterization of eigenvalues for real symmetric matrices.

Theorem 6 Let {M\in {\mathbb R}^{n \times n}} be a symmetric matrix, and {\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n} be the eigenvalues of {M} in non-decreasing order. Then

\displaystyle  \lambda_k = \min_{k-{\rm dim} \ X} \ \ \ \max_{{\bf x} \in X - \{ {\bf 0} \} } \ \ \frac{ {\bf x}^T M {\bf x}}{{\bf x}^T {\bf x}}

The quantity {\frac { {\bf x}^T M {\bf x}}{{\bf x}^T {\bf x}} } is called the Rayleigh quotient of {{\bf x}} with respect to {M}, and we will denote it by {R_M({\bf x})}.

Proof: Let {{\bf v}_1,\ldots,{\bf v}_n} be orthonormal eigenvectors of the eigenvalues {\lambda_1,\ldots,\lambda_n}, as promised by the spectral theorem. Consider the {k}-dimensional space spanned by {{\bf v}_1, \ldots,{\bf v}_k}. For every vector {{\bf x} = \sum_{i=1}^k a_i {\bf v}_i} in such a space, the numerator of the Rayleigh quotient is

\displaystyle  \sum_{i,j} a_i a_j {\bf v}_i^T M {\bf v}_j = \sum_{i,j} a_i a_j \lambda_j {\bf v}_i^T {\bf v}_j = \sum_{i=1}^k \lambda_i a_i^2 \leq \lambda_k \cdot \sum_{i=1}^k a_i^2

The denominator is clearly {\sum_{i=1}^k a_i^2}, and so {R_M({\bf x}) \leq \lambda_k}. This shows that

\displaystyle  \lambda_k \geq \min_{k-{\rm dim} \ X} \ \ \ \max_{{\bf x} \in X - \{ {\bf 0} \}} \ \ \frac{ {\bf x}^T M {\bf x}}{{\bf x}^T {\bf x}}

For the other direction, let {X} be any {k}-dimensional space: we will show that {X} must contain a vector of Rayleigh quotient {\geq \lambda_k}. Let {S} be the span of {{\bf v}_k,\ldots,{\bf v}_n}; since {S} has dimension {n-k+1} and {X} has dimension {k}, they must have some non-zero vector in common. Let {{\bf x}} be one such vector, and let us write {{\bf x} = \sum_{i=k}^n a_i {\bf v}_i}. The numerator of the Rayleigh quotient of {{\bf x}} is

\displaystyle  \sum_{i=k}^n \lambda_i a_i^2 \geq \lambda_k \sum_i a_i^2

and the denominator is {\sum_i a_i^2}, so {R_M({\bf x}) \geq \lambda_k}. \Box
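
Here is a small numerical check of the two directions of the proof (a sketch; the helper max_rayleigh_on, a name of mine, computes the exact maximum of the Rayleigh quotient over a subspace by orthonormalizing a basis and restricting {M} to it).

import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((6, 6))
M = (M + M.T) / 2
w, V = np.linalg.eigh(M)          # eigenvalues in non-decreasing order, eigenvectors as columns
k = 3

def max_rayleigh_on(B, M):
    # maximum of R_M over the column span of B: top eigenvalue of Q^T M Q
    Q, _ = np.linalg.qr(B)
    return np.linalg.eigvalsh(Q.T @ M @ Q)[-1]

# the span of v_1, ..., v_k attains the minimum in the theorem:
print(np.isclose(max_rayleigh_on(V[:, :k], M), w[k - 1]))        # True
# and no k-dimensional space does better (checked here on random samples):
print(all(max_rayleigh_on(rng.standard_normal((6, k)), M) >= w[k - 1] - 1e-9
          for _ in range(100)))                                  # True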

We have the following easy consequence.

Fact 7 If {\lambda_1} is the smallest eigenvalue of a real symmetric matrix {M}, then

\displaystyle  \lambda_1 = \min_{{\bf x} \neq {\bf 0}} R_M({\bf x})

Furthermore, every minimizer is an eigenvector of {\lambda_1}.

Proof: The identity is the {k=1} case of the previous theorem. For the furthermore part, let {\lambda_1 \leq \cdots \leq \lambda_n} be the list of eigenvalues of {M} in non-decreasing order, and {{\bf v}_1,\ldots,{\bf v}_n} be corresponding eigenvectors. If {{\bf x} = \sum_i a_i {\bf v}_i} is any vector, then

\displaystyle  R_M( {\bf x} ) = \frac { \sum_i \lambda_i a_i^2}{\sum_i a_i^2}

If {R_M({\bf x}) = \lambda_1}, then {a_i = 0} for every {i} such that {\lambda_i > \lambda_1}, that is, {{\bf x}} is a linear combination of eigenvectors of {\lambda_1}, and hence it is an eigenvector of {\lambda_1}. \Box

Fact 8 If {\lambda_n} is the largest eigenvalue of a real symmetric matrix {M}, then

\displaystyle  \lambda_n = \max_{{\bf x} \neq {\bf 0}} R_M({\bf x})

Furthermore, every maximizer is an eigenvector of {\lambda_n}.

Proof: Apply Fact 7 to the matrix {-M}. \Box

Fact 9 If {\lambda_1} is the smallest eigenvalue of a real symmetric matrix {M}, and {{\bf x}_1} is an eigenvector of {\lambda_1}, then

\displaystyle  \lambda_2 = \min_{{\bf x} \neq {\bf 0},\ {\bf x} \perp {\bf x}_1} \ \ \ R_M({\bf x})

Furthermore, every minimizer is an eigenvector of {\lambda_2}.

Proof: A more conceptual proof would be to consider the restriction of {M} to the space orthogonal to {{\bf x}_1}, and then apply Fact 7 to such a linear operator. But, since we have not developed the theory for general linear operators, we would need to explicitly reduce to an {(n-1)}-dimensional case via a projection operator as in the proof of the spectral theorem.

Instead, we will give a more hands-on proof. Let {\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n} be the list of eigenvalues of {M}, with multiplicities, and {{\bf v}_1,\ldots,{\bf v}_n} be orthonormal vectors as given by the spectral theorem. We may assume that {{\bf v}_1 = {\bf x}_1}, possibly by changing the orthonormal basis of the eigenspace of {\lambda_1}. For every vector {{\bf x} = \sum_{i=2}^n a_i {\bf v}_i } orthogonal to {{\bf v}_1}, its Rayleigh quotient is

\displaystyle  R_M({\bf x}) = \frac { \sum_{i=2}^n \lambda_i a_i^2}{\sum_i a_i^2} \geq \lambda_2

and the minimum is achieved by vectors {{\bf x}} such that {a_i = 0} for every {\lambda_i > \lambda_2}, that is, for vectors {{\bf x}} which are linear combinations of the eigenvectors of {\lambda_2}, and so every minimizer is an eigenvector of {\lambda_2}. \Box
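
A small numerical check of Fact 9 (a sketch: the restriction of {M} to the space orthogonal to {{\bf x}_1} is represented by {B^T M B} for an orthonormal basis {B} of that space, exactly as in the proof of the spectral theorem).

import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((5, 5))
M = (M + M.T) / 2
w, V = np.linalg.eigh(M)
x1 = V[:, 0]                                   # an eigenvector of lambda_1

Q, _ = np.linalg.qr(np.column_stack([x1, np.eye(5)]))
B = Q[:, 1:]                                   # orthonormal basis of the space orthogonal to x1
min_on_complement = np.linalg.eigvalsh(B.T @ M @ B)[0]
print(np.isclose(min_on_complement, w[1]))     # True: the minimum over x orthogonal to x1 is lambda_2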

Course on spectral methods and expanders

This semester, starting tomorrow, I am teaching a course on spectral methods and expanders. This is similar to a course I offered twice at Stanford, but this time it will be a 15-week course instead of a 10-week one.

The Stanford course had two main components: (1) spectral algorithms for sparsest cut, and comparisons with LP and SDP based methods, and (2) properties and constructions of expanders.

I will use the additional time to talk a bit more about spectral algorithms, including clustering algorithms, and about constructions of expanders, and to add a third part about electrical networks, sparsification, and max flow.

Lecture notes will be posted here after each lecture.

In some more detail, the course will start with a review of linear algebra and a proof of basic spectral graph theory facts, such as the multiplicity of 0 as an eigenvalue of the Laplacian being the same as the number of connected components of a graph.

Then we will introduce expansion and conductance, and prove Cheeger’s inequality. We will do so in the language of approximation algorithms, and we will see how the analysis of Fiedler’s algorithm given by Cheeger’s inequality compares to the Leighton-Rao analysis of the LP relaxation and the Arora-Rao-Vazirani analysis of the SDP relaxation. Then we will prove several variants of Cheeger’s inequality, interpreting them as analyses of spectral algorithms for clustering and max cut.

In the second part of the course, we will see properties of expanders and combinatorial and algebraic constructions of expanders. We will talk about the theory that gives eigenvalues and eigenvectors of Abelian Cayley graphs, the zig-zag graph product, and the Margulis-Gabber-Galil construction. I would also like to talk about the expansion of random graphs, and to explain how one gets expander constructions from Selberg’s “3/16 theorem,” although I am not sure if there will be time for that.

The first two parts will be tied together by looking at the MCMC algorithm to approximate the number of perfect matchings in a dense bipartite graph. The analysis of the algorithm depends on the mixing time of a certain exponentially big graph, the mixing time will be determined (as shown in a previous lecture on properties of expanders) by the eigenvalue gap, the eigenvalue gap will be determined (as shown by Cheeger’s inequality) by the conductance, and the conductance can be bounded by constructing certain multicommodity flows (as shown in the analysis of the Leighton-Rao algorithms).

In the third part, we will talk about electrical networks, effective resistance, and electrical flows, see how to get sparsifiers using effective resistance, sketch how to solve Laplacian equations in nearly linear time, and see how to approximate max flow using electrical flows.

End-of-year traditions

Having spent some time in Japan, I have learnt of the tradition of holding a Bōnenkai, literally a party to forget the year. Held either as a company end-of-year party, or by groups of friends, it’s a get-together in which people drink a lot and forget the bad things that happened to them during the year.

It occurred to me that this is the complement of Thanksgiving, in which you get together to remember the good things that happened during the year.

I don’t think there is anything else left to say about the difference between Japanese and American culture.

Interestingly, there are a couple more possibilities. One could remember the bad things that happened during the year, as in the airing of grievances during Festivus.

Finally, one could forget the good things, which is very much the Italian attitude.

Edited to add: I don’t know how I forgot (ah!) but there is a famous Neapolitan folk song that goes

Chi ha avuto, ha avuto, ha avuto
Chi ha dato, ha dato, ha dato,
Scurdammuce ‘o passato,
simm’e Napule, paisa’

which is roughly

Who has received, has received
Who has given, has given,
Let’s forget the past
We are [all] from Naples

What does it mean when it’s hard to find hard instances?

[In the provincial spirit of Italian newspapers, which often have headlines like “Typhoon in South-East Asia causes widespread destruction; what are the consequences for Italian exports?”, and of men who overhear discussions about women’s issues and say things like “yes, but men have issues too,” I am going to comment on how Babai’s announcement affects me and the kind of problems I work on.]

If someone had told me last week: “a quasi-polynomial time algorithm has been found for a major open problem for which only a slightly subexponential algorithm was known before,” I would have immediately thought Unique Games!

Before Babai’s announcement, Graph Isomorphism had certain interesting properties in common with problems such as Factoring, Discrete Log, and Approximate Closest Vector (for approximation ratios of the order of sqrt (n) or more): no polynomial time algorithm is known, non-trivial algorithms that are much faster than brute force are known, and NP-completeness is not possible because the problem belongs to either NP \cap coNP or NP \cap coAM.

But there is an important difference: there are simple distributions of inputs on which Factoring, Discrete Log, and Closest Vector approximation are believed to be hard on average, and if one proposes an efficiently implementable algorithm for such problems, it can be immediately shown that it does not work. (Or, if it works, it’s already a breakthrough even without a rigorous analysis.)

In the case of Graph Isomorphism, however, it is easy to come up with simple algorithms for which it is very difficult to find counterexamples, and there are algorithms that are rigorously proved to work on certain distributions of random graphs. Now we know that there are in fact no hard instances at all, but, even before, if we believed that Graph Isomorphism was hard, we had to believe that the hard instances were rare and strange, rather than common.

It is also worth pointing out that, using Levin’s theory of average-case complexity, one can show that if any problem at all in NP is hard under any samplable distribution, then for every NP-complete problem we can find a samplable distribution under which the problem is hard. And, in “practice,” natural NP-complete problems do have simple distributions that seem to generate hard instances.

What about Small-set Expansion, Unique Games, and Unique-Games-Hard problems not known to be NP-hard, like O(1)-approximation of Sparsest Cut? We don’t know of any distribution for which it is plausible to conjecture that such problems are hard, and we have algorithms (Lasserre relaxations of constant degree) with no known counterexample. Many simple distributions of instances are rigorously solved by known algorithms. So, if we want to believe the Unique Games conjecture, we have to believe that there are hard instances, but they are rare and strange.

I am sure that it is possible, under standard assumptions, to construct an artificial problem L in NP that is in average-case-P according to Levin’s definition but not in P. Such a problem would not be polynomial time solvable, but it would be easy to solve on average under any samplable distribution and, intuitively, it would be a problem that is hard even though hard instances are rare and strange.

But can a natural problem in NP exhibit this behavior? Now that Graph Isomorphism is not a plausible example any more, I am inclined to believe (until the next surprise) that no natural problem has this behavior, and my guess concerning the Unique Games conjecture is going to be that it is false (or “morally false” in the sense that a quasipolynomial time algorithm exists) until someone comes up with a distribution of Unique Games instances that are plausibly hard on average and that, in particular, exhibit integrality gaps for Lasserre relaxations (even just experimentally).

Laci Babai and Graph Isomorphism

Next Tuesday, a week from today, Laci Babai will talk at the University of Chicago about a new algorithm that solves graph isomorphism in quasipolynomial time. There should also be a follow-up talk the following Thursday that, by a lucky coincidence, I will be able to attend, and then report back.

Meanwhile, if you have any gossip on the proof, then, by any means, go ahead and share it in the comments.

Aldo Fabrizi

Today Aldo Fabrizi would turn 110. Outside of Italy, those who know him at all probably know him from Rome, Open City, one of the early movies of the Neorealismo genre. (It is available in the US in a Criterion edition.)

But, in Italy, Fabrizi is famous for being one of the giants of the first generation of Italian-style comedy, from the 1950s and 1960s. My favorite movies of his are those in which he acts as a straight man for Totò, and my absolute favorite is Guardie e Ladri, which never had an American release.

For those who understand Italian, it’s possible to find the whole movie on youtube. Here is one iconic scene.

How was FOCS 2015?

Back around 2010, the Simons Institute for the Theory of Computing at Berkeley offered to organize FOCS in 2013, 2015 and 2017. So far, the IEEE technical committee on mathematical foundations of computing has taken us up on this offer in 2013 and 2015, and, unless a competing bid is presented, FOCS will come again to Berkeley in 2017.

Unfortunately there is no hotel in downtown Berkeley that is able to accommodate FOCS. The Shattuck hotel almost works, but not quite. (There are two conference rooms, but they are of very different sizes, the space to hang out for coffee breaks is much too small for 200+ people, and it’s outdoors, which is potentially bad because rain in October is unlikely but not impossible in Berkeley.)

This leaves us with the Doubletree hotel in the Berkeley Marina, which has some advantages, such as views of the bay and good facilities, and some disadvantages, such as the isolated location and the high prices. The location also forces us to provide lunches, because it would be inconvenient for people to drive to lunch places and then drive back during the lunch break. Being well aware of this, the hotel charges extortionate fees for food.

This is to say that, planning for FOCS 2017, there is nothing much different that we can do, although there are lots of little details that we can adjust, and it would be great to know how people’s experience was.

For example, did the block of discounted hotel rooms run out too soon? Would you have liked to have received something else with your registration than just the badge? If so, what? (So far, I have heard suggestions for FOCS-branded hats, t-shirts, and teddy bears.) Wasn’t it awesome to have a full bar at the business meeting? Why did nobody try the soups at lunch? The soups were delicious!