*In which we generalize the notion of normalized Laplacian to irregular graphs, we extend the basic spectral graph theory results from last lecture to irregular graphs, and we prove the easy direction of Cheeger’s inequalities.*

**1. Irregular Graphs**

Let $G = (V,E)$ be an undirected graph, not necessarily regular. We will assume that every vertex has non-zero degree. We would like to define a normalized Laplacian matrix associated to $G$ so that the properties we proved last time are true: that the multiplicity of 0 as an eigenvalue is equal to the number of connected components of $G$, that the largest eigenvalue is at most 2, and that it is 2 if and only if (a connected component of) the graph is bipartite.

In order to have a matrix such that zero is the smallest eigenvalue, and such that the multiplicity of zero is the number of connected components, we want a matrix such that the numerator of the Rayleigh quotient is (a multiple of)

$$\sum_{\{u,v\} \in E} (x_u - x_v)^2$$

and the matrix $M$ such that $x^T M x$ is the above expression is the matrix $D - A$, where $D$ is the diagonal matrix such that $D_{v,v} = d_v$, the degree of $v$. The matrix $D - A$ is called the *Laplacian* matrix of $G$. Note that there is no fixed constant upper bound to the largest eigenvalue of $D - A$; for example, if $G$ is a $d$-regular bipartite graph, the largest eigenvalue is $2d$, as proved in the last lecture.

Some calculations show that the right analog of the normalization that we did in the regular case (in which we divided by the degree $d$) would be to have a matrix whose Rayleigh quotient is

$$\frac{\sum_{\{u,v\} \in E} (x_u - x_v)^2}{\sum_v d_v x_v^2} \ \ \ \ (1)$$

and it’s clear that the above expression is at most 2 for every $x$, and it is possible to find an $x$ for which the above expression is 2 if and only if $G$ has a bipartite connected component.

Unfortunately, there is no matrix whose Rayleigh quotient equals (1), because the denominator of a Rayleigh quotient is, by definition, $\sum_v x_v^2$, regardless of the matrix.

One way to work around this problem would be to give a more general form of the variational characterization of eigenvalues, in which we have an arbitrary inner product $\langle \cdot, \cdot \rangle$, and the Rayleigh quotient is defined as $\frac{\langle x, Mx \rangle}{\langle x, x \rangle}$.

Here we will proceed in a way that is essentially equivalent, but without introducing this additional definitional framework.

The point is that, if we look at the Rayleigh quotient of the vector $D^{1/2} x$, where $D^{1/2}$ is the diagonal matrix such that $(D^{1/2})_{v,v} = \sqrt{d_v}$, then the denominator will indeed be $\sum_v d_v x_v^2$, and that we can find a matrix $L$ such that the numerator of the Rayleigh quotient of $D^{1/2} x$ is $\sum_{\{u,v\} \in E} (x_u - x_v)^2$, so that the Rayleigh quotient is indeed the expression in (1).

This matrix is called the *normalized Laplacian* of $G$ and, by the above observation, it has to be $L := I - D^{-1/2} A D^{-1/2}$. Note that, in a $d$-regular graph, we get $L = I - \frac 1d A$, consistent with our definition from the last lecture.
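As a sanity check, the identity behind this definition can be verified numerically. The sketch below (graph, test vector, and all names are invented for illustration) builds $L = I - D^{-1/2} A D^{-1/2}$ for a small irregular graph and checks that the Rayleigh quotient of $D^{1/2}x$ has numerator $\sum_{\{u,v\}\in E}(x_u-x_v)^2$ and denominator $\sum_v d_v x_v^2$.

```python
from math import sqrt

# a path 0-1-2-3 plus a pendant edge (1,4): an irregular graph
edges = [(0, 1), (1, 2), (2, 3), (1, 4)]
n = 5
deg = [0] * n
for u, v in edges:
    deg[u] += 1
    deg[v] += 1

# L = I - D^(-1/2) A D^(-1/2)
L = [[(1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
for u, v in edges:
    L[u][v] -= 1.0 / sqrt(deg[u] * deg[v])
    L[v][u] -= 1.0 / sqrt(deg[u] * deg[v])

x = [0.3, -1.2, 0.5, 2.0, -0.7]               # an arbitrary test vector
y = [sqrt(deg[v]) * x[v] for v in range(n)]   # y = D^(1/2) x

numerator = sum(L[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
denominator = sum(yv ** 2 for yv in y)

assert abs(numerator - sum((x[u] - x[v]) ** 2 for u, v in edges)) < 1e-9
assert abs(denominator - sum(deg[v] * x[v] ** 2 for v in range(n))) < 1e-9
```

The two assertions confirm that evaluating the quadratic form of $L$ at $D^{1/2}x$ recovers exactly the numerator and denominator of expression (1).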

Now the point is that the mapping $x \mapsto D^{1/2} x$ is linear and bijective, so it maps the set of all possible vectors to the set of all possible vectors, and it maps a $k$-dimensional space to a (possibly different) $k$-dimensional space.

If we let $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ be the eigenvalues of $L$, counting repetitions, the variational characterization gives us

$$\lambda_1 = \min_{y \neq {\bf 0}} \ \frac{\sum_{\{u,v\} \in E} (y_u - y_v)^2}{\sum_v d_v y_v^2}$$

and

$$\lambda_k = \min_{S\ k\text{-dimensional}} \ \max_{y \in S - \{ {\bf 0} \}} \ \frac{\sum_{\{u,v\} \in E} (y_u - y_v)^2}{\sum_v d_v y_v^2}$$

from which we have that $\lambda_1 = 0$ and that the multiplicity of zero is equal to the number of connected components.

We also have

$$\lambda_n = \max_{y \neq {\bf 0}} \ \frac{\sum_{\{u,v\} \in E} (y_u - y_v)^2}{\sum_v d_v y_v^2}$$

from which we see that $\lambda_n \leq 2$ and that $\lambda_n = 2$ if and only if one of the connected components of $G$ is bipartite.

**2. Edge Expansion, Fiedler’s Algorithm and Cheeger’s Inequalities**

We will now return, for simplicity, to the regular case.

If $G = (V,E)$ is an undirected $d$-regular graph, and $S \subseteq V$ is a set of vertices, we call the quantity

$$\phi(S) := \frac{E(S, V-S)}{d \cdot |S|}$$

the *edge expansion* of $S$, where $E(S, V-S)$ denotes the number of edges with one endpoint in $S$ and one endpoint in $V-S$. The quantity $\phi(S)$ is the average fraction of neighbors outside of $S$ for a random element of $S$, and it compares the actual number $E(S, V-S)$ of edges crossing the cut with the trivial upper bound $d \cdot |S|$.

We define the expansion of a cut $(S, V-S)$ as

$$\phi(S, V-S) := \max \{ \phi(S), \phi(V-S) \}$$

The edge expansion of the graph $G$ is defined as

$$\phi(G) := \min_{S \subseteq V} \phi(S, V-S)$$

(Note: it is common in the literature to use the notation $h(G)$ to refer to the quantity that we call $\phi(G)$.)

Finding cuts of small expansion is a problem with several applications. It is an open question whether there is a polynomial-time algorithm with a constant-factor approximation ratio; a positive answer would refute the “small-set expansion conjecture,” which is closely related to the unique games conjecture.

The following algorithm was proposed by Fiedler, and it works well in practice when $x$ is the eigenvector of $\lambda_2$, the second smallest eigenvalue of the normalized Laplacian.

- Input: graph $G = (V,E)$, vector $x \in {\mathbb R}^V$
- Sort the vertices according to the values $x_v$, and let $v_1, \ldots, v_n$ be the sorted order
- Find an $i$ that minimizes $\phi( \{ v_1, \ldots, v_i \}, \{ v_{i+1}, \ldots, v_n \} )$, and output such a cut

Note that Fiedler’s algorithm can be implemented in time $O(|E| + |V| \log |V|)$, because it takes time $O(|V| \log |V|)$ to sort the vertices, and the cut of minimal expansion that respects the sorted order can be found in time $O(|E|)$. (To see that this is the case, consider that, in order to find such a cut, we just need to compute the numbers $e_i := E( \{v_1,\ldots,v_i\}, \{v_{i+1},\ldots,v_n\})$ for each $i$. We see that $e_1$ is equal to the degree of $v_1$, and that, given $e_i$, the value of $e_{i+1}$ can be computed by just adding to $e_i$ the number of neighbors of $v_{i+1}$ in $\{v_{i+2},\ldots,v_n\}$, and subtracting the number of neighbors of $v_{i+1}$ in $\{v_1,\ldots,v_i\}$, an operation that can be done in time $O(d)$. Thus the total running time is of the order of $d \cdot |V|$, that is, $O(|E|)$.)
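The incremental bookkeeping described above translates directly into code. The sketch below (function and variable names are my own) implements the sweep for a $d$-regular graph given as adjacency lists, maintaining the number of crossing edges as the cut point advances and scoring each cut by $\phi(S, V-S) = E(S, V-S) / (d \cdot \min\{|S|, |V-S|\})$.

```python
def fiedler_sweep(adj, d, x):
    """Sweep the cuts ({v_1..v_i}, rest) in the order sorted by x,
    maintaining the number of crossing edges incrementally.
    adj is the adjacency-list representation of a d-regular graph."""
    n = len(adj)
    order = sorted(range(n), key=lambda v: x[v])
    pos = {v: i for i, v in enumerate(order)}
    crossing = 0                       # e_i = edges crossing the current cut
    best_phi, best_i = float("inf"), 0
    for i in range(n - 1):
        v = order[i]                   # move the next vertex to the left side
        for w in adj[v]:
            if pos[w] > i:
                crossing += 1          # edge (v, w) now crosses the cut
            else:
                crossing -= 1          # edge (v, w) no longer crosses
        phi = crossing / (d * min(i + 1, n - i - 1))
        if phi < best_phi:
            best_phi, best_i = phi, i
    return set(order[: best_i + 1]), best_phi

# usage: the 6-cycle (2-regular); sweeping the identity-sorted vector
# finds an arc of 3 vertices, whose cut has 2 crossing edges
cycle = [[(v - 1) % 6, (v + 1) % 6] for v in range(6)]
S, phi = fiedler_sweep(cycle, 2, [0, 1, 2, 3, 4, 5])
assert S == {0, 1, 2} and abs(phi - 1 / 3) < 1e-12
```

Each vertex's adjacency list is scanned once, so the loop after sorting runs in time proportional to $\sum_v d_v = O(|E|)$, matching the analysis above.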

We will prove the following result.

Theorem 1 (Cheeger’s Inequalities) Let $G$ be an undirected regular graph and $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ be the eigenvalues of the normalized Laplacian, with repetitions. Then

$$\frac{\lambda_2}{2} \leq \phi(G) \leq \sqrt{2 \lambda_2}$$

Furthermore, if $(S, V-S)$ is the cut found by Fiedler’s algorithm given the eigenvector of $\lambda_2$, then

$$\phi(S, V-S) \leq \sqrt{2 \lambda_2}$$

Note that, from the *furthermore* part of the Theorem, it follows that, if $(S, V-S)$ is the cut found by Fiedler’s algorithm given an eigenvector of $\lambda_2$, we have

$$\phi(S, V-S) \leq 2 \sqrt{\phi(G)}$$

which is a worst-case guarantee on the quality of the solution found by the algorithm.

**3. Proof that $\lambda_2 \leq 2 \phi(G)$**

Let $(S, V-S)$ be a cut such that $\phi(S, V-S) = \phi(G)$. Recall that for every set $A \subseteq V$, the expansion of $A$ is the same as the Rayleigh quotient $R_L({\bf 1}_A) := \frac{{\bf 1}_A^T L {\bf 1}_A}{{\bf 1}_A^T {\bf 1}_A}$ of the indicator vector ${\bf 1}_A$. (The indicator vector of a set $A$ is the 0/1 vector whose $v$-th coordinate is 1 if and only if $v \in A$.) So we have

$$\phi(G) = \max \{ R_L ({\bf 1}_S), R_L({\bf 1}_{V-S}) \}$$

also recall that, from the variational characterization of eigenvalues, we have

$$\lambda_2 = \min_{X\ 2\text{-dimensional}} \ \max_{x \in X - \{ {\bf 0} \}} R_L(x)$$

We will prove the inequality $\lambda_2 \leq 2 \phi(G)$ by showing that all the vectors in the 2-dimensional space of linear combinations of the orthogonal vectors ${\bf 1}_S$ and ${\bf 1}_{V-S}$ have Rayleigh quotient at most $2 \phi(G)$. This is a consequence of the following useful fact.

Lemma 2 Let $x$ and $y$ be two orthogonal vectors, and let $M$ be a positive semidefinite matrix. Then

$$R_M(x+y) \leq 2 \max \{ R_M(x), R_M(y) \}$$

where $R_M(z) := \frac{z^T M z}{z^T z}$ denotes the Rayleigh quotient.

*Proof:* Let $\lambda_1 \leq \cdots \leq \lambda_n$ be the eigenvalues of $M$ and $v_1, \ldots, v_n$ be a corresponding orthonormal basis of eigenvectors. Let us write $x = \sum_i a_i v_i$ and $y = \sum_i b_i v_i$.

The Rayleigh quotient of $x+y$ is

$$R_M(x+y) = \frac{\sum_i \lambda_i (a_i + b_i)^2}{||x+y||^2} = \frac{\sum_i \lambda_i (a_i+b_i)^2}{||x||^2 + ||y||^2} \leq \frac{\sum_i 2 \lambda_i a_i^2 + 2 \lambda_i b_i^2}{||x||^2 + ||y||^2} = \frac{2 R_M(x) \cdot ||x||^2 + 2 R_M(y) \cdot ||y||^2}{||x||^2 + ||y||^2} \leq 2 \max \{ R_M(x), R_M(y) \}$$

Above, we used the orthogonality of $x$ and $y$ to derive $||x+y||^2 = ||x||^2 + ||y||^2$, and, in the first inequality, we used the Cauchy-Schwarz inequality $(a+b)^2 \leq 2a^2 + 2b^2$.
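Lemma 2 is also easy to check numerically; the sketch below (an arbitrary positive semidefinite matrix of the form $B^TB$ and a hand-picked orthogonal pair, all names mine) is only an illustration, not a proof.

```python
def rayleigh(M, v):
    n = len(v)
    num = sum(M[i][j] * v[i] * v[j] for i in range(n) for j in range(n))
    return num / sum(vi * vi for vi in v)

# M = B^T B is positive semidefinite for any real B
B = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [1.0, 0.0, 1.0]]
M = [[sum(B[k][i] * B[k][j] for k in range(3)) for j in range(3)]
     for i in range(3)]

x = [1.0, 1.0, 0.0]
y = [1.0, -1.0, 2.0]
assert sum(a * b for a, b in zip(x, y)) == 0.0   # x and y are orthogonal

z = [a + b for a, b in zip(x, y)]
assert rayleigh(M, z) <= 2 * max(rayleigh(M, x), rayleigh(M, y)) + 1e-12
```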

**4. First Part of the Analysis of Fiedler’s Algorithm**

The vector ${\bf 1} = (1, \ldots, 1)$ is an eigenvector of 0, which is the smallest eigenvalue of the normalized Laplacian of $G$, and so, from the variational characterization of eigenvalues, we have that

$$\lambda_2 = \min_{x \perp {\bf 1},\ x \neq {\bf 0}} \ \frac{\sum_{\{u,v\} \in E} (x_u - x_v)^2}{d \sum_v x_v^2}$$

and that any eigenvector $x$ of $\lambda_2$ is a minimizer of the above expression. We will prove that $\phi(G) \leq \sqrt{2 \lambda_2}$ and that the *Furthermore* part of Theorem 1 is true by showing the following stronger result:

Lemma 3 Let $x$ be a vector orthogonal to ${\bf 1}$ and let $(S, V-S)$ be the cut found by Fiedler’s algorithm given $x$. Then

$$\phi(S, V-S) \leq \sqrt{2 R_L(x)}$$

where $R_L(x)$ denotes the Rayleigh quotient of $x$.

This stronger form is useful, because often one runs Fiedler’s algorithm on an approximate eigenvector, and Lemma 3 shows that one gets a guarantee on the quality of the resulting cut that does not require $x$ to be an eigenvector, as long as its Rayleigh quotient is small.

We divide the proof of Lemma 3 into two parts: we analyze the performance of the algorithm given a vector $y$ that, instead of being orthogonal to ${\bf 1}$, has the property of having non-negative entries and at most $|V|/2$ non-zero entries, and we show that analyzing the performance of the algorithm on vectors of the former type reduces to analyzing the performance on vectors of the latter type.

Lemma 4 Let $y$ be a vector with non-negative entries. Then there is a $t > 0$ such that

$$\phi( \{ v : y_v \geq t \} ) \leq \sqrt{2 R_L(y)}$$

Lemma 5 Let $x$ be orthogonal to ${\bf 1}$. Then there is a non-negative vector $y$ with at most $|V|/2$ non-zero entries such that $R_L(y) \leq R_L(x)$.

Furthermore, for every $t > 0$, the cut $( \{ v : y_v \geq t \}, \{ v : y_v < t \} )$ is one of the cuts considered by Fiedler’s algorithm on input $x$.

Let us quickly see how to prove Lemma 3 given Lemma 4 and Lemma 5. Let $x$ be a vector orthogonal to ${\bf 1}$, and let $(S, V-S)$ be the cut found by Fiedler’s algorithm given $x$. Let $y$ be the non-negative vector with at most $|V|/2$ positive entries and such that $R_L(y) \leq R_L(x)$, as promised by Lemma 5. Let $t > 0$ be a threshold such that

$$\phi( \{ v : y_v \geq t \} ) \leq \sqrt{2 R_L(y)}$$

as promised by Lemma 4. The set $S_t := \{ v : y_v \geq t \}$ contains at most $|V|/2$ vertices, so that $\phi(S_t, V - S_t) = \phi(S_t)$, and the cut $(S_t, V - S_t)$ is one of the cuts considered by Fiedler’s algorithm on input $x$, and so

$$\phi(S, V-S) \leq \phi(S_t, V-S_t) = \phi(S_t) \leq \sqrt{2 R_L(y)} \leq \sqrt{2 R_L(x)}$$

We will prove Lemma 4 next time. We conclude this lecture with a proof of Lemma 5.

*Proof:* (Of Lemma 5) First we observe that, for every constant $c$,

$$R_L(x - c {\bf 1}) \leq R_L(x)$$

because the numerator of $R_L(x)$ and the numerator of $R_L(x - c{\bf 1})$ are the same, and, using the orthogonality of $x$ and ${\bf 1}$, the denominator of $R_L(x - c{\bf 1})$ is $d \cdot (||x||^2 + c^2 |V|) \geq d \cdot ||x||^2$.

Let $m$ be the median value of the entries of $x$, and call $x' := x - m {\bf 1}$. Then we have $R_L(x') \leq R_L(x)$, and the median of the entries of $x'$ is zero, meaning that $x'$ has at most $|V|/2$ positive entries and at most $|V|/2$ negative entries. We will refer to the vertices $v$ such that $x'_v > 0$ as the *positive* vertices, and to the vertices $v$ such that $x'_v < 0$ as the *negative* vertices.

We write

$$x' = x^+ - x^-$$

where $x^+_v := x'_v$ if $v$ is positive and $x^+_v := 0$ otherwise; similarly, $x^-_v := -x'_v$ if $v$ is negative, and $x^-_v := 0$ otherwise. Note that $x^+$ and $x^-$ are orthogonal, non-negative, and each of them has at most $|V|/2$ nonzero entries. Note also that, for every $t > 0$, the cut defined by the set $\{ v : x^+_v \geq t \}$ is one of the cuts considered by Fiedler’s algorithm on input $x$, because it is the cut

$$( \{ v : x_v \geq m + t \} , \{ v : x_v < m + t \} )$$

Similarly, for every $t > 0$, the cut defined by the set $\{ v : x^-_v \geq t \}$ is one of the cuts considered by Fiedler’s algorithm on input $x$, because it is the cut

$$( \{ v : x_v \leq m - t \} , \{ v : x_v > m - t \} )$$

It remains to show that at least one of $x^+$ or $x^-$ has Rayleigh quotient smaller than or equal to the Rayleigh quotient of $x'$ (and, hence, of $x$). We claim that

$$\min \{ R_L(x^+), R_L(x^-) \} \leq \frac{\sum_{\{u,v\} \in E} (x^+_u - x^+_v)^2 + (x^-_u - x^-_v)^2}{d \cdot ( ||x^+||^2 + ||x^-||^2 )} \leq \frac{\sum_{\{u,v\} \in E} (x'_u - x'_v)^2}{d \cdot ||x'||^2} = R_L(x')$$

where the first inequality holds because a ratio of sums is at least the minimum of the ratios of the corresponding terms, and the equality $||x'||^2 = ||x^+||^2 + ||x^-||^2$ follows from orthogonality. The only step that we need to justify is that for every edge $\{u,v\}$ we have

$$(x^+_u - x^+_v)^2 + (x^-_u - x^-_v)^2 \leq (x'_u - x'_v)^2$$

If $\{u,v\}$ is an edge between two non-positive vertices, or between two non-negative vertices, then the left-hand side and the right-hand side are clearly equal. If it is an edge between a positive vertex $u$ and a negative vertex $v$, then the left-hand side is equal to $(x^+_u)^2 + (x^-_v)^2$, and the right-hand side is equal to $(x^+_u + x^-_v)^2 \geq (x^+_u)^2 + (x^-_v)^2$.
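The splitting step and the edge-by-edge inequality can be checked mechanically. The sketch below (a made-up vector with median zero on the 5-cycle, all names mine) splits $x'$ into $x^+ - x^-$ and verifies both the decomposition and the inequality on every edge.

```python
xprime = [1.5, -0.5, 0.0, -2.0, 1.0]              # a vector with median 0
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # the 5-cycle

xplus = [max(v, 0.0) for v in xprime]    # x^+: positive part
xminus = [max(-v, 0.0) for v in xprime]  # x^-: negative part (negated)

# decomposition x' = x^+ - x^- and orthogonality of x^+ and x^-
assert all(abs(p - m - v) < 1e-12 for p, m, v in zip(xplus, xminus, xprime))
assert sum(p * m for p, m in zip(xplus, xminus)) == 0.0

# (x'_u - x'_v)^2 >= (x+_u - x+_v)^2 + (x-_u - x-_v)^2 on every edge
for u, v in edges:
    lhs = (xprime[u] - xprime[v]) ** 2
    rhs = (xplus[u] - xplus[v]) ** 2 + (xminus[u] - xminus[v]) ** 2
    assert lhs >= rhs - 1e-12
```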


*In which we introduce the Laplacian matrix and we prove our first results in spectral graph theory.*

**1. The Basics of Spectral Graph Theory**

Given an undirected graph $G = (V,E)$, the approach of spectral graph theory is to associate a symmetric real-valued matrix to $G$, and to relate the eigenvalues of the matrix to combinatorial properties of $G$.

For the sake of this lecture, we will restrict ourselves to the case in which $G$ is a $d$-regular graph, and we will then see how to extend our results to apply to irregular graphs as well.

The most natural matrix to associate to $G$ is the adjacency matrix $A$ such that $A_{u,v} = 1$ if $\{u,v\} \in E$ and $A_{u,v} = 0$ otherwise. In the second part of the course, in which we will study expander graphs, the adjacency matrix will indeed be the most convenient matrix to work with. For the sake of the algorithms that we will analyze in the first part of the course, however, a slight variation called the *normalized Laplacian* is more convenient.

There are a few ways to motivate the definition of the Laplacian. One way is the following: the variational characterization of the eigenvalues of real symmetric matrices tells us that we can think of the eigenvalues of $M$ as optima of min-max optimization problems in which vectors $x$ are feasible solutions and the cost function is the Rayleigh quotient

$$R_M(x) = \frac{x^T M x}{x^T x}$$

We know that every homogeneous polynomial of degree 2 can be realized as $x^T M x$ for some symmetric matrix $M$, and, if we want to study cuts in a graph $G = (V,E)$, it makes sense to choose a matrix $M$ such that

$$x^T M x = \sum_{\{u,v\} \in E} (x_u - x_v)^2$$

because, if $x \in \{ 0,1 \}^V$ is a Boolean vector, representing a cut in the graph, then the right-hand-side expression above is counting the number of edges that cross the cut, and so optimization problems with the above cost function will be relaxations of cut problems.

Some calculations show that the matrix having such a property is $dI - A$, which is called the Laplacian matrix of $G$. Indeed, we can verify that

$$x^T (dI - A) x = \sum_{\{u,v\} \in E} (x_u - x_v)^2$$

because both expressions are easily seen to be equal to

$$\sum_v d \cdot x_v^2 - 2 \sum_{\{u,v\} \in E} x_u x_v$$
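This identity is easy to confirm numerically; the following sketch checks it on the 4-cycle, a 2-regular graph, with all names invented for the example.

```python
# the 4-cycle: a 2-regular graph
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
d, n = 2, 4
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

x = [0.7, -1.0, 2.0, 0.4]   # an arbitrary test vector

# x^T (dI - A) x, the cut form, and the common expression
quad = sum((d * (i == j) - A[i][j]) * x[i] * x[j]
           for i in range(n) for j in range(n))
cut_form = sum((x[u] - x[v]) ** 2 for u, v in edges)
common = d * sum(xi ** 2 for xi in x) - 2 * sum(x[u] * x[v] for u, v in edges)

assert abs(quad - cut_form) < 1e-9 and abs(quad - common) < 1e-9
```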

As we will see in a moment, the eigenvalues of $dI - A$ are in the range $[0, 2d]$, and it is not hard to see that their sum is $d \cdot |V|$, so it is convenient to divide the Laplacian matrix by $d$ so that the range and the average value of the eigenvalues of the resulting matrix are independent of the degree. (This degree independence will make it possible to generalize results to the irregular case.)

We have thus reached the following definition.

Definition 1 (Normalized Laplacian) The normalized Laplacian matrix of an undirected $d$-regular graph $G = (V,E)$ is $L := \frac 1d (dI - A) = I - \frac 1d A$.

We shall now prove the following relations between the eigenvalues of $L$ and certain purely combinatorial properties of $G$.

Theorem 2 Let $G$ be a $d$-regular undirected graph, let $A$ be the adjacency matrix of $G$, and $L = I - \frac 1d A$ be the normalized Laplacian matrix of $G$. Let $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ be the real eigenvalues of $L$ with multiplicities, in nondecreasing order. Then

- $\lambda_1 = 0$ and $\lambda_n \leq 2$.
- $\lambda_k = 0$ if and only if $G$ has at least $k$ connected components.
- $\lambda_n = 2$ if and only if at least one of the connected components of $G$ is bipartite.

Note that the first two properties imply that the multiplicity of 0 as an eigenvalue is precisely the number of connected components of $G$.

*Proof:* By the characterization of the Rayleigh quotient of $L$ that we established above, and from the variational characterization of eigenvalues, we have

$$\lambda_1 = \min_{x \neq {\bf 0}} \ \frac{\sum_{\{u,v\} \in E} (x_u - x_v)^2}{d \sum_v x_v^2}$$

and so $\lambda_1 \geq 0$, because the Rayleigh quotient, being a ratio of sums of squares, is always non-negative.

If we take $x = {\bf 1}$ to be the all-one vector, we see that its Rayleigh quotient is 0, and so 0 is the smallest eigenvalue of $L$, with ${\bf 1}$ being one of the vectors in the eigenspace of the eigenvalue zero.

We also have the following formula for $\lambda_k$:

$$\lambda_k = \min_{S\ k\text{-dimensional}} \ \max_{x \in S - \{ {\bf 0} \}} \ \frac{\sum_{\{u,v\} \in E} (x_u - x_v)^2}{d \sum_v x_v^2}$$

So, if $\lambda_k = 0$, there must exist a $k$-dimensional space $S$ such that for every $x \in S$ and every $\{u,v\} \in E$, we have $x_u = x_v$, and so $x_u = x_v$ for every $u,v$ which are in the same connected component. This means that each $x \in S$ must be constant within each connected component of $G$, and so the dimension of $S$ can be at most the number of connected components of $G$, meaning that $G$ has at least $k$ connected components.

Conversely, if $G$ has at least $k$ connected components, we can let $S$ be the space of vectors that are constant within each component, and $S$ is a space of dimension at least $k$ such that for every element $x$ of $S$ we have

$$\sum_{\{u,v\} \in E} (x_u - x_v)^2 = 0$$

meaning that $S$ is a witness of the fact that $\lambda_k = 0$.

Finally, to study $\lambda_n$, we first note that we have the formula

$$\lambda_n = \max_{x \neq {\bf 0}} \ \frac{\sum_{\{u,v\} \in E} (x_u - x_v)^2}{d \sum_v x_v^2}$$

which we can prove by using the variational characterization of the eigenvalues of $-L$ and noting that $-\lambda_n$ is the smallest eigenvalue of $-L$.

We also observe that for every vector $x$ we have

$$2 - \frac{\sum_{\{u,v\} \in E} (x_u - x_v)^2}{d \sum_v x_v^2} = \frac{\sum_{\{u,v\} \in E} (x_u + x_v)^2}{d \sum_v x_v^2}$$

and so

$$\lambda_n = 2 - \min_{x \neq {\bf 0}} \ \frac{\sum_{\{u,v\} \in E} (x_u + x_v)^2}{d \sum_v x_v^2}$$

and if $\lambda_n = 2$ then there must be a non-zero vector $x$ such that

$$\sum_{\{u,v\} \in E} (x_u + x_v)^2 = 0$$

which means that $x_u = -x_v$ for every edge $\{u,v\} \in E$.

Let us now define $A := \{ v : x_v > 0 \}$ and $B := \{ v : x_v < 0 \}$. The set $A \cup B$ is non-empty (otherwise we would have $x = {\bf 0}$) and is either the entire graph, or else it is disconnected from the rest of the graph, because otherwise an edge with an endpoint in $A \cup B$ and an endpoint in $V - (A \cup B)$ would give a positive contribution to $\sum_{\{u,v\} \in E} (x_u + x_v)^2$; furthermore, every edge incident on a vertex in $A$ must have the other endpoint in $B$, and vice versa. Thus, $A \cup B$ is a connected component, or a collection of connected components, of $G$ which is bipartite, with the bipartition $(A, B)$.
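The bipartite case can be illustrated concretely: on the 4-cycle (bipartite and 2-regular), the vector that is $+1$ on one side of the bipartition and $-1$ on the other makes every term $(x_u + x_v)^2$ vanish, so its Rayleigh quotient is exactly 2. The small check below (example mine) verifies this.

```python
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # 4-cycle, bipartition {0,2} vs {1,3}
d = 2
x = [1.0, -1.0, 1.0, -1.0]                # +1 on one side, -1 on the other

# every edge joins the two sides, so x_u = -x_v on each edge
assert sum((x[u] + x[v]) ** 2 for u, v in edges) == 0.0

# hence the Rayleigh quotient attains the maximum value 2
rq = sum((x[u] - x[v]) ** 2 for u, v in edges) / (d * sum(xi ** 2 for xi in x))
assert rq == 2.0
```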


*In which we describe what this course is about.*

**1. Overview**

This class is about applications of linear algebra to graph theory and to graph algorithms. In the finite-dimensional case, linear algebra deals with vectors and matrices, and with a number of useful concepts and algorithms, such as determinants, eigenvalues, eigenvectors, and solutions to systems of linear equations.

The application to graph theory and graph algorithms comes from associating, in a natural way, a matrix to a graph $G$, and then interpreting the above concepts and algorithms in graph-theoretic language. The most natural representation of a graph as a matrix is via the *adjacency matrix* of a graph, and certain related matrices, such as the *Laplacian* and *normalized Laplacian* matrix, will be our main focus. We can think of $|V|$-dimensional Boolean vectors as representing a partition of the vertices, that is, a *cut* in the graph, and we can think of arbitrary vectors as *fractional* cuts. From this point of view, eigenvalues are the optima of continuous relaxations of certain cut problems, the corresponding eigenvectors are optimal solutions, and connections between spectrum and cut structures are given by rounding algorithms converting fractional solutions into integral ones. Flow problems are dual to cut problems, so one would expect linear algebraic techniques to be helpful to find flows in networks: this is the case, via the theory of electrical flows, which can be found as solutions to linear systems.

The course can be roughly subdivided into three parts: in the first part of the course we will study *spectral graph algorithms*, that is, graph algorithms that make use of eigenvalues and eigenvectors of the normalized Laplacian of the given graph. In the second part of the course we will look at constructions of expander graphs, and their applications. In the third part of the course, we will look at fast algorithms for solving systems of linear equations of the form $Lx = b$, where $L$ is the Laplacian of a graph, their applications to finding electrical flows, and the applications of electrical flows to solving the max flow problem.

**2. Spectral Graph Algorithms**

We will study approximation algorithms for the *sparsest cut* problem, in which one wants to find a cut (a partition into two sets) of the vertex set of a given graph so that a minimal number of edges cross the cut compared to the number of pairs of vertices that are disconnected by the removal of such edges.

This problem is related to estimating the edge expansion of a graph and to finding *balanced separators*, that is, ways to disconnect a constant fraction of the pairs of vertices in a graph after removing a minimal number of edges.

Finding balanced separators and sparse cuts arises in *clustering* problems, in which the presence of an edge denotes a relation of similarity, and one wants to partition vertices into few clusters so that, for the most part, vertices in the same cluster are similar and vertices in different clusters are not. For example, sparse cut approximation algorithms are used for *image segmentation*, by reducing the image segmentation problem to a graph clustering problem in which the vertices are the pixels of the image and the (weights of the) edges represent similarities between nearby pixels.

Balanced separators are also useful in the design of divide-and-conquer algorithms for graph problems, in which one finds a small set of edges that disconnects the graph, recursively solves the problem on the connected components, and then patches together the partial solutions and the edges of the cut, via either exact methods (usually dynamic programming) or approximate heuristics. The sparsity of the cut determines the running time of the exact algorithms and the quality of approximation of the heuristic ones.

We will study a spectral algorithm first proposed by Fiedler in the 1970s, and to put its analysis into a broader context, we will also study the Leighton-Rao algorithm, which is based on linear programming, and the Arora-Rao-Vazirani algorithm, which is based on semidefinite programming. We will see how the three algorithms are based on conceptually similar continuous relaxations.

Before giving the definition of sparsest cut, it is helpful to consider examples of graphs that have very sparse cuts, in order to gain intuition.

Suppose that a communication network is shaped as a path, with the vertices representing the communicating devices and the edges representing the available links. The clearly undesirable feature of such a configuration is that the failure of a single edge can cause the network to be disconnected, and, in particular, the failure of the middle edge will disconnect half of the vertices from the other half.

This is a situation that can occur in reality. Most of Italian highway traffic is along the highway that connect Milan to Naples via Bologna, Florence and Rome. The section between Bologna and Florence goes through relatively high mountain passes, and snow and ice can cause road closures. When this happens, it is almost impossible to drive between Northern and Southern Italy. Closer to California, I was once driving from Banff, a mountain resort town in Alberta which hosts a mathematical institute, back to the US. Suddenly, traffic on Canada’s highway 1 came to a stop. People from the other cars, after a while, got out of the cars and started hanging out and chatting on the side of the road. We asked if there was any other way to go in case whatever accident was ahead of us would cause a long road closure. They said no, this is the only highway here. Thankfully we started moving again in half an hour or so.

Now, consider a two-dimensional $\sqrt n \times \sqrt n$ grid on $n$ vertices. The removal of an edge cannot disconnect the graph, and the removal of a constant number of edges can only disconnect a constant number of vertices from the rest of the graph, but it is possible to remove just $\sqrt n$ edges, a $O(1/\sqrt n)$ fraction of the total, and have half of the vertices be disconnected from the other half.

A $d$-dimensional hypercube with $n = 2^d$ vertices is considerably better connected than a grid, although it is still possible to remove a vanishingly small fraction of edges (the edges of a dimension cut, which are a $1/d = 1/\log_2 n$ fraction of the total number of edges) and disconnect half of the vertices from the other half.
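The hypercube computation is easy to verify directly: the sketch below (names mine) builds the $d$-dimensional hypercube on $n = 2^d$ vertices and checks that a dimension cut has exactly $n/2$ crossing edges, a $1/d$ fraction of the $dn/2$ total.

```python
d = 4
n = 1 << d   # 2^d vertices, labeled by d-bit strings

# edges join vertices differing in exactly one bit
edges = [(u, u ^ (1 << i)) for u in range(n) for i in range(d)
         if u < (u ^ (1 << i))]
assert len(edges) == d * n // 2

# the "dimension cut" along coordinate 0
S = {u for u in range(n) if u & 1 == 0}
crossing = sum(1 for u, v in edges if (u in S) != (v in S))
assert crossing == n // 2                 # a 1/d fraction of all edges
assert crossing / (d * len(S)) == 1 / d   # edge expansion of the cut
```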

Clearly, the most reliable network layout is the clique; in a clique, if an adversary wants to disconnect an $\epsilon$ fraction of vertices from the rest of the graph, he has to remove at least an $\epsilon \cdot (1-\epsilon)$ fraction of edges from the graph.

This property of the clique will be our “gold standard” for reliability. The expansion and the sparsest cut parameters of a graph measure how much worse a graph is compared with a clique from this point of view.

For simplicity, here we will give definitions that apply only to the case of regular graphs.

Definition 1 (Edge expansion of a set) Let $G = (V,E)$ be a $d$-regular graph, and $S \subseteq V$ a subset of vertices. The edge expansion of $S$ is

$$\phi(S) := \frac{E(S, V-S)}{d \cdot |S|}$$

where $E(S, V-S)$ is the number of edges in $E$ that have one endpoint in $S$ and one endpoint in $V-S$.

$d \cdot |S|$ is a trivial upper bound to the number of edges that can leave $S$, and so $\phi(S)$ measures how much smaller the actual number of edges is than this upper bound. We can also think of $\phi(S)$ as the probability that, if we pick a random node $v$ in $S$ and then a random neighbor $w$ of $v$, the node $w$ happens to be outside of $S$.

The quantity $1 - \phi(S)$ is the average fraction of neighbors that vertices in $S$ have within $S$. For example, if $G$ represents a social network, and $S$ is a subset of users of expansion $\phi(S)$, this means that, on average, the users in $S$ have a $1 - \phi(S)$ fraction of their friends within $S$.

If $(S, V-S)$ is a cut of the graph, and $|S| \leq |V-S|$, then $\phi(S)$ is, within a factor of two, the ratio between the fraction of edges that we have to remove to disconnect $S$ from $V-S$, and the fraction of pairs of vertices that become unreachable if we do so. We define the edge expansion of a cut $(S, V-S)$ as

$$\phi(S, V-S) := \max \{ \phi(S), \phi(V-S) \}$$

The edge expansion of a graph is the minimum of the edge expansion of all cuts.

Definition 2 (Edge expansion of a graph) Let $G = (V,E)$ be a $d$-regular graph; its edge expansion is

$$\phi(G) := \min_{S \subseteq V} \phi(S, V-S)$$

If $A$ is the adjacency matrix of a $d$-regular graph $G$, then the *normalized Laplacian* of $G$ is the matrix $L := I - \frac 1d A$. We will prove the Cheeger inequalities: that if $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ are the eigenvalues of $L$, counted with multiplicities and sorted in nondecreasing order, then

$$\frac{\lambda_2}{2} \leq \phi(G) \leq \sqrt{2 \lambda_2}$$

The lower bound $\frac{\lambda_2}{2} \leq \phi(G)$ follows by using the variational characterization of eigenvalues to think of $\lambda_2$ as the optimum of a continuous optimization problem, and then realizing that, from this point of view, $\frac{\lambda_2}{2}$ is actually the optimum of a *relaxation* of $\phi(G)$.

The upper bound $\phi(G) \leq \sqrt{2 \lambda_2}$ has a constructive proof, showing that the set $S$ returned by Fiedler’s algorithm has size at most $|V|/2$ and satisfies $\phi(S) \leq \sqrt{2 \lambda_2}$. The two inequalities, combined, show that $\phi(S) \leq 2 \sqrt{\phi(G)}$ and provide a (tight) worst-case analysis of the quality of the cut found by Fiedler’s algorithm, compared with the optimal cut.

To put this result in a broader context, we will see the Leighton-Rao approximation algorithm, based on linear programming, which finds a cut of expansion $O(\phi(G) \cdot \log |V|)$, and the Arora-Rao-Vazirani algorithm, based on semidefinite programming, which finds a cut of expansion $O(\phi(G) \cdot \sqrt{\log |V|})$. The spectral, linear programming, and semidefinite programming relaxations can all be seen as very related.

We will then consider combinatorial characterizations of, and algorithms for, other Laplacian eigenvalues.

We will prove a “higher order” Cheeger inequality that characterizes $\lambda_k$ for $k > 2$ similarly to how the standard Cheeger inequality characterizes $\lambda_2$, and the proof will provide a worst-case analysis of spectral partitioning algorithms similarly to how the proof of the standard Cheeger inequality provides a worst-case analysis of Fiedler’s algorithm.

The outcome of these results is that small Laplacian eigenvalues characterize the presence of sparse cuts in the graph. Analogously, we will show that the value of $\lambda_n$ characterizes large cuts, and the proof of a Cheeger-type inequality for $\lambda_n$ will lead to the worst-case analysis of a spectral algorithm for max cut.

**3. Constructions and Applications of Expander Graphs**

A family of constant-degree expanders is a collection of arbitrarily large graphs, all of degree $O(1)$ and edge expansion $\Omega(1)$. Expanders are useful in several applications, and a common theme in such applications is that even though they are sparse, they have some of the “connectivity” properties of a complete graph.

For example, if one removes an $\epsilon$ fraction of edges from an expander of expansion $\phi > \epsilon$, one is left with a connected component that contains at least a $1 - \epsilon/\phi$ fraction of the vertices.

Lemma 3 Let $G = (V,E)$ be a regular graph of expansion $\phi$. Then, after an $\epsilon < \phi$ fraction of the edges are adversarially removed, the graph has a connected component that spans at least a $1 - \epsilon/\phi$ fraction of the vertices.

*Proof:* Let $d$ be the degree of $G$, and let $E' \subseteq E$ be an arbitrary subset of at most $\epsilon \cdot |E| = \epsilon d |V| / 2$ edges. Let $C_1, \ldots, C_m$ be the connected components of the graph $(V, E - E')$, ordered so that $|C_1| \geq |C_2| \geq \cdots \geq |C_m|$. We want to prove that $|C_1| \geq (1 - \epsilon/\phi) \cdot |V|$. We have

$$2 |E'| \geq \sum_i E(C_i, V - C_i)$$

because every edge counted on the right-hand side must belong to $E'$, and each edge of $E'$ is counted at most twice in the sum.

If $|C_1| \leq |V|/2$, then we have

$$\sum_i E(C_i, V - C_i) \geq \sum_i \phi \cdot d \cdot |C_i| = \phi d |V|$$

but this is impossible if $\epsilon < \phi$, because $2 |E'| \leq \epsilon d |V|$.

If $|C_1| > |V|/2$, then define $S := C_2 \cup \cdots \cup C_m = V - C_1$. We have

$$\epsilon d |V| \geq 2|E'| \geq \sum_{i \geq 2} E(C_i, V - C_i) \geq \sum_{i \geq 2} \phi \cdot d \cdot |C_i| = \phi d |S|$$

which implies that $|S| \leq \frac{\epsilon}{\phi} \cdot |V|$ and so $|C_1| \geq \left( 1 - \frac{\epsilon}{\phi} \right) \cdot |V|$.

In a $d$-regular expander (with $\phi = \Omega(1)$), the removal of $k$ edges can cause at most $O(k/d)$ vertices to be disconnected from the remaining “giant component.” Clearly, it is always possible to disconnect $k/d$ vertices after removing $k$ edges, so the reliability of an expander is essentially best possible.

Another way in which expander graphs act similarly to a complete graph is the following. Suppose that, given a graph $G = (V,E)$, we generate a sequence $v_1, \ldots, v_t$ by choosing $v_1 \in V$ uniformly at random and then performing a $(t-1)$-step random walk. If $G$ is a complete graph (in which every vertex has a self-loop), this process uses $t \log_2 |V|$ random bits and generates $t$ uniform and independent random vertices. In an expander of constant degree, the process uses only $\log_2 |V| + O(t)$ random bits, and the resulting sequence has several of the useful statistical properties of a sequence generated uniformly at random. Especially in the case in which $t$ is of the order of $\log |V|$, using $O(\log |V|)$ instead of $O(\log^2 |V|)$ random bits can be a significant saving in certain applications. (Note, in particular, that the sample space has polynomial size instead of quasi-polynomial size.)

Constructions of constant-degree expanders are useful in a variety of applications, from the design of data structures, to the derandomization of algorithms, from efficient cryptographic constructions to being building blocks of more complex quasirandom objects.

There are two families of approaches to the explicit (efficient) construction of bounded-degree expanders. One is via algebraic constructions, typically ones in which the expander is constructed as a Cayley graph of a finite group. Usually these constructions are easy to describe but rather difficult to analyze. The study of such expanders, and of the related group properties, has become a very active research program. There are also combinatorial constructions, which are somewhat more complicated to describe but considerably simpler to analyze.

**4. Mixing time of random walks**

If one takes a random walk in a regular graph that is connected and not bipartite, then, regardless of the starting vertex, the distribution of the $t$-th step of the walk is close to the uniform distribution over the vertices, provided that $t$ is large enough. It is always sufficient for $t$ to be quadratic in the number of vertices; in some graphs, however, the distribution is near-uniform even when $t$ is just poly-logarithmic, and, indeed, the time is at most $O\left( \frac{\log |V|}{\lambda_2} \right)$, and thus it is at most logarithmic in expander graphs.
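A quick simulation illustrates the first claim: on a small connected, non-bipartite regular graph, the distribution of the $t$-th step converges to uniform from any starting vertex. The sketch below (example mine) evolves the exact step distribution of a walk on the triangle, which mixes at rate $(1/2)^t$.

```python
n, t = 3, 50
adj = [[1, 2], [0, 2], [0, 1]]   # the triangle: 2-regular, non-bipartite
p = [1.0, 0.0, 0.0]              # walk started deterministically at vertex 0
for _ in range(t):
    q = [0.0] * n
    for v in range(n):
        for w in adj[v]:
            q[w] += p[v] / len(adj[v])   # move to a uniform random neighbor
    p = q
# after t steps the distribution is (numerically) uniform
assert all(abs(pv - 1 / 3) < 1e-12 for pv in p)
```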

Among other applications, the study of the “mixing time” (the time that it takes to reach the uniform distribution) of random walks has applications to analyzing the convergence time of certain randomized algorithms.

The design of approximation algorithms for *combinatorial counting* problems, in which one wants to count the number of solutions to a given NP-type problem, can be reduced to the design of *approximately uniform sampling* algorithms, in which one wants to approximately sample from the set of such solutions. For example, the task of approximately counting the number of perfect matchings can be reduced to the task of sampling almost uniformly from the set of perfect matchings of a given graph. One can design approximate sampling algorithms by starting from an arbitrary solution and then making a series of random local changes. The behavior of the algorithm then corresponds to performing a random walk in the graph that has a vertex for every possible solution and an edge for each local change that the algorithm can choose to make. Although the graph can have an exponential number of vertices in the size of the problem that we want to solve, it is possible for the approximate sampling algorithm to run in polynomial time, provided that a random walk in the graph converges to uniform in time poly-logarithmic in its size.

The study of the mixing time of random walks in graphs is thus a main analysis tool to bound the running time of approximate sampling algorithms (and, via reductions, of approximate counting algorithms).

As a way of showing applications of results proved so far, we will show that, because of Cheeger’s inequality, the mixing time is upper-bounded by $O\left( \frac{\log |V|}{\lambda_2} \right)$, and then we will use the dual of the Leighton-Rao relaxation to show that $\frac{1}{\lambda_2}$ can be upper-bounded by the congestion of a certain flow problem. We will apply this theory to the analysis of an algorithm that approximates the number of perfect matchings in a given dense bipartite graph.

**5. Linear Systems, Electrical Flows, and Applications**

In the last part of the course, we will turn to connections between graph theory and a different aspect of linear algebra, namely the solution of systems of linear equations. If we have a system of linear equations of the form

$$Ax = b$$

we can solve it (or determine that it has no solution) in polynomial time using Gaussian elimination. Sometimes, it is possible to develop faster and more numerically stable algorithms by thinking of the problem as an *optimization* problem, such as, for example,

$$\min_x || Ax - b ||$$

for an appropriate choice of norm.

If $A$ is positive definite (that is, all the eigenvalues are strictly positive), then another way of turning a linear system into an optimization problem is to consider the problem

$$\min_x \ \frac 12 x^T A x - b^T x \ \ \ \ (1)$$

The problem is strictly convex, because the Hessian of the function $f(x) := \frac 12 x^T A x - b^T x$, that is, the matrix of partial second derivatives of $f$, is, at every point, the matrix $A$ itself, which we assumed to be positive definite.

Being strictly convex, and tending to infinity as $||x|| \rightarrow \infty$, the function has a unique minimum, achieved at a point $x^*$. The gradient of $f$ at a point $x$, that is, the vector of partial derivatives at $x$, is $Ax - b$. The gradient has to be equal to the zero vector at the optimum $x^*$, and so we have $Ax^* = b$.
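This observation is exactly what makes gradient descent applicable: following $-\nabla f(x) = b - Ax$ converges to the solution of $Ax = b$ when $A$ is positive definite. A minimal sketch, with the matrix, step size, and iteration count chosen arbitrarily for illustration:

```python
A = [[4.0, 1.0], [1.0, 3.0]]   # a small positive definite matrix
b = [1.0, 2.0]

x = [0.0, 0.0]
step = 0.1
for _ in range(500):
    grad = [A[i][0] * x[0] + A[i][1] * x[1] - b[i] for i in range(2)]  # Ax - b
    x = [x[i] - step * grad[i] for i in range(2)]

# at convergence the gradient vanishes, i.e. x solves Ax = b
residual = [A[i][0] * x[0] + A[i][1] * x[1] - b[i] for i in range(2)]
assert all(abs(r) < 1e-8 for r in residual)
```

The step size must be smaller than $2/\lambda_{\max}(A)$ for this iteration to converge, which is one way to see why the spectrum of $A$ governs the running time.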

If we want to solve the linear system $Ax = b$, and $A$ is positive definite, then a possible strategy is to solve the convex optimization problem (1) using gradient descent, or similar local-search algorithms for convex optimization. The running time of such algorithms will be determined by the smallest eigenvalue of $A$. In order to deal with matrices having small eigenvalues, one resorts to *preconditioning*, which is a technique that reduces the system to an equivalent one in which the matrix has a larger smallest eigenvalue. In the interesting special case in which $A$ is the Laplacian matrix of an undirected graph, the running time is determined by the expansion of the graph, and preconditioning can be understood in graph-theoretic terms.

(Technically, the Laplacian is not positive definite. What we mean above is that we are interested in solving an equation of the form $Lx = b$ where $L$ is a Laplacian matrix, and $x$ is further constrained to be orthogonal to the eigenspace of zero.)

Efficiently solving “Laplacian systems” of the form $Lx = b$ is closely related to the problem of finding *sparsifiers* of graphs, and we will see nearly linear time algorithms for both problems.

One application of finding solutions to systems of the form $Lx = b$ is to find *electrical flows* in networks. We will then see how to use fast algorithms for finding electrical flows and turn them into algorithms for the Max Flow problem.
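As a tiny illustration of what a Laplacian system looks like, here is a sketch on a made-up example (the 4-cycle graph and the right-hand side are mine, and the pseudoinverse is only for illustration; the point of the nearly linear time algorithms mentioned above is to avoid anything this expensive):

```python
import numpy as np

# Build the Laplacian L = D - A of a 4-cycle and solve Lx = b for a
# right-hand side b orthogonal to the all-ones vector (the eigenspace of
# zero, since the graph is connected).
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
L = np.zeros((n, n))
for u, v in edges:
    L[u, u] += 1.0
    L[v, v] += 1.0
    L[u, v] -= 1.0
    L[v, u] -= 1.0

b = np.array([1.0, 0.0, -1.0, 0.0])   # entries sum to zero: b is orthogonal to ones
x = np.linalg.pinv(L) @ b             # minimum-norm solution via the pseudoinverse
assert np.allclose(L @ x, b)
```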


*In which we review linear algebra prerequisites.*

The following background from linear algebra will be sufficient for the sake of this course: to know what eigenvalues and eigenvectors are, to know that real symmetric matrices have real eigenvalues and an orthonormal basis of real eigenvectors, and to know the variational characterization of eigenvalues.

**1. Basic Definitions **

If $x$ is a complex number, then we let $\bar x$ denote its *conjugate*. Note that a complex number $x$ is real if and only if $x = \bar x$. If $M$ is a matrix, then $M^*$ denotes the conjugate transpose of $M$, that is, $(M^*)_{i,j} := \overline{M_{j,i}}$. If the entries of $M$ are real, then $M^* = M^T$, where $M^T$ is the *transpose* of $M$, that is, the matrix such that $(M^T)_{i,j} := M_{j,i}$.

We say that a matrix $M$ is *Hermitian* if $M = M^*$. In particular, real symmetric matrices are Hermitian.

If $x, y \in {\mathbb C}^n$ are two vectors, then their inner product is defined as

$$\langle x, y \rangle := x^* y = \sum_i \bar x_i \cdot y_i$$

Notice that, by definition, we have $\langle x, x \rangle = \| x \|^2$ and $\langle x, y \rangle = \overline{\langle y, x \rangle}$. Note also that, for two matrices $A, B$, we have $(AB)^* = B^* A^*$, and that for every matrix $M$ and every two vectors $x$, $y$, we have

$$\langle Mx, y \rangle = (Mx)^* y = x^* M^* y = \langle x, M^* y \rangle$$

If $M$ is a square matrix, $\lambda$ is a scalar, and $x$ is a non-zero vector such that

$$Mx = \lambda x$$

then we say that $\lambda$ is an *eigenvalue* of $M$ and that $x$ is an *eigenvector* of $M$ corresponding to the eigenvalue $\lambda$.
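The definitions above can be checked numerically. The vectors and the Hermitian matrix below are an arbitrary example of mine; note that `np.vdot` conjugates its first argument, matching our convention for the inner product:

```python
import numpy as np

x = np.array([1 + 2j, 3 - 1j])
y = np.array([0 + 1j, 2 + 2j])
M = np.array([[2, 1 - 1j],
              [1 + 1j, 3]])          # Hermitian: M equals its conjugate transpose

assert np.allclose(M, M.conj().T)                                 # M = M*
assert np.isclose(np.vdot(x, x).imag, 0.0)                        # <x,x> = ||x||^2 is real
assert np.isclose(np.vdot(x, y), np.conj(np.vdot(y, x)))          # <x,y> = conj(<y,x>)
assert np.isclose(np.vdot(M @ x, y), np.vdot(x, M.conj().T @ y))  # <Mx,y> = <x,M*y>
```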

**2. The Spectral Theorem **

We want to prove

Theorem 1 (Spectral Theorem) Let $M \in {\mathbb R}^{n \times n}$ be a symmetric matrix with real-valued entries; then there are $n$ real numbers $\lambda_1, \ldots, \lambda_n$ (not necessarily distinct) and $n$ orthonormal real vectors $x_1, \ldots, x_n$ such that $x_i$ is an eigenvector of $\lambda_i$.

Assuming the fundamental theorem of algebra (that every polynomial has a complex root) and basic properties of the determinant, the cleanest proof of the spectral theorem proceeds by induction on $n$: one shows that $M$ must have a real eigenvalue $\lambda_1$ with a real eigenvector $x_1$, and that $M$ maps vectors orthogonal to $x_1$ to vectors orthogonal to $x_1$. Then one applies the inductive hypothesis to $M$ restricted to the $(n-1)$-dimensional space of vectors orthogonal to $x_1$, and one recovers the remaining eigenvalues and eigenvectors.

The cleanest way to formalize the above proof is to give all definitions and results in terms of linear operators $M: V \rightarrow V$, where $V$ is an arbitrary vector space over the reals. This way, however, we would be giving several definitions that we would never use in the future, so, instead, the inductive proof will use a somewhat inelegant change of basis to pass from $M$ to an $(n-1) \times (n-1)$ matrix $M'$.

We begin by showing that a real symmetric matrix has real eigenvalues and eigenvectors.

Theorem 2 If $M \in {\mathbb R}^{n \times n}$ is symmetric, then there is a real eigenvalue $\lambda \in {\mathbb R}$ and a real eigenvector $x \in {\mathbb R}^n$ such that $Mx = \lambda x$.

We begin by noting that every matrix has a complex eigenvalue.

Lemma 3 For every square matrix $M \in {\mathbb C}^{n \times n}$, there is an eigenvalue $\lambda \in {\mathbb C}$ and an eigenvector $x \in {\mathbb C}^n$ such that $Mx = \lambda x$.

*Proof:* Note that $\lambda$ is an eigenvalue for $M$ if and only if

$$\exists x \neq {\bf 0}.\ (M - \lambda I) x = {\bf 0}$$

which is true if and only if the rows of $M - \lambda I$ are not linearly independent, which is true if and only if

$$\det (M - \lambda I) = 0$$

Now note that the mapping $\lambda \mapsto \det(M - \lambda I)$ is a univariate polynomial of degree $n$ in $\lambda$, and so it must have a root by the fundamental theorem of algebra.
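As a numerical sanity check of this characterization (the $2 \times 2$ matrix is a made-up example), `numpy` can compute the characteristic polynomial $\det(\lambda I - M)$ and its roots directly:

```python
import numpy as np

M = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
coeffs = np.poly(M)        # coefficients of det(lambda I - M): here lambda^2 + 3 lambda + 2
roots = np.roots(coeffs)   # its roots, which exist by the fundamental theorem of algebra

# The roots of the characteristic polynomial are exactly the eigenvalues.
assert np.allclose(np.sort(roots.real), np.sort(np.linalg.eigvals(M).real))
```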

Next we show that if $M$ is real and symmetric, then its eigenvalues are real.

Lemma 4 If $M$ is Hermitian, then, for every $x$ and $y$,

$$\langle Mx, y \rangle = \langle x, My \rangle$$

*Proof:* Using the identity $\langle Mx, y \rangle = \langle x, M^* y \rangle$ noted above, and the fact that $M = M^*$, we have $\langle Mx, y \rangle = \langle x, M^* y \rangle = \langle x, My \rangle$.

Lemma 5 If $M$ is Hermitian, then all the eigenvalues of $M$ are real.

*Proof:* Let $M$ be a Hermitian matrix and let $\lambda$ be a scalar and $x$ a non-zero vector such that $Mx = \lambda x$. We will show that $\lambda = \bar \lambda$, which implies that $\lambda$ is a real number.

We note that

$$\langle Mx, x \rangle = \langle \lambda x, x \rangle = \bar \lambda \langle x, x \rangle$$

and

$$\langle x, Mx \rangle = \langle x, \lambda x \rangle = \lambda \langle x, x \rangle$$

and by the fact that $\langle Mx, x \rangle = \langle x, Mx \rangle$ (Lemma 4), we have $\bar \lambda = \lambda$.

In order to prove Theorem 2, it remains to argue that, for a real eigenvalue of a real symmetric matrix, we can find a real eigenvector.

*Proof:* Let $M$ be a real symmetric matrix; then $M$ has a real eigenvalue $\lambda$ and a (possibly complex valued) eigenvector $x = a + ib$, where $a$ and $b$ are real vectors. We have

$$Ma + iMb = \lambda a + i \lambda b$$

from which (recalling that the entries of $M$ and the scalar $\lambda$ are real) it follows that $Ma = \lambda a$ and that $Mb = \lambda b$; since $a$ and $b$ cannot both be zero, it follows that $\lambda$ has a real eigenvector.

We are now ready to prove the spectral theorem.

*Proof:* We proceed by induction on $n$. The case $n = 1$ is trivial.

Assume that the statement is true for dimension $n-1$. Let $\lambda_1$ be a real eigenvalue of $M$ and $x_1$ be a real eigenvector of $\lambda_1$.

Now we claim that for every vector $x$ that is orthogonal to $x_1$, $Mx$ is also orthogonal to $x_1$. Indeed, the inner product of $Mx$ and $x_1$ is

$$\langle x_1, Mx \rangle = \langle Mx_1, x \rangle = \lambda_1 \langle x_1, x \rangle = 0$$

Let $V$ be the $(n-1)$-dimensional subspace of ${\mathbb R}^n$ that contains all the vectors orthogonal to $x_1$. We want to apply the inductive hypothesis to $M$ restricted to $V$; we cannot literally do that, because the theorem is not stated in terms of arbitrary linear operators over vector spaces, so we will need to do that by fixing an appropriate basis for $V$.

Let $B \in {\mathbb R}^{n \times (n-1)}$ be a matrix that computes a bijective map from ${\mathbb R}^{n-1}$ to $V$. (If $v_1, \ldots, v_{n-1}$ is an orthonormal basis for $V$, then $B$ is just the matrix whose columns are the vectors $v_i$.) Let also $B' \in {\mathbb R}^{(n-1) \times n}$ be the matrix such that, for every $y \in V$, $BB'y = y$. (We can set $B' := B^T$, where $B$ is as described above.) We apply the inductive hypothesis to the matrix

$$M' := B' M B$$

and we find eigenvalues $\lambda_2, \ldots, \lambda_n$ and orthonormal eigenvectors $y_2, \ldots, y_n$ for $M'$.

For every $i = 2, \ldots, n$, we have

$$B' M B y_i = \lambda_i y_i$$

and so

$$B B' M B y_i = \lambda_i B y_i$$

Since $By_i$ is orthogonal to $x_1$, it follows that $MBy_i$ is also orthogonal to $x_1$, and so $MBy_i$ is in $V$; thus $BB'MBy_i = MBy_i$, so we have

$$M B y_i = \lambda_i B y_i$$

and, defining $x_i := By_i$, we have

$$M x_i = \lambda_i x_i$$

Finally, we observe that the vectors $x_i$ are orthogonal. By construction, $x_1$ is orthogonal to $x_2, \ldots, x_n$, and, for every $2 \leq i < j \leq n$, we have that

$$\langle x_i, x_j \rangle = \langle By_i, By_j \rangle = \langle y_i, B^T B y_j \rangle = \langle y_i, y_j \rangle = 0$$

We have not verified that the vectors $x_i$ have norm 1 (which is true), but, in any case, we can scale them to have norm 1.
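The conclusion of the theorem is easy to verify numerically. The tridiagonal matrix below is an example of mine, and `numpy.linalg.eigh` is the standard routine for symmetric (Hermitian) matrices:

```python
import numpy as np

M = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

lam, X = np.linalg.eigh(M)   # real eigenvalues; eigenvectors are the columns of X

assert np.allclose(X.T @ X, np.eye(3))                 # x_1, ..., x_n are orthonormal
for i in range(3):
    assert np.allclose(M @ X[:, i], lam[i] * X[:, i])  # M x_i = lambda_i x_i
assert np.allclose(X @ np.diag(lam) @ X.T, M)          # equivalently, M = X diag(lam) X^T
```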

**3. Variational Characterization of Eigenvalues **

We conclude these notes with the variational characterization of eigenvalues for real symmetric matrices.

Theorem 6 Let $M \in {\mathbb R}^{n \times n}$ be a symmetric matrix, and let $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ be the eigenvalues of $M$ in non-decreasing order. Then

$$\lambda_k = \min_{k{\rm -dimensional}\ V \subseteq {\mathbb R}^n}\ \ \max_{x \in V - \{ {\bf 0} \}}\ \ \frac{x^T M x}{x^T x}$$

The quantity $\frac{x^T M x}{x^T x}$ is called the *Rayleigh quotient* of $x$ with respect to $M$, and we will denote it by $R_M(x)$.

*Proof:* Let $x_1, \ldots, x_n$ be orthonormal eigenvectors of the eigenvalues $\lambda_1, \ldots, \lambda_n$, as promised by the spectral theorem. Consider the $k$-dimensional space spanned by $x_1, \ldots, x_k$. For every vector $x = \sum_{i=1}^k a_i x_i$ in such a space, the numerator of the Rayleigh quotient is

$$\sum_{i=1}^k \lambda_i a_i^2 \leq \lambda_k \sum_{i=1}^k a_i^2$$

The denominator is clearly $\sum_{i=1}^k a_i^2$, and so $R_M(x) \leq \lambda_k$. This shows that

$$\min_{k{\rm -dimensional}\ V}\ \ \max_{x \in V - \{ {\bf 0} \}}\ R_M(x) \leq \lambda_k$$

For the other direction, let $V$ be any $k$-dimensional space: we will show that $V$ must contain a vector of Rayleigh quotient $\geq \lambda_k$. Let $S$ be the span of $x_k, \ldots, x_n$; since $S$ has dimension $n - k + 1$ and $V$ has dimension $k$, they must have some non-zero vector in common. Let $x$ be one such vector, and let us write $x = \sum_{i=k}^n a_i x_i$. The numerator of the Rayleigh quotient of $x$ is

$$\sum_{i=k}^n \lambda_i a_i^2 \geq \lambda_k \sum_{i=k}^n a_i^2$$

and the denominator is $\sum_{i=k}^n a_i^2$, so $R_M(x) \geq \lambda_k$.

We have the following easy consequence.

Fact 7 If $\lambda_1$ is the smallest eigenvalue of a real symmetric matrix $M$, then

$$\lambda_1 = \min_{x \neq {\bf 0}}\ R_M(x)$$

Furthermore, every minimizer is an eigenvector of $\lambda_1$.

*Proof:* The identity is the $k = 1$ case of the previous theorem. For the furthermore part, let $\lambda_1 \leq \cdots \leq \lambda_n$ be the list of eigenvalues of $M$, in non-decreasing order, and $x_1, \ldots, x_n$ be corresponding orthonormal eigenvectors. If $x = \sum_i a_i x_i$ is any non-zero vector, then

$$R_M(x) = \frac{\sum_i \lambda_i a_i^2}{\sum_i a_i^2}$$

If $R_M(x) = \lambda_1$, then $a_i = 0$ for every $i$ such that $\lambda_i > \lambda_1$, that is, $x$ is a linear combination of eigenvectors of $\lambda_1$, and hence it is an eigenvector of $\lambda_1$.

Fact 8 If $\lambda_n$ is the largest eigenvalue of a real symmetric matrix $M$, then

$$\lambda_n = \max_{x \neq {\bf 0}}\ R_M(x)$$

Furthermore, every maximizer is an eigenvector of $\lambda_n$.

*Proof:* Apply Fact 7 to the matrix $-M$.

Fact 9 If $\lambda_1$ is the smallest eigenvalue of a real symmetric matrix $M$, and $x_1$ is an eigenvector of $\lambda_1$, then

$$\lambda_2 = \min_{x \neq {\bf 0},\ x \perp x_1}\ R_M(x)$$

Furthermore, every minimizer is an eigenvector of $\lambda_2$.

*Proof:* A more conceptual proof would be to consider the restriction of $M$ to the space orthogonal to $x_1$, and then apply Fact 7 to such a linear operator. But, since we have not developed the theory for general linear operators, we would need to explicitly reduce to an $(n-1)$-dimensional case via a projection operator, as in the proof of the spectral theorem.

Instead, we will give a more hands-on proof. Let $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ be the list of eigenvalues of $M$, with multiplicities, and $x_1, \ldots, x_n$ be orthonormal vectors as given by the spectral theorem. We may assume that $x_1$ is the eigenvector in the statement of the fact, possibly by changing the orthonormal basis of the eigenspace of $\lambda_1$. For every non-zero vector $x = \sum_{i=2}^n a_i x_i$ orthogonal to $x_1$, its Rayleigh quotient is

$$R_M(x) = \frac{\sum_{i=2}^n \lambda_i a_i^2}{\sum_{i=2}^n a_i^2} \geq \lambda_2$$

and the minimum is achieved by vectors $x$ such that $a_i = 0$ for every $i$ such that $\lambda_i > \lambda_2$, that is, by vectors $x$ which are linear combinations of the eigenvectors of $\lambda_2$, and so every minimizer is an eigenvector of $\lambda_2$.
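Facts 7 through 9 are easy to spot-check numerically. The $2 \times 2$ matrix below is my own example (eigenvalues 2 and 4), and random sampling is of course only a sanity check, not a proof:

```python
import numpy as np

def rayleigh(M, x):
    return (x @ M @ x) / (x @ x)

M = np.array([[3.0, 1.0],
              [1.0, 3.0]])              # eigenvalues 2 and 4
lam, X = np.linalg.eigh(M)              # lam is sorted in non-decreasing order
x1 = X[:, 0]                            # unit eigenvector of the smallest eigenvalue

rng = np.random.default_rng(0)
samples = rng.standard_normal((1000, 2))
quotients = np.array([rayleigh(M, x) for x in samples])

assert quotients.min() >= lam[0] - 1e-9     # Fact 7: R_M(x) >= lambda_1
assert quotients.max() <= lam[1] + 1e-9     # Fact 8: R_M(x) <= lambda_n

# Fact 9: after projecting away the x1 component, the quotient is >= lambda_2.
orth = [x - (x @ x1) * x1 for x in samples]
assert all(rayleigh(M, x) >= lam[1] - 1e-9 for x in orth)
```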


The Stanford course had two main components: (1) spectral algorithms for sparsest cut, and comparisons with LP and SDP based methods, and (2) properties and constructions of expanders.

I will use the additional time to talk a bit more about spectral algorithms, including clustering algorithms, and about constructions of expanders, and to add a third part about electrical networks, sparsification, and max flow.

Lecture notes will be posted here after each lecture.

In some more detail, the course will start with a review of linear algebra and a proof of basic spectral graph theory facts, such as the multiplicity of 0 as an eigenvalue of the Laplacian being the same as the number of connected components of a graph.

Then we will introduce expansion and conductance, and prove Cheeger’s inequality. We will do so in the language of approximation algorithms, and we will see how the analysis of Fiedler’s algorithm given by Cheeger’s inequality compares to the Leighton-Rao analysis of the LP relaxation and the Arora-Rao-Vazirani analysis of the SDP relaxation. Then we will prove several variants of Cheeger’s inequality, interpreting them as analyses of spectral algorithms for clustering and max cut.

In the second part of the course, we will see properties of expanders and combinatorial and algebraic constructions of expanders. We will talk about the theory that gives eigenvalues and eigenvectors of Abelian Cayley graphs, the zig-zag graph product, and the Margulis-Gabber-Galil construction. I would also like to talk about the expansion of random graphs, and to explain how one gets expander constructions from Selberg’s “3/16 theorem,” although I am not sure if there will be time for that.

The first two parts will be tied together by looking at the MCMC algorithm to approximate the number of perfect matchings in a dense bipartite graph. The analysis of the algorithm depends on the mixing time of a certain exponentially big graph, the mixing time will be determined (as shown in a previous lecture on properties of expanders) by the eigenvalue gap, the eigenvalue gap will be determined (as shown by Cheeger’s inequality) by the conductance, and the conductance can be bounded by constructing certain multicommodity flows (as shown in the analysis of the Leighton-Rao algorithms).

In the third part, we will talk about electrical networks, effective resistance and electrical flows, see how to get sparsifiers using effective resistance, sketch how to solve Laplacian equations in nearly linear time, and show how to approximate max flow using electrical flows.


It occurred to me that this is the *complement* of Thanksgiving, in which you get together to *remember* the *good things* that happened during the year.

I don’t think there is anything else left to say about the difference between Japanese and American culture.

Interestingly, there are a couple more possibilities. One could *remember* the *bad things* that happened during the year, as in the airing of grievances during Festivus.

Finally, one could *forget* the *good things*, which is very much the Italian attitude.

**Edited to add**: I don’t know how I forgot (ah!) but there is a famous Neapolitan folk song that goes

Chi ha avuto, ha avuto, ha avuto

Chi ha dato, ha dato, ha dato,

Scurdammuce ‘o passato,

simm’e Napule, paisa’

which is roughly

Who has received, has received

Who has given, has given,

Let’s forget the past

We are [all] from Naples


If someone had told me last week: “a quasi-polynomial time algorithm has been found for a major open problem for which only a slightly subexponential algorithm was known before,” I would have immediately thought *Unique Games*!

Before Babai’s announcement, Graph Isomorphism had certain interesting properties in common with problems such as Factoring, Discrete Log, and Approximate Closest Vector (for approximation ratios of the order of $\sqrt{n}$ or more): no polynomial time algorithm is known, non-trivial algorithms that are much faster than brute force are known, and NP-completeness is not possible because the problem belongs to either ${\rm NP} \cap {\rm coNP}$ or ${\rm NP} \cap {\rm coAM}$.

But there is an important difference: there are simple distributions of inputs on which Factoring, Discrete Log, and Closest Vector approximation are believed to be hard on average, and if one proposes an efficiently implementable algorithm for such problems, it can be immediately shown that it does not work. (Or, if it works, it’s already a breakthrough even without a rigorous analysis.)

In the case of Graph Isomorphism, however, it is easy to come up with simple algorithms for which it is very difficult to find counterexamples, and there are algorithms that are rigorously proved to work on certain distributions of random graphs. Now we know that there are in fact no hard instances at all, but, even before, if we believed that Graph Isomorphism was hard, we had to believe that the hard instances were rare and strange, rather than common.

It is also worth pointing out that, using Levin’s theory of average-case complexity, one can show that if any problem at all in NP is hard under any samplable distribution, then for *every* NP-complete problem we can find a samplable distribution under which the problem is hard. And, in “practice,” natural NP-complete problems do have simple distributions that seem to generate hard instances.

What about Small-set Expansion, Unique Games, and Unique-Games-Hard problems not known to be NP-hard, like constant-factor approximation of Sparsest Cut? We don’t know of any distribution for which it is plausible to conjecture that such problems are hard, and we have algorithms (Lasserre relaxations of constant degree) with no known counterexample. Many simple distributions of instances are rigorously solved by known algorithms. So, if we want to believe the Unique Games conjecture, we have to believe that there are hard instances, but they are rare and strange.

I am sure that it is possible, under standard assumptions, to construct an artificial problem L in NP that is in average-case-P according to Levin’s definition but not in P. Such a problem would not be polynomial time solvable, but it would be easy to solve on average under any samplable distribution and, intuitively, it would be a problem that is hard even though hard instances are rare and strange.

But can a natural problem in NP exhibit this behavior? Now that Graph Isomorphism is not a plausible example any more, I am inclined to believe (until the next surprise) that no natural problem has this behavior, and my guess concerning the Unique Games conjectures is going to be that it is false (or “morally false” in the sense that a quasipolynomial time algorithm exists) until someone comes up with a distribution of Unique Games instances that are plausibly hard on average and that, in particular, exhibit integrality gaps for Lasserre relaxations (even just experimentally).


Meanwhile, if you have any gossip on the proof, then, by any means, go ahead and share it in the comments.


But, in Italy, Fabrizi is famous for being one of the giants of the first generation of Italian-style comedy, from the 1950s and 1960s. My favorite movies of his are those in which he acts as a straight man for Totò, and my absolute favorite is *Guardie e Ladri*, which never had an American release.

For those who understand Italian, it’s possible to find the whole movie on YouTube. Here is one iconic scene.


Unfortunately there is no hotel in downtown Berkeley that is able to accommodate FOCS. The Shattuck hotel comes close, but not quite. (There are two conference rooms, but they are of very different sizes, and the space to hang out during coffee breaks is much too small for 200+ people; it is also outdoors, which is potentially bad, because rain in October is unlikely but not impossible in Berkeley.)

This leaves us with the Doubletree hotel in the Berkeley Marina, which has some advantages, such as views of the bay and good facilities, and some disadvantages, such as the isolated location and the high prices. The location also forces us to provide lunches, because it would be inconvenient for people to drive to lunch places and then drive back during the lunch break. Being well aware of this, the hotel charges extortionate fees for food.

This is to say that, planning for FOCS 2017, there is nothing much different that we can do, although there are lots of little details that we can adjust, and it would be great to know how people’s experience was.

For example, did the block of discounted hotel rooms run out too soon? Would you have liked to receive something with your registration other than just the badge? If so, what? (So far, I have heard suggestions for FOCS-branded hats, t-shirts, and teddy bears.) Wasn’t it awesome to have a full bar at the business meeting? Why did nobody try the soups at lunch? The soups were delicious!
