CS359G Lecture 7: Computing Eigenvectors

In which we analyze a nearly-linear time algorithm for finding an approximate eigenvector for the second eigenvalue of a graph adjacency matrix, to be used in the spectral partitioning algorithm.

In past lectures, we showed that, if ${G=(V,E)}$ is a ${d}$-regular graph, and ${M}$ is its normalized adjacency matrix with eigenvalues ${1=\lambda_1 \geq \lambda_2 \ldots \geq \lambda_n}$, given an eigenvector of ${\lambda_2}$, the algorithm SpectralPartition finds, in nearly-linear time ${O(|E| + |V|\log |V|)}$, a cut ${(S,V-S)}$ such that ${h(S) \leq 2\sqrt{h(G)}}$.

More generally, if, instead of being given an eigenvector ${{\bf x}}$ such that ${M{\bf x} = \lambda_2 {\bf x}}$, we are given a vector ${{\bf x} \perp {\bf 1}}$ such that ${{\bf x}^T M {\bf x} \geq (\lambda_2 - \epsilon) {\bf x}^T{\bf x}}$, then the algorithm finds a cut such that ${h(S) \leq \sqrt{4h(G) + 2\epsilon}}$. In this lecture we describe and analyze an algorithm that computes such a vector using ${O((|V|+|E|)\cdot \frac 1\epsilon \cdot \log \frac {|V|}{\epsilon})}$ arithmetic operations.

A symmetric matrix is positive semi-definite (abbreviated PSD) if all its eigenvalues are nonnegative. We begin by describing an algorithm that approximates the largest eigenvalue of a given symmetric PSD matrix. This might not seem to help very much because the adjacency matrix of a graph is not PSD, and because we want to compute the second largest, not the largest, eigenvalue. We will see, however, that the algorithm is easily modified to approximate the second eigenvalue of a PSD matrix (if an eigenvector of the first eigenvalue is known), and that the adjacency matrix of a graph can easily be modified to be PSD.
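The basic tool will be iterated multiplication by the matrix, normalizing at each step. A minimal sketch of such a power iteration (the function name, the iteration count, and the toy matrix are ours; how many iterations suffice, and with what success probability, is exactly what the analysis in this lecture establishes):

```python
import numpy as np

def power_iteration(M, t, seed=0):
    """Approximate a top eigenvector of a symmetric PSD matrix M by
    t rounds of multiply-and-normalize (a sketch; names are ours)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(M.shape[0])   # random start: correlated w.h.p.
    for _ in range(t):                    # with the top eigenvector
        x = M @ x
        x /= np.linalg.norm(x)
    return x

# Toy PSD example with known largest eigenvalue 1.
M = np.diag([1.0, 0.5, 0.25])
x = power_iteration(M, t=100)
rayleigh = (x @ M @ x) / (x @ x)          # converges to the top eigenvalue
```

The Rayleigh quotient of the returned vector approaches the largest eigenvalue, which is the sense of approximation used in the rest of the lecture.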

CS359G Lecture 6: The Spectrum of the Cycle and of the Hypercube

In which we talk about the spectrum of Cayley graphs of abelian groups, we compute the eigenvalues and eigenvectors of the cycle and of the hypercube, and we verify the tightness of the Cheeger inequalities and of the analysis of spectral partitioning.

In this lecture we will prove the following results:

1. The dimension-${d}$ hypercube ${H_d}$ has ${\lambda_2 = 1- \frac 2d}$ and ${h(H_d) = \frac 1d}$, giving an infinite family of graphs for which ${\frac{1-\lambda_2}{2} = h(G)}$, showing that the first Cheeger inequality is exactly tight.
2. The ${n}$-cycle ${C_n}$ has ${\lambda_2 = 1 - O(n^{-2})}$, and ${h(C_n) \geq \frac 2n}$, giving an infinite family of graphs for which ${h(G) = \Omega(\sqrt{1-\lambda_2})}$, showing that the second Cheeger inequality is tight up to a constant.
3. There is an eigenvector of the second eigenvalue of the hypercube ${H_d}$, such that the SpectralPartitioning algorithm, given such a vector, outputs a cut ${(S,V-S)}$ of expansion ${h(S) = \Omega(1/\sqrt{d})}$, showing that the analysis of the SpectralPartitioning algorithm is tight up to a constant.
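The first two claims can be checked numerically on small instances. A sketch, assuming numpy: it builds the normalized adjacency matrices of ${H_4}$ and ${C_{10}}$ and reads off the second eigenvalue (for the cycle the value works out to ${\cos(2\pi/n) = 1 - O(n^{-2})}$):

```python
import numpy as np

def hypercube_adjacency(d):
    """Adjacency matrix of H_d: vertices are d-bit strings, edges flip one bit."""
    n = 1 << d
    A = np.zeros((n, n))
    for u in range(n):
        for b in range(d):
            A[u, u ^ (1 << b)] = 1.0
    return A

def lambda2(A, degree):
    return np.linalg.eigvalsh(A / degree)[-2]   # eigenvalues in ascending order

lam2_cube = lambda2(hypercube_adjacency(4), 4)  # expect 1 - 2/4 = 0.5

n = 10
C = np.zeros((n, n))
for u in range(n):
    C[u, (u + 1) % n] = C[u, (u - 1) % n] = 1.0
lam2_cycle = lambda2(C, 2)                      # expect cos(2*pi/10)
```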

CS359G Lecture 5: Characters of Abelian Groups

In which we introduce the theory of characters of finite abelian groups, which we will use to compute eigenvalues and eigenvectors of graphs such as the cycle and the hypercube.

In the past lectures we have established the Cheeger inequalities

$\displaystyle \frac{1-\lambda_2}{2} \leq h(G) \leq \sqrt{2 \cdot (1-\lambda_2)}$

and the fact that the SpectralPartitioning algorithm, when given an eigenvector of ${\lambda_2}$, finds a cut ${(S,V-S)}$ such that ${h(S) \leq 2\sqrt{h(G)}}$. In the next lecture we will show that all such results are tight, up to constants, by proving that

• The dimension-${d}$ hypercube ${H_d}$ has ${\lambda_2 = 1- \frac 2d}$ and ${h(H_d) = \frac 1d}$, giving an infinite family of graphs for which ${\frac{1-\lambda_2}{2} = h(G)}$, showing that the first Cheeger inequality is exactly tight.
• The ${n}$-cycle ${C_n}$ has ${\lambda_2 = 1 - O(n^{-2})}$, and ${h(C_n) = \frac 2n}$, giving an infinite family of graphs for which ${h(G) = \Omega(\sqrt{1-\lambda_2})}$, showing that the second Cheeger inequality is tight up to a constant.
• There is an eigenvector of the 2nd eigenvalue of the hypercube ${H_d}$, such that the SpectralPartitioning algorithm, given such a vector, outputs a cut ${(S,V-S)}$ of expansion ${h(S) = \Omega(1/\sqrt{d})}$, showing that the analysis of the SpectralPartitioning algorithm is tight up to a constant.

In this lecture we will develop some theoretical machinery to find the eigenvalues and eigenvectors of Cayley graphs of finite Abelian groups, a class of graphs that includes the cycle and the hypercube, among several other interesting examples. This theory will also be useful later, as a starting point to talk about algebraic constructions of expanders.

For readers familiar with the Fourier analysis of Boolean functions, or the discrete Fourier analysis of functions ${f: {\mathbb Z}/N{\mathbb Z} \rightarrow {\mathbb C}}$, or the standard Fourier analysis of periodic real functions, this theory will give a more general, and hopefully interesting, way to look at what they already know.

CS359G Lecture 4: Spectral Partitioning

In which we prove the difficult direction of Cheeger’s inequality.

As in the past lectures, consider an undirected ${d}$-regular graph ${G=(V,E)}$, call ${A}$ its adjacency matrix, and ${M: = \frac 1d A}$ its scaled adjacency matrix. Let ${\lambda_1 \geq \cdots \geq \lambda_n}$ be the eigenvalues of ${M}$, with multiplicities, in non-increasing order. We have been studying the edge expansion of a graph, which is the minimum of ${h(S)}$ over all non-trivial cuts ${(S,V-S)}$ of the vertex set (a cut is trivial if ${S=\emptyset}$ or ${S=V}$), where the expansion ${h(S)}$ of a cut is

$\displaystyle h(S) := \frac{Edges(S,V-S)}{d\cdot \min \{ |S|, |V-S| \} }$

We have also been studying the (uniform) sparsest cut problem, which is the problem of finding the non-trivial cut that minimizes ${\phi(S)}$, where the sparsity ${\phi(S)}$ of a cut is

$\displaystyle \phi(S) := \frac{Edges(S,V-S)}{\frac dn |S| \cdot |V-S|}$

We are proving Cheeger’s inequalities:

$\displaystyle \frac {1-\lambda_2}{2} \leq h(G) \leq \sqrt{2 \cdot (1-\lambda_2) } \ \ \ \ \ (1)$

and we established the left-hand side inequality in the previous lecture, showing that the quantity ${1-\lambda_2}$ can be seen as the optimum of a continuous relaxation of ${\phi(G)}$, so that ${1-\lambda_2 \leq \phi(G)}$; the left-hand side of (1) then follows because ${\phi(G) \leq 2 h(G)}$ by definition.

Today we prove the more difficult, and interesting, direction. The proof will be constructive and algorithmic. The proof can be seen as an analysis of the following algorithm.

Algorithm: SpectralPartitioning

• Input: graph ${G=(V,E)}$ and vector ${{\bf x} \in {\mathbb R}^V}$
• Sort the vertices of ${V}$ in non-decreasing order of values of entries in ${{\bf x}}$, that is, let ${V=\{ v_1,\ldots,v_n\}}$ where ${x_{v_1} \leq x_{v_2} \leq \ldots \leq x_{v_n}}$
• Let ${i\in \{1,\ldots,n-1\}}$ be such that ${h(\{ v_1,\ldots, v_i \} )}$ is minimal
• Output ${S= \{ v_1,\ldots, v_i \}}$

We note that the algorithm can be implemented to run in time ${O(|E| + |V|\log |V|)}$, assuming arithmetic operations and comparisons take constant time: the sorting step takes time ${O(|V|\log|V|)}$, and once we have computed ${h(\{ v_1,\ldots,v_{i} \})}$ it only takes time ${O(degree(v_{i+1}))}$ to compute ${h(\{ v_1,\ldots,v_{i+1} \})}$.
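The sweep and the incremental computation of ${h}$ can be sketched as follows (a sketch, assuming numpy and a ${d}$-regular graph given as an edge list; the function name is ours):

```python
import numpy as np

def spectral_partitioning(n, edges, d, x):
    """Sweep over prefixes of the vertices sorted by x, maintaining the
    number of crossing edges incrementally as each vertex moves across."""
    order = np.argsort(x)                  # v_1, ..., v_n
    pos = np.empty(n, dtype=int)
    pos[order] = np.arange(n)              # pos[v] = position of v in the sweep
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    crossing = 0
    best_h, best_i = float('inf'), 0
    for i in range(n - 1):                 # prefix S = {v_1, ..., v_{i+1}}
        for w in adj[order[i]]:            # edges of the vertex that moved:
            crossing += 1 if pos[w] > i else -1   # now crossing or internal
        h = crossing / (d * min(i + 1, n - i - 1))
        if h < best_h:
            best_h, best_i = h, i
    return set(order[: best_i + 1].tolist()), best_h

# The 6-cycle with an eigenvector of lambda_2: the sweep recovers the
# optimal cut into two arcs, of expansion 1/3.
n, d = 6, 2
edges = [(i, (i + 1) % n) for i in range(n)]
x = np.cos(2 * np.pi * np.arange(n) / n)
S, h = spectral_partitioning(n, edges, d, x)
```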

We have the following analysis of the quality of the solution:

Lemma 1 (Analysis of Spectral Partitioning) Let ${G=(V,E)}$ be a d-regular graph, ${{\bf x}\in {\mathbb R}^V}$ be a vector such that ${{\bf x} \perp {\bf 1}}$, let ${M}$ be the normalized adjacency matrix of ${G}$, define

$\displaystyle \delta:= \frac{ \sum_{i,j} M_{i,j} |x_i - x_j |^2}{\frac 1n \sum_{i,j} |x_i - x_j |^2}$

and let ${S}$ be the output of algorithm SpectralPartitioning on input ${G}$ and ${{\bf x}}$. Then

$\displaystyle h(S) \leq \sqrt{2\delta}$

Remark 1 If we apply the lemma to the case in which ${{\bf x}}$ is an eigenvector of ${\lambda_2}$, then ${\delta = 1-\lambda_2}$, and so we have

$\displaystyle h(G) \leq h(S) \leq \sqrt{2 \cdot (1-\lambda_2)}$

which is the difficult direction of Cheeger’s inequalities.

Remark 2 If we run the SpectralPartitioning algorithm with the eigenvector ${{\bf x}}$ of the second eigenvalue ${\lambda_2}$, we find a set ${S}$ whose expansion is

$\displaystyle h(S) \leq \sqrt{2\cdot (1-\lambda_2) } \leq 2 \sqrt{h(G)}$

Even though this doesn’t give a constant-factor approximation to the edge expansion, it gives a very efficient, and non-trivial, approximation.

As we will see in a later lecture, there is a nearly linear time algorithm that finds a vector ${x}$ for which the expression ${\delta}$ in the lemma is very close to ${1-\lambda_2}$, so, overall, for any graph ${G}$ we can find a cut of expansion ${O(\sqrt {h(G)})}$ in nearly linear time.

CS261 Lecture 7: Rounding Linear Programs

In which we show how to use linear programming to approximate the vertex cover problem.

1. Linear Programming Relaxations

An integer linear program (abbreviated ILP) is a linear program (abbreviated LP) with the additional constraints that the variables must take integer values. For example, the following is an ILP:

$\displaystyle \begin{array}{ll} {\rm maximize} & x_1 - x_2 + 2x_3 \\ {\rm subject\ to} \\ & x_1- x_2 \leq 1\\ & x_2 + x_3 \leq 2\\ & x_1 \in {\mathbb N}\\ & x_2 \in {\mathbb N}\\ & x_3 \in {\mathbb N} \end{array} \ \ \ \ \ (1)$

Where ${{\mathbb N} = \{ 0,1,2,\ldots \}}$ is the set of natural numbers. The advantage of ILPs is that they are a very expressive language to formulate optimization problems, and they can capture in a natural and direct way a large number of combinatorial optimization problems. The disadvantage of ILPs is that they are a very expressive language to formulate combinatorial optimization problems, and finding optimal solutions for ILPs is NP-hard.
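The relaxation of (1) obtained by replacing ${x_i \in {\mathbb N}}$ with ${x_i \geq 0}$ can be handed to an off-the-shelf LP solver; a sketch, assuming scipy (in this particular example the relaxation happens to have an integral optimal solution, ${(1,0,2)}$ of cost ${5}$):

```python
from scipy.optimize import linprog

# Relaxation of the ILP (1): x_i in N becomes x_i >= 0.
# linprog minimizes, so we negate the objective x1 - x2 + 2*x3.
res = linprog(c=[-1, 1, -2],
              A_ub=[[1, -1, 0],   # x1 - x2 <= 1
                    [0, 1, 1]],   # x2 + x3 <= 2
              b_ub=[1, 2],
              bounds=[(0, None)] * 3)
lp_opt = -res.fun   # here: 5, attained at the integral point (1, 0, 2)
```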

If we are interested in designing a polynomial time algorithm (exact or approximate) for a combinatorial optimization problem, formulating the combinatorial optimization problem as an ILP is useful as a first step in the following methodology (the discussion assumes that we are working with a minimization problem):

• Formulate the combinatorial optimization problem as an ILP;
• Derive an LP from the ILP by removing the constraint that the variables have to take integer values. The resulting LP is called a “relaxation” of the original problem. Note that in the LP we are minimizing the same objective function over a larger set of solutions, so ${opt(LP) \leq opt(ILP)}$;
• Solve the LP optimally using an efficient algorithm for linear programming;
• If the optimal LP solution has integer values, then it is a solution for the ILP of cost ${opt(LP) \leq opt (ILP)}$, and so we have found an optimal solution for the ILP and hence an optimal solution for our combinatorial optimization problem;
• If the optimal LP solution ${x^*}$ has fractional values, but we have a rounding procedure that transforms ${x^*}$ into an integral solution ${x'}$ such that ${cost(x') \leq c\cdot cost(x^*)}$ for some constant ${c}$, then we are able to find a solution to the ILP of cost ${\leq c\cdot opt(LP) \leq c\cdot opt(ILP)}$, and so we have a ${c}$-approximate algorithm for our combinatorial optimization problem.

In this lecture and in the next one we will see how to round fractional solutions of relaxations of the Vertex Cover and the Set Cover problem, and so we will be able to derive new approximation algorithms for Vertex Cover and Set Cover based on linear programming.
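As a preview of the methodology on Vertex Cover: the standard relaxation has a variable ${x_v}$ per vertex and a constraint ${x_u + x_v \geq 1}$ per edge, and rounding up every ${x_v \geq \frac 12}$ gives a ${2}$-approximation, because each constraint forces ${\max(x_u,x_v) \geq \frac 12}$. A sketch, assuming scipy:

```python
from scipy.optimize import linprog

def vertex_cover_lp_round(n, edges):
    """Solve the Vertex Cover relaxation and round every x_v >= 1/2 up
    (a preview sketch; the rounded set covers each edge since the
    constraint x_u + x_v >= 1 forces max(x_u, x_v) >= 1/2)."""
    A_ub = []
    for u, v in edges:
        row = [0.0] * n
        row[u] = row[v] = -1.0            # -(x_u + x_v) <= -1
        A_ub.append(row)
    res = linprog(c=[1.0] * n, A_ub=A_ub, b_ub=[-1.0] * len(edges),
                  bounds=[(0, 1)] * n)
    cover = {v for v in range(n) if res.x[v] >= 0.5 - 1e-6}
    return cover, res.fun                 # res.fun = opt(LP) <= opt(ILP)

# Triangle: the LP optimum is 3/2 (all x_v = 1/2); rounding takes all
# three vertices, within a factor of 2 of the LP lower bound.
cover, lp_value = vertex_cover_lp_round(3, [(0, 1), (1, 2), (0, 2)])
```

The rounded cover costs at most twice the LP value, since each selected vertex has ${x_v \geq \frac 12}$.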

CS261 Lecture 6: Duality in Linear Programming

In which we introduce the theory of duality in linear programming.

1. The Dual of a Linear Program

Suppose that we have the following linear program in maximization standard form:

$\displaystyle \begin{array}{ll} {\rm maximize} & x_1 + 2x_2 + x_3 + x_4\\ {\rm subject\ to}\\ & x_1 +2 x_2 + x_3 \leq 2\\ & x_2 + x_4\leq 1\\ & x_1 + 2x_3\leq 1\\ & x_1 \geq 0\\ & x_2 \geq 0\\ &x_3 \geq 0 \end{array} \ \ \ \ \ (1)$

and that an LP-solver has found for us the solution ${x_1:=1}$, ${x_2:=\frac 12}$, ${x_3:=0}$, ${x_4:= \frac 12}$ of cost ${2.5}$. How can we convince ourselves, or another user, that the solution is indeed optimal, without having to trace the steps of the computation of the algorithm?

Observe that if we have two valid inequalities

$\displaystyle a\leq b \ {\rm and } \ c \leq d$

then we can deduce that the inequality

$\displaystyle a+c \leq b+d$

(derived by “summing the left hand sides and the right hand sides” of our original inequalities) is also true. In fact, we can also scale each inequality by a non-negative multiplicative factor before adding them up, so for all non-negative values ${y_1,y_2 \geq 0}$ we also have

$\displaystyle y_1 a + y_2 c \leq y_1 b + y_2 d$

Going back to our linear program (1), we see that if we scale the first inequality by ${\frac 12}$, add the second inequality, and then add the third inequality scaled by ${\frac 12}$, we get that, for every ${(x_1,x_2,x_3,x_4)}$ that is feasible for (1),

$\displaystyle x_1 + 2x_2 + 1.5 x_3 + x_4 \leq 2.5$

And so, for every feasible ${(x_1,x_2,x_3,x_4)}$, its cost is

$\displaystyle x_1 + 2x_2 + x_3 + x_4 \leq x_1 + 2x_2 + 1.5 x_3 + x_4 \leq 2.5$

meaning that a solution of cost ${2.5}$ is indeed optimal.
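This certificate is easy to check mechanically: with ${y = (\frac 12, 1, \frac 12)}$, every coefficient of ${y^T A}$ dominates the corresponding objective coefficient (with equality on ${x_4}$, which carries no sign constraint in (1)), and ${y^T b = 2.5}$. A sketch, assuming numpy:

```python
import numpy as np

# Data of (1): maximize c.x subject to A x <= b, with x1, x2, x3 >= 0.
A = np.array([[1., 2., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 2., 0.]])
b = np.array([2., 1., 1.])
c = np.array([1., 2., 1., 1.])

y = np.array([0.5, 1.0, 0.5])   # the scaling factors chosen above
combined = y @ A                # (1, 2, 1.5, 1): the summed inequality
bound = y @ b                   # 2.5: its right-hand side

# c.x <= combined.x for feasible x because combined >= c coordinatewise,
# with equality on x4 (whose sign is unconstrained); hence c.x <= 2.5.
assert np.all(combined >= c) and bound == 2.5
```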

In general, how do we find a good choice of scaling factors for the inequalities, and what kind of upper bounds can we prove on the optimum?

CS261 Lecture 5: Linear Programming

In which we introduce linear programming.

1. Linear Programming

A linear program is an optimization problem in which we have a collection of variables, which can take real values, and we want to find an assignment of values to the variables that satisfies a given collection of linear inequalities and that maximizes or minimizes a given linear function.

(The term programming in linear programming is not used as in computer programming, but as in, e.g., TV programming, to mean planning.)

For example, the following is a linear program.

$\displaystyle \begin{array}{ll} {\rm maximize} & x_1 + x_2\\ {\rm subject\ to}\\ & x_1 + 2x_2 \leq 1\\ & 2x_1 + x_2\leq 1\\ & x_1 \geq 0\\ & x_2 \geq 0 \end{array} \ \ \ \ \ (1)$

The linear function that we want to optimize (${x_1+x_2}$ in the above example) is called the objective function. A feasible solution is an assignment of values to the variables that satisfies the inequalities. The value that the objective function gives to an assignment is called the cost of the assignment. For example, ${x_1 := \frac 13}$ and ${x_2 := \frac 13}$ is a feasible solution, of cost ${\frac 23}$. Note that if ${x_1,x_2}$ are values that satisfy the inequalities, then, by summing the first two inequalities, we see that

$\displaystyle 3x_1 + 3 x_2 \leq 2$

that is,

$\displaystyle x_1 + x_2 \leq \frac 23$

and so no feasible solution has cost higher than ${\frac 23}$, so the solution ${x_1 := \frac 13}$, ${x_2 := \frac 13}$ is optimal. As we will see in the next lecture, this trick of summing inequalities to verify the optimality of a solution is part of the very general theory of duality of linear programming.
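The optimum can also be confirmed with an LP solver; a sketch, assuming scipy (linprog minimizes, so we negate the objective):

```python
from scipy.optimize import linprog

# The example LP (1), in linprog's minimization form.
res = linprog(c=[-1, -1],
              A_ub=[[1, 2],     # x1 + 2*x2 <= 1
                    [2, 1]],    # 2*x1 + x2 <= 1
              b_ub=[1, 1],
              bounds=[(0, None)] * 2)
# The solver finds x1 = x2 = 1/3 of cost 2/3, matching the hand argument.
```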

Linear programming is a rather different optimization problem from the ones we have studied so far. Optimization problems such as Vertex Cover, Set Cover, Steiner Tree and TSP are such that, for a given input, there is only a finite number of possible solutions, so it is always trivial to solve the problem in finite time. The number of solutions, however, is typically exponentially big in the size of the input and so, in order to be able to solve the problem on reasonably large inputs, we look for polynomial-time algorithms. In linear programming, however, each variable can take an infinite number of possible values, so it is not even clear that the problem is solvable in finite time.

As we will see, it is indeed possible to solve linear programming problems in finite time, and there are, in fact, polynomial-time algorithms that solve linear programs optimally, as well as algorithms that are efficient in practice.

There are at least two reasons why we are going to study linear programming in a course devoted to combinatorial optimization:

• Efficient linear programming solvers are often used as part of the toolkit to design exact or approximate algorithms for combinatorial problems.
• The powerful theory of duality of linear programming, that we will describe in the next lecture, is a very useful mathematical theory to reason about algorithms, including purely combinatorial algorithms for combinatorial problems that seemingly have no connection with continuous optimization.

CS261 Lecture 4: TSP and Set Cover

In which we describe a ${1.5}$-approximate algorithm for the Metric TSP, we introduce the Set Cover problem, observe that it can be seen as a more general version of the Vertex Cover problem, and we devise a logarithmic-factor approximation algorithm.

1. Better Approximation of the Traveling Salesman Problem

In the last lecture we discussed equivalent formulations of the Traveling Salesman problem, and noted that Metric TSP-R can also be seen as the following problem: given a set of points ${X}$ and a symmetric distance function ${d: X \times X \rightarrow {\mathbb R}_{\geq 0}}$ that satisfies the triangle inequality, find a multi-set of edges such that ${(X,E)}$ is a connected multi-graph in which every vertex has even degree and such that ${\sum_{(u,v)\in E} d(u,v)}$ is minimized.

Our idea will be to construct ${E}$ by starting from a minimum-cost spanning tree ${T}$ of ${X}$, and then adding edges so that every vertex has even degree.

But how do we choose which edges to add to ${T}$?
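The construction of ${T}$ itself is standard. A sketch, assuming a Euclidean instance of our own: Prim's algorithm for the minimum spanning tree, followed by a count of the odd-degree vertices of ${T}$ (by the handshake lemma their number is always even, so a perfect matching among them exists):

```python
def prim_mst(points, dist):
    """Minimum spanning tree by Prim's algorithm; returns an edge list.
    (Only the first step: the choice of edges to add on top of T, a
    matching of the odd-degree vertices, is discussed next.)"""
    n = len(points)
    in_tree = [False] * n
    in_tree[0] = True
    best = [(dist(points[0], p), 0) for p in points]  # (dist to tree, parent)
    edges = []
    for _ in range(n - 1):
        v = min((u for u in range(n) if not in_tree[u]),
                key=lambda u: best[u][0])
        edges.append((best[v][1], v))
        in_tree[v] = True
        for w in range(n):
            if not in_tree[w] and dist(points[v], points[w]) < best[w][0]:
                best[w] = (dist(points[v], points[w]), v)
    return edges

# A small Euclidean instance (our own example).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
euclid = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
T = prim_mst(pts, euclid)
deg = [0] * len(pts)
for u, v in T:
    deg[u] += 1
    deg[v] += 1
odd = [v for v in range(len(pts)) if deg[v] % 2 == 1]  # even in number
```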

CS359G Lecture 3: Cheeger’s inequality

In which we prove the easy case of Cheeger’s inequality.

1. Expansion and The Second Eigenvalue

Let ${G=(V,E)}$ be an undirected ${d}$-regular graph, ${A}$ its adjacency matrix, ${M = \frac 1d \cdot A}$ its normalized adjacency matrix, and ${1=\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n}$ be the eigenvalues of ${M}$.

Recall that we defined the edge expansion of a cut ${(S,V-S)}$ of the vertices of ${G}$ as

$\displaystyle h(S) := \frac {E(S,V-S)}{d \cdot \min \{ |S|, |V-S| \} }$

and that the edge expansion of ${G}$ is ${h(G) := \min_{S\subseteq V} h(S)}$.

We also defined the related notion of the sparsity of a cut ${(S,V-S)}$ as

$\displaystyle \phi(S) := \frac {E(S,V-S)}{ \frac dn \cdot |S| \cdot |V-S| }$

and ${\phi(G) := \min_S \phi(S)}$; the sparsest cut problem is to find a cut of minimal sparsity.

Recall also that in the last lecture we proved that ${\lambda_2 = 1}$ if and only if ${G}$ is disconnected. This is equivalent to saying that ${1-\lambda_2 = 0 }$ if and only if ${h(G)=0}$. In this lecture and the next we will see that this statement admits an approximate version that, qualitatively, says that ${1-\lambda_2}$ is small if and only if ${h(G)}$ is small. Quantitatively, we have

Theorem 1 (Cheeger’s Inequalities)

$\displaystyle \frac{1-\lambda_2}2 \leq h(G) \leq \sqrt{2 \cdot (1-\lambda_2) } \ \ \ \ \ (1)$
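Both inequalities can be checked numerically on small graphs. A sketch, assuming numpy, that computes ${\lambda_2}$ exactly and ${h(G)}$ by brute force over all cuts, for the 6-cycle:

```python
import itertools
import numpy as np

def edge_expansion(n, edges, d):
    """h(G) by brute force over all cuts with |S| <= n/2; since
    h(S) = h(V-S), this covers every non-trivial cut (tiny graphs only)."""
    best = float('inf')
    for r in range(1, n // 2 + 1):
        for S in itertools.combinations(range(n), r):
            S = set(S)
            cut = sum(1 for u, v in edges if (u in S) != (v in S))
            best = min(best, cut / (d * len(S)))
    return best

# The 6-cycle: lambda_2 = cos(2*pi/6) = 1/2 and h(G) = 1/3.
n, d = 6, 2
edges = [(i, (i + 1) % n) for i in range(n)]
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
lam2 = np.linalg.eigvalsh(A / d)[-2]
h = edge_expansion(n, edges, d)
assert (1 - lam2) / 2 <= h <= np.sqrt(2 * (1 - lam2))   # Theorem 1 holds
```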