CS261 Lecture 15: the LP of Max Flow

In which we look at the linear programming formulation of the maximum flow problem, construct its dual, and find a randomized-rounding proof of the max flow – min cut theorem.

In the first part of the course, we designed approximation algorithms “by hand,” following our combinatorial intuition about the problems. Then we looked at linear programming relaxations of the problems we worked on, and we saw that approximation algorithms for those problems could also be derived by rounding a linear programming solution. We also saw that our algorithms could be interpreted as constructing, at the same time, an integral primal solution and a feasible solution for the dual problem.

Now that we have developed exact combinatorial algorithms for a few problems (maximum flow, minimum s-t cut, global min cut, maximum matching and minimum vertex cover in bipartite graphs), we are going to look at linear programming relaxations of those problems, and use them to gain a deeper understanding of the problems and of our algorithms.

We start with the maximum flow and the minimum cut problems.
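Before turning to the LP formulation and its dual in the lecture itself, here is a minimal sketch of the maximum flow linear program on a toy graph, solved with an off-the-shelf LP solver; the graph, its capacities, and the use of scipy are illustrative choices of mine, not taken from the lecture. The variables are the flows on the edges, constrained by the capacities and by conservation at every vertex other than {s} and {t}, and the objective is the total flow out of {s}.

```python
import numpy as np
from scipy.optimize import linprog

# A toy instance: nodes s=0, a=1, b=2, t=3, with illustrative capacities.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
cap   = [3, 2, 1, 2, 3]
n_nodes = 4
s, t = 0, 3

# maximize the flow out of s  <=>  minimize its negation (linprog minimizes)
c = np.zeros(len(edges))
for idx, (u, v) in enumerate(edges):
    if u == s:
        c[idx] = -1.0

# flow conservation at every node other than s and t
A_eq, b_eq = [], []
for w in range(n_nodes):
    if w in (s, t):
        continue
    row = np.zeros(len(edges))
    for idx, (u, v) in enumerate(edges):
        if v == w:
            row[idx] += 1.0       # flow into w
        if u == w:
            row[idx] -= 1.0       # flow out of w
    A_eq.append(row)
    b_eq.append(0.0)

# 0 <= f_e <= capacity of e
bounds = [(0, cap[idx]) for idx in range(len(edges))]

res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=bounds)
print("max flow value:", -res.fun)        # 5.0 on this toy graph
```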


CS359G Lecture 8: the Leighton-Rao Relaxation

In which we introduce the Leighton-Rao relaxation of sparsest cut.

Let {G=(V,E)} be an undirected graph. Unlike in past lectures, we will not need to assume that {G} is regular. We are interested in finding a sparsest cut in {G}, where the sparsity of a non-trivial bipartition {(S,V-S)} of the vertices is

\displaystyle  \phi_G(S) := \frac{\frac{1}{|E|} \cdot Edges(S,V-S) } {\frac {2}{|V|^2} \cdot |S| \cdot |V-S| }

which is the ratio between the fraction of edges that are cut by {(S,V-S)} and the fraction of pairs of vertices that are disconnected by the removal of those edges.

Another way to write the sparsity of a cut is as

\displaystyle  \phi_G(S) := \frac {|V|^2}{2|E|} \cdot \frac{\sum_{i,j} A_{i,j} |1_S(i) - 1_S(j)| } {\sum_{i,j} |1_S(i) - 1_S(j) | }

where {A} is the adjacency matrix of {G} and {1_S(\cdot)} is the indicator function of the set {S}.
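For concreteness, here is a small sanity-check implementation of this definition; the function name and the use of numpy are illustrative choices of mine, not part of the notes.

```python
import numpy as np

def sparsity(A, S):
    """Compute phi_G(S) for the cut (S, V-S), following the definition above.
    A is a symmetric 0/1 adjacency matrix; S is an iterable of vertex indices."""
    n = A.shape[0]
    m = A.sum() / 2.0                                # |E|
    ind = np.zeros(n)
    ind[list(S)] = 1.0                               # the indicator function 1_S
    diff = np.abs(ind[:, None] - ind[None, :])       # |1_S(i) - 1_S(j)| over all ordered pairs
    frac_edges_cut = (A * diff).sum() / (2.0 * m)    # Edges(S, V-S) / |E|
    frac_pairs_separated = diff.sum() / n ** 2       # (2 / |V|^2) * |S| * |V-S|
    return frac_edges_cut / frac_pairs_separated
```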

The observation that led us to see {1-\lambda_2} as the optimum of a continuous relaxation of {\phi} was that {|1_S(i)-1_S(j)| = |1_S(i) - 1_S(j)|^2}, which let us relax the problem by allowing arbitrary functions {x: V \rightarrow {\mathbb R}} instead of indicator functions {1_S : V \rightarrow \{ 0,1 \}}.

The Leighton-Rao relaxation of sparsest cut is obtained using, instead, the following observation: if, for a set {S}, we define {d_S(i,j) := |1_S(i) - 1_S(j)|}, then {d_S(\cdot,\cdot)} defines a semi-metric over the set {V}, because {d_S} is symmetric, {d_S(i,i)=0}, and the triangle inequality holds. So we could think about allowing arbitrary semi-metrics in the expression for {\phi}, and define

\displaystyle  LR(G):= \min_{\begin{array}{l} d:V\times V \rightarrow {\mathbb R}\\ d \ \mbox{semi-metric} \end{array}} \frac {|V|^2}{2|E|} \cdot \frac{\sum_{i,j} A_{i,j} d(i,j) }{\sum_{i,j} d(i,j)} \ \ \ \ \ (1)

This might seem like such a broad relaxation that there could be graphs on which {LR(G)} bears no connection to {\phi(G)}. Instead, we will prove the fairly good estimate

\displaystyle   LR(G) \leq \phi(G) \leq O(\log |V|) \cdot LR(G) \ \ \ \ \ (2)

Furthermore, we will show that {LR(G)} and an optimal solution {d(\cdot,\cdot)} can be computed in polynomial time, and that the second inequality above has a constructive proof, from which we derive a polynomial-time {O(\log |V|)}-approximation algorithm for sparsest cut.

1. Formulating the Leighton-Rao Relaxation as a Linear Program

The value {LR(G)} and an optimal {d(\cdot,\cdot)} can be computed in polynomial time by solving the following linear program

\begin{array}{lll}{\rm minimize} &  \sum_{i,j} A_{i,j} d_{i,j}  \\ {\rm subject\ to} \\ & \sum_{i,j} d_{i,j} = \frac {|V|^2}{2|E|}  \\ & d_{i,k} \leq d_{i,j} + d_{j,k} & \forall i,j,k \in V\\ & d_{i,j} \geq 0 & \forall i,j \in V\end{array} \ \ \ \ \ (3)

that has a variable {d_{i,j}} for every unordered pair of distinct vertices {i,j}. Clearly, every feasible solution to the linear program (3) is also a solution to the minimization on the right-hand side of the definition (1) of the Leighton-Rao parameter, with the same cost. Conversely, every semi-metric can be normalized so that {\sum_{i,j} d(i,j) = |V|^2/(2|E|)} by multiplying every distance by a fixed constant, and the normalization does not change the value of the right-hand side of (1); after the normalization, the semi-metric is a feasible solution to the linear program (3), with the same cost.
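To make this concrete, here is a straightforward (and deliberately unoptimized) sketch that builds and solves the linear program (3) with an off-the-shelf solver; the function name, the use of scipy, and the explicit enumeration of all triangle inequalities are my illustrative choices.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def leighton_rao(A):
    """Solve LP (3) for a symmetric 0/1 adjacency matrix A.
    Returns LR(G) and a dictionary of optimal distances d[(i, j)] for i < j."""
    n = A.shape[0]
    m = A.sum() / 2.0                                     # |E|
    pairs = list(itertools.combinations(range(n), 2))     # one variable per unordered pair
    index = {p: k for k, p in enumerate(pairs)}
    def var(i, j):                                        # index of the variable d_{i,j}
        return index[(min(i, j), max(i, j))]

    # objective: sum over ordered pairs of A_ij * d_ij = 2 * sum_{i<j} A_ij * d_ij
    c = np.zeros(len(pairs))
    for (i, j) in pairs:
        c[var(i, j)] = 2.0 * A[i, j]

    # normalization: sum over ordered pairs of d_ij = |V|^2 / (2 |E|)
    A_eq = 2.0 * np.ones((1, len(pairs)))
    b_eq = [n * n / (2.0 * m)]

    # triangle inequalities d_ik <= d_ij + d_jk for all ordered triples
    # (each inequality appears twice, which is redundant but harmless)
    rows = []
    for i, j, k in itertools.permutations(range(n), 3):
        row = np.zeros(len(pairs))
        row[var(i, k)] = 1.0
        row[var(i, j)] -= 1.0
        row[var(j, k)] -= 1.0
        rows.append(row)

    res = linprog(c, A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun, {p: res.x[k] for p, k in index.items()}
```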

In the rest of this lecture and the next, we will show how to round a solution to (3) into a cut, achieving the logarithmic approximation promised in (2).

2. An L1 Relaxation of Sparsest Cut

In the Leighton-Rao relaxation, we relax distance functions of the form {d(i,j) = |1_S(i) - 1_S(j)|} to completely arbitrary distance functions. Let us consider an intermediate relaxation, in which we allow distance functions that can be realized by an embedding of the vertices in an {\ell_1} space.

Recall that, for a vector {{\bf x} \in {\mathbb R}^n}, its {\ell_1} norm is defined as {|| {\bf x}||_1 := \sum_i |x_i|}, and that this norm makes {{\mathbb R}^n} into a metric space with the {\ell_1} distance function

\displaystyle  || {\bf x} - {\bf y}||_1 = \sum_i |x_i - y_i |

The distance function {d(i,j) = |1_S(i) - 1_S(j)|} is an example of a distance function that can be realized by mapping each vertex to a real vector, and then defining the distance between two vertices as the {\ell_1} norm of the respective vectors. Of course it is an extremely restrictive special case, in which the dimension of the vectors is one, and in which every vertex is actually mapped to either zero or one. Let us consider the relaxation of sparsest cut to arbitrary {\ell_1} mappings, and define

\displaystyle  \phi'(G) := \inf_{m, f: V\rightarrow {\mathbb R}^m} \frac{|V|^2}{2|E|} \cdot \frac {\sum_{i,j} A_{i,j} || f(i) - f(j) ||_1 } {\sum_{i,j} || f(i) - f(j) ||_1 }

This may seem like another very broad relaxation of sparsest cut, whose optimum might bear no correlation with the sparsest cut optimum. The following theorem shows that this is not the case.

Theorem 1 For every graph {G}, {\phi(G) = \phi'(G)}.

Furthermore, there is a polynomial time algorithm that, given a mapping {f: V\rightarrow {\mathbb R}^m}, finds a cut {S} such that

\displaystyle  \frac {\sum_{u,v} A_{u,v} |1_S(u) - 1_S(v) |} {\sum_{u,v} |1_S(u) - 1_S(v)| }\leq \frac {\sum_{u,v} A_{u,v} || f(u) - f(v) ||_1 } {\sum_{u,v} || f(u) - f(v) ||_1 } \ \ \ \ \ (4)

Proof: We use ideas that have already come up in the proof of the difficult direction of Cheeger’s inequality. First, we note that for all nonnegative reals {a_1,\ldots,a_m} and positive reals {b_1,\ldots,b_m} we have

\displaystyle   \frac {a_1 + \cdots + a_m}{b_1 + \cdots + b_m} \geq \min_i \frac{a_i}{b_i} \ \ \ \ \ (5)

as can be seen by noting that

\displaystyle  \sum_j a_j = \sum_j b_j \cdot \frac{a_j}{b_j} \geq \left( \min_i \frac{a_i}{b_i} \right) \cdot \sum_j b_j

Let {f_i(v)} be the {i}-th coordinate of the vector {f(v)}, thus {f(v) = (f_1(v),\ldots,f_m(v))}. Then we can decompose the right-hand side of (4) by coordinates, and write

\displaystyle  \frac {\sum_{u,v} A_{u,v} || f(u) - f(v) ||_1 } {\sum_{u,v} || f(u) - f(v) ||_1 }

\displaystyle  = \frac {\sum_i \sum_{u,v} A_{u,v} | f_i(u) - f_i(v) | } {\sum_i \sum_{u,v} | f_i(u) - f_i(v) | }

\displaystyle  \geq \min_i \frac {\sum_{u,v} A_{u,v} | f_i(u) - f_i(v) | } { \sum_{u,v} | f_i(u) - f_i(v) | }

This already shows that, in the definition of {\phi'}, we can map, with no loss of generality, to 1-dimensional {\ell_1} spaces.

Let {i^*} be the coordinate that achieves the minimum above. Because the cost function is invariant under shifts and scalings (that is, the cost of a function {x\rightarrow f(x)} is the same as the cost of {x\rightarrow af(x) + b} for every two constants {a\neq 0} and {b}), there is a function {g:V\rightarrow {\mathbb R}} that has the same cost as {f_{i^*}} and has unit range, that is, {\max_v g(v) -\min_v g(v) = 1}.

Let us now pick a threshold {t} uniformly at random from the interval {[\min_v g(v) , \max_v g(v)]}, and define the random variable

\displaystyle  S_t:= \{ v: g(v) \leq t \}

We observe that for every pair of vertices {u,v} we have

\displaystyle  \mathop{\mathbb E} |1_{S_t}(u) - 1_{S_t}(v) | = |g(u) - g(v)|

and so we get

\displaystyle  \frac {\sum_{u,v} A_{u,v} || f(u) - f(v) ||_1 } {\sum_{u,v} || f(u) - f(v) ||_1 }

\displaystyle  \geq \frac {\sum_{u,v} A_{u,v} | g(u) - g(v) | } { \sum_{u,v} | g(u) - g(v) | }

\displaystyle  = \frac {\mathop{\mathbb E} \sum_{u,v} A_{u,v} |1_{S_t}(u) - 1_{S_t}(v) |} {\mathop{\mathbb E}\sum_{u,v} |1_{S_t} (u)- 1_{S_t}(v)| }

Finally, by an application of (5), we see that there must be a set {S} among the possible values of {S_t} such that (4) holds. Notice that the proof was completely constructive: we simply took the coordinate {f_{i^*}} of {f} with the lowest cost, and then the “threshold cut” given by {f_{i^*}} with the smallest sparsity. \Box
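Since the proof is constructive, it translates directly into a rounding procedure. The sketch below scans every threshold cut of every coordinate instead of sampling a random threshold, which only strengthens the guarantee; the function name and the use of numpy are mine.

```python
import numpy as np

def round_l1_embedding(A, f):
    """Round an l_1 embedding to a cut, following the proof of Theorem 1.
    A: symmetric 0/1 adjacency matrix; f: an n x m array, row v = f(v).
    Returns the set S achieving the smallest ratio on the left of (4)."""
    n = A.shape[0]
    best_ratio, best_S = float("inf"), None
    for i in range(f.shape[1]):                  # try each coordinate f_i
        g = f[:, i]
        for t in sorted(set(g))[:-1]:            # each non-trivial threshold cut of f_i
            ind = (g <= t).astype(float)         # indicator of S_t = { v : g(v) <= t }
            diff = np.abs(ind[:, None] - ind[None, :])
            num = (A * diff).sum()               # sum_{u,v} A_uv |1_S(u) - 1_S(v)|
            den = diff.sum()                     # sum_{u,v} |1_S(u) - 1_S(v)|
            if den > 0 and num / den < best_ratio:
                best_ratio = num / den
                best_S = [v for v in range(n) if g[v] <= t]
    return best_S, best_ratio

# The sparsity of the returned cut is (|V|^2 / 2|E|) times best_ratio.
```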

3. A Theorem of Bourgain

We will derive our main result (2) from the L1 “rounding” process of the previous section, and from the following theorem of Bourgain (the efficiency considerations are due to Linial, London and Rabinovich).

Theorem 2 (Bourgain) Let {d: V\times V \rightarrow {\mathbb R}} be a semimetric defined over a finite set {V}. Then there exists a mapping {f: V \rightarrow {\mathbb R}^m} such that, for every two elements {u,v \in V},

\displaystyle  || f(u) - f(v)||_1 \leq d(u,v) \leq ||f(u)-f(v)||_1 \cdot c\cdot \log |V|

where {c} is an absolute constant. Given {d}, the mapping {f} can be found with high probability in randomized polynomial time in {|V|}.
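The embedding behind the theorem is simple to sketch: sample random subsets of {V} at geometrically decreasing densities and use the distances to those subsets as coordinates (this is the Fréchet-style construction analyzed by Bourgain and made algorithmic by Linial, London and Rabinovich). In the sketch below the constant C, the fallback for empty sets, and the normalization are illustrative choices of mine; the code only builds the map, while the {O(\log |V|)} bound on the contraction is what the theorem’s analysis guarantees with high probability.

```python
import math
import random
import numpy as np

def bourgain_embedding(D, C=10, seed=0):
    """Sketch of the randomized embedding of Theorem 2.
    D: an n x n matrix of semimetric distances.
    Returns an n x m array f whose rows are the images of the vertices."""
    rng = random.Random(seed)
    n = D.shape[0]
    T = max(1, int(math.log2(n)))            # density scales 2^-1, ..., 2^-T
    L = C * max(1, int(math.log(n)))         # repetitions per scale
    coords = []
    for t in range(1, T + 1):
        for _ in range(L):
            # include each vertex in the random set A independently with prob 2^-t
            A = [v for v in range(n) if rng.random() < 2.0 ** (-t)]
            if not A:
                A = [rng.randrange(n)]
            # one coordinate: the distance from each vertex to the set A
            coords.append([min(D[v, a] for a in A) for v in range(n)])
    # dividing by the number of coordinates makes the embedding non-expanding,
    # because |d(u, A) - d(v, A)| <= d(u, v) for every set A
    return np.array(coords).T / (T * L)
```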

To see that the above theorem of Bourgain implies (2), consider a graph {G}, let {d} be an optimal solution of the Leighton-Rao relaxation of the sparsest cut problem on {G}, and let {f:V\rightarrow {\mathbb R}^m} be a mapping as in Bourgain’s theorem applied to {d}. Then

\displaystyle  LR(G) = \frac{|V|^2}{2|E|} \cdot \frac{\sum_{u,v} A_{u,v} d(u,v) }{\sum_{u,v} d(u,v) }

\displaystyle  \geq \frac{|V|^2}{2|E|} \cdot \frac{\sum_{u,v} A_{u,v} ||f(u)-f(v) ||_1 }{c\cdot \log |V| \cdot \sum_{u,v} ||f(u)-f(v)||_1 }

\displaystyle  \geq \frac 1 {c\cdot \log |V|} \cdot \phi(G)

where the last inequality uses Theorem 1; the first inequality in (2) is immediate, because every cut {S} defines a semimetric {d_S} that is feasible for (1).

CS261 Lecture 7: Rounding Linear Programs

In which we show how to use linear programming to approximate the vertex cover problem.

1. Linear Programming Relaxations

An integer linear program (abbreviated ILP) is a linear program (abbreviated LP) with the additional constraints that the variables must take integer values. For example, the following is an ILP:

\displaystyle  \begin{array}{ll} {\rm maximize} & x_1 - x_2 + 2x_3 \\ {\rm subject\ to} \\ & x_1- x_2 \leq 1\\ & x_2 + x_3 \leq 2\\ & x_1 \in {\mathbb N}\\ & x_2 \in {\mathbb N}\\ & x_3 \in {\mathbb N} \end{array} \ \ \ \ \ (1)

where {{\mathbb N} = \{ 0,1,2,\ldots \}} is the set of natural numbers. The advantage of ILPs is that they are a very expressive language to formulate optimization problems, and they can capture in a natural and direct way a large number of combinatorial optimization problems. The disadvantage of ILPs is that they are a very expressive language to formulate combinatorial optimization problems, and finding optimal solutions for ILPs is NP-hard.

If we are interested in designing a polynomial time algorithm (exact or approximate) for a combinatorial optimization problem, formulating the combinatorial optimization problem as an ILP is useful as a first step in the following methodology (the discussion assumes that we are working with a minimization problem):

  • Formulate the combinatorial optimization problem as an ILP;
  • Derive an LP from the ILP by removing the constraint that the variables have to take integer values. The resulting LP is called a “relaxation” of the original problem. Note that in the LP we are minimizing the same objective function over a larger set of solutions, so {opt(LP) \leq opt(ILP)};
  • Solve the LP optimally using an efficient algorithm for linear programming;
    • If the optimal LP solution has integer values, then it is a solution for the ILP of cost {opt(LP) \leq opt (ILP)}, and so we have found an optimal solution for the ILP and hence an optimal solution for our combinatorial optimization problem;
    • If the optimal LP solution {x^*} has fractional values, but we have a rounding procedure that transforms {x^*} into an integral solution {x'} such that {cost(x') \leq c\cdot cost(x^*)} for some constant {c}, then we are able to find a solution to the ILP of cost {\leq c\cdot opt(LP) \leq c\cdot opt(ILP)}, and so we have a {c}-approximate algorithm for our combinatorial optimization problem.

In this lecture and in the next one we will see how to round fractional solutions of relaxations of the Vertex Cover and the Set Cover problem, and so we will be able to derive new approximation algorithms for Vertex Cover and Set Cover based on linear programming.
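As a hedged preview of that recipe on Vertex Cover, here is a sketch of the classical LP relaxation with threshold rounding; the names and the use of scipy are mine, and the 1/2-threshold rule is the standard choice, though the next lectures may present it differently.

```python
import numpy as np
from scipy.optimize import linprog

def vertex_cover_lp_round(edges, n):
    """Relax-and-round on Vertex Cover: solve the LP relaxation
    min sum_v x_v  s.t.  x_u + x_v >= 1 for every edge, 0 <= x_v <= 1,
    then take every vertex with x_v >= 1/2 (a 2-approximation)."""
    c = np.ones(n)                                   # minimize the number of chosen vertices
    A_ub, b_ub = [], []
    for (u, v) in edges:                             # x_u + x_v >= 1  <=>  -x_u - x_v <= -1
        row = np.zeros(n)
        row[u] = row[v] = -1.0
        A_ub.append(row)
        b_ub.append(-1.0)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=(0, 1))
    cover = [v for v in range(n) if res.x[v] >= 0.5 - 1e-9]
    # opt(LP) <= opt(ILP) <= |cover| <= 2 * opt(LP)
    return res.fun, cover

# example: on a 5-cycle the LP optimum is 2.5 (all x_v = 1/2), and the rounded
# cover has size 5 <= 2 * 2.5
lp_value, cover = vertex_cover_lp_round([(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)], 5)
```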


CS261 Lecture 6: Duality in Linear Programming

In which we introduce the theory of duality in linear programming.

1. The Dual of a Linear Program

Suppose that we have the following linear program in maximization standard form:

\displaystyle   \begin{array}{ll} {\rm maximize} & x_1 + 2x_2 + x_3 + x_4\\ {\rm subject\ to}\\ & x_1 +2 x_2 + x_3 \leq 2\\ & x_2 + x_4\leq 1\\ & x_1 + 2x_3\leq 1\\ & x_1 \geq 0\\ & x_2 \geq 0\\ &x_3 \geq 0 \end{array} \ \ \ \ \ (1)

and that an LP-solver has found for us the solution {x_1:=1}, {x_2:=\frac 12}, {x_3:=0}, {x_4:= \frac 12} of cost {2.5}. How can we convince ourselves, or another user, that the solution is indeed optimal, without having to trace the steps of the computation of the algorithm?

Observe that if we have two valid inequalities

\displaystyle  a\leq b \ {\rm and } \ c \leq d

then we can deduce that the inequality

\displaystyle  a+c \leq b+d

(derived by “summing the left hand sides and the right hand sides” of our original inequalities) is also true. In fact, we can also scale the inequalities by a positive multiplicative factor before adding them up, so for every non-negative values {y_1,y_2 \geq 0} we also have

\displaystyle  y_1 a + y_2 c \leq y_1 b + y_2 d

Going back to our linear program (1), we see that if we scale the first inequality by {\frac 12}, add the second inequality, and then add the third inequality scaled by {\frac 12}, we get that, for every {(x_1,x_2,x_3,x_4)} that is feasible for (1),

\displaystyle  x_1 + 2x_2 + 1.5 x_3 + x_4 \leq 2.5

And so, for every feasible {(x_1,x_2,x_3,x_4)}, its cost is

\displaystyle  x_1 + 2x_2 + x_3 + x_4 \leq x_1 + 2x_2 + 1.5 x_3 + x_4 \leq 2.5

meaning that a solution of cost {2.5} is indeed optimal.
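As a quick numerical companion to this argument (the use of scipy here is my illustration, not part of the notes), we can solve the LP and check the certificate given by the scaling factors {\frac 12}, {1}, {\frac 12}:

```python
import numpy as np
from scipy.optimize import linprog

# LP (1): maximize x1 + 2 x2 + x3 + x4  (linprog minimizes, so negate the objective)
c = np.array([-1.0, -2.0, -1.0, -1.0])
A_ub = np.array([[1, 2, 1, 0],      # x1 + 2 x2 + x3       <= 2
                 [0, 1, 0, 1],      #        x2       + x4 <= 1
                 [1, 0, 2, 0]])     # x1        + 2 x3     <= 1
b_ub = np.array([2.0, 1.0, 1.0])
bounds = [(0, None), (0, None), (0, None), (None, None)]   # x4 is not sign-constrained

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print("LP optimum:", -res.fun)                            # 2.5

# the scaling factors used above certify optimality
y = np.array([0.5, 1.0, 0.5])
print("upper bound from the combination:", y @ b_ub)      # 2.5
print("combined coefficients:", y @ A_ub)                 # [1, 2, 1.5, 1], each at least
                                                          # the objective coefficient
```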

In general, how do we find a good choice of scaling factors for the inequalities, and what kind of upper bounds can we prove to the optimum?


CS261 Lecture 5: Linear Programming

In which we introduce linear programming.

1. Linear Programming

A linear program is an optimization problem in which we have a collection of variables, which can take real values, and we want to find an assignment of values to the variables that satisfies a given collection of linear inequalities and that maximizes or minimizes a given linear function.

(The term programming in linear programming is not used as in computer programming, but as in, e.g., TV programming, to mean planning.)

For example, the following is a linear program.

\displaystyle   \begin{array}{ll} {\rm maximize} & x_1 + x_2\\ {\rm subject\ to}\\ & x_1 + 2x_2 \leq 1\\ & 2x_1 + x_2\leq 1\\ & x_1 \geq 0\\ & x_2 \geq 0 \end{array} \ \ \ \ \ (1)

The linear function that we want to optimize ({x_1+x_2} in the above example) is called the objective function. A feasible solution is an assignment of values to the variables that satisfies the inequalities. The value that the objective function gives to an assignment is called the cost of the assignment. For example, {x_1 := \frac 13} and {x_2 := \frac 13} is a feasible solution, of cost {\frac 23}. Note that if {x_1,x_2} are values that satisfy the inequalities, then, by summing the first two inequalities, we see that

\displaystyle  3x_1 + 3 x_2 \leq 2

that is,

\displaystyle  x_1 + x_2 \leq \frac 23

and so no feasible solution has cost higher than {\frac 23}, so the solution {x_1 := \frac 13}, {x_2 := \frac 13} is optimal. As we will see in the next lecture, this trick of summing inequalities to verify the optimality of a solution is part of the very general theory of duality of linear programming.
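As a side note, this toy program can of course also be handed to an off-the-shelf solver; the use of scipy below is my illustration, not part of the notes.

```python
from scipy.optimize import linprog

# LP (1): maximize x1 + x2 subject to the two inequalities and nonnegativity.
# linprog minimizes, so we pass the negated objective.
res = linprog(c=[-1, -1],
              A_ub=[[1, 2], [2, 1]], b_ub=[1, 1],
              bounds=[(0, None), (0, None)])
print("optimal cost:", -res.fun)    # 2/3, attained at x1 = x2 = 1/3
```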

Linear programming is a rather different optimization problem from the ones we have studied so far. Optimization problems such as Vertex Cover, Set Cover, Steiner Tree and TSP are such that, for a given input, there is only a finite number of possible solutions, so it is always trivial to solve the problem in finite time. The number of solutions, however, is typically exponentially big in the size of the input and so, in order to be able to solve the problem on reasonably large inputs, we look for polynomial-time algorithms. In linear programming, however, each variable can take an infinite number of possible values, so it is not even clear that the problem is solvable in finite time.

As we will see, it is indeed possible to solve linear programming problems in finite time, and there are, in fact, polynomial-time algorithms, as well as algorithms that are efficient in practice, that solve linear programs optimally.

There are at least two reasons why we are going to study linear programming in a course devoted to combinatorial optimization:

  • Efficient linear programming solvers are often used as part of the toolkit to design exact or approximate algorithms for combinatorial problems.
  • The powerful theory of duality of linear programming, that we will describe in the next lecture, is a very useful mathematical theory to reason about algorithms, including purely combinatorial algorithms for combinatorial problems that seemingly have no connection with continuous optimization.


CS261 Lecture 1: Overview

In which we describe what this course is about and give two simple examples of approximation algorithms

1. Overview

In this course we study algorithms for combinatorial optimization problems. Those are the type of algorithms that arise in countless applications, from billion-dollar operations to everyday computing tasks; they are used by airline companies to schedule and price their flights, by large companies to decide what and where to stock in their warehouses, by delivery companies to decide the routes of their delivery trucks, by Netflix to decide which movies to recommend you, by a GPS navigator to come up with driving directions, and by word processors to decide where to introduce blank spaces to justify (align on both sides) a paragraph.

In this course we will focus on general and powerful algorithmic techniques, and we will apply them, for the most part, to highly idealized model problems.

Some of the problems that we will study, along with several problems arising in practice, are NP-hard, and so it is unlikely that we can design exact efficient algorithms for them. For such problems, we will study algorithms that are worst-case efficient, but that output solutions that can be sub-optimal. We will be able, however, to prove worst-case bounds on the ratio between the cost of optimal solutions and the cost of the solutions provided by our algorithms. Sub-optimal algorithms with provable guarantees about the quality of their output solutions are called approximation algorithms.

The content of the course will be as follows:
