In which we define a multi-commodity flow problem, and we see that its dual is the relaxation of a useful graph partitioning problem. The relaxation can be rounded to yield an approximate graph partitioning algorithm.
1. Generalizations of the Maximum Flow Problem
An advantage of writing the maximum flow problem as a linear program, as we did in the past lecture, is that we can consider variations of the maximum flow problem in which we add extra constraints on the flow and, as long as the extra constraints are linear, we are guaranteed that we still have a polynomial time solvable problem. (Because we can still write the problem as a linear program, and we can solve linear programming in polynomial time.)
Certain variants of maximum flow are also easily reducible to the standard maximum flow problem, and so they are solvable using the combinatorial algorithms that we have discussed.
Example 1 (Vertex Capacities) An interesting variant of the maximum flow problem is the one in which, in addition to having a capacity for every edge, we also have a capacity for every vertex, and a flow is feasible only if, in addition to the conservation constraints and the edge capacity constraints, it also satisfies the vertex capacity constraints
It is easy to see that the problem can be reduced to the standard maximum flow problem, by splitting every vertex into two vertices and , adding one edge of capacity , and then converting every edge ) to an edge and every edge to an edge . It is easy to show that solving the (standard) maximum flow problem on the new network is equivalent to solving the maximum flow with vertex capacity constraints in the original network.
Example 2 (Multiple Sources and Sinks and “Sum” Cost Function) Several important variants of the maximum flow problems involve multiple source-sink pairs , rather than just one source and one sink. Assuming that the “stuff” that the sources want to send to the sinks is of the same type, the problem is to find multiple feasible flows , , , where is a feasible flow from the source to the sink , and such that the capacity constraints
are satisfied. Such a flow is called a “multi-commodity” flow.
How do we measure how “good” is a multicommodity flow? A simple measure is to consider the sum of the costs
In such a case, we can do a reduction to the standard maximum flow problem by adding a “super-source” node , connected with edges of infinite capacity to the sources , and a “super-sink” node , to which all sinks are connected to, via infinite capacity edges. It is easy to see that the maximum flow from to is the same as the maximum sum of flows in a feasible multicommodity flow in the original network.
In many applications, looking at the sum of the costs of the various flows is not a “fair” measure. For example, if the underlying network is a communication network, and are pairs of nodes that need to communicate, a solution that provides of bandwidth between and and no bandwidth between and is not a very good solution compared, for example, with a solution that provides of bandwidth each between and and between and . (Especially so from the point of view of and .) There are various reasonable measures of the quality of a multicommodity flow which are more fair, for example we may be interested in maximizing the median flow, or the minimum flow. A rather general problem, which can be used to find multicommodity flows maximizing various cost measures is the following.
Definition 1 (Multicommodity Feasibility Problem) Given in input a network with capacities for each , and given a collection of (not necessarily disjoint) pairs , , , each having a demand , find a feasible multicommodity flow such that
or determine that no such multicommodity flow exists.
A more general version, which is defined as an optimization problem, is as follows.
Definition 2 (Maximizing Fractional Demands) Given in input a network with capacities for each , and given a collection of (not necessarily disjoint) pairs , , , each having a demand , find a feasible multicommodity flow such that
and such that is maximized.
Note that the vertices need not be distinct. For example, in the case of a communication network, we could have a broadcast problem in which a node wants to send data to all other nodes, in which case the source-sink pairs are all of the form for . It is useful to think of the pairs of vertices that require communication as defining a weighted graph, with the weights given by the demands. We will call the graph of demands. (In the broadcast example, would be a star graph.)
The Fractional Multicommodity Flow Problem can be easily formulated as a linear program.
As for the standard maximum flow problem, it is also possible to give a formulation that involves an exponential number of variables, but for which it is easier to derive the dual.
In the following formulation, is the set of all paths from to in , and we have a variable for each path in , for each , corresponding to how many units of flow from to are routed through the path .
Note that if contains only one edge , and , then we have the standard maximum flow problem.
2. The Dual of the Fractional Multicommodity Flow Problem
The dual of (2) has one variable for each , and a one variable for each . It is as follows:
Thinking a bit about (3) makes us realize that, in an optimal solution, without loss of generality is be the shortest path from to in the graph weighted by the . Indeed, the constraints force to be at most the length of the shortest -weighted path from to , and, if some is strictly smaller than the length of the shortest path, we can make it equal to the length of the shortest path without sacrificing feasibility and without changing the cost of the solution. The other observation is that, in an optimal solution, we have , because, in a solution in which , we can divide all the and all the by , and obtain a solution that is still feasible and has smaller cost. This means that the following linear program is equivalent to (3). We have a variable for every pair of vertices in :
3. The Sparsest Cut Problem
From now on, we restrict ourselves to the case in which the graphs and are undirected. In such a case, we have a variable for each unordered pair . The constraints can be equivalently restated as the triangle inequalities
This means that we are requiring to be non-negative, symmetric and to satisfy the triangle inequality, and so it is a metric over . (Technically, it is a semimetric because we can have distinct vertices at distance zero, and is not defined for all pairs, but only for pairs in , although we could extend it to all pairs by computing all-pairs shortest paths based on the weights for .)
These observations give us one more alternative formulation:
Now, finally, we can see that the above formulation is the linear programming relaxation of a cut problem.
If is a subset of vertices, we say that a pair is cut by if and , or vice versa.
Given an instance of the multicommodity flow problem, we say that a subset of vertices is a cut if it cuts at least one of the pairs in . The sparsest cut (also called quotient cut) problem is to find a cut that minimizes the ratio
which is called the sparsity of the cut .
Note that, if contains only one pair , and , then we have exactly the standard minimum cut problem.
Suppose that, in our multicommodity problem, there is a fractional flow of cost . Then, for each pair that is cut by , the units of flow from to must pass through edges of that are also cut by . Overall, units of flow must pass through those edges, whose overall capacity is at most , so we must have
which means that the sparsity of must be at least . This means that sparsity of every cut is at least the fractional cost of any flow. (This is not surprising because we derived the sparsest cut problem from the dual of the flow problem, but there is a very simple direct reason why the above bound holds.)
Now it would be very nice if we had an exact rounding algorithm to find the optimum of the sparsest cut problem.
For a given graph with capacities , if we define to be a clique and for all , then solving the sparsest cut problem on and becomes the problem of finding a set that minimizes
and optimizing such a cost function tends to favor finding sets that are large and that have few edges coming out. This is useful in a number of contexts. In clustering problems, if the capacities represent similarity, a sparsest cut algorithm will pick out sets of vertices that are mostly similar to each other, but dissimilar to the other vertices, that is, a cluster. Very effective image segmentation algorithms are based on applying sparsest cut approximation algorithms (but not the one we are describing in these notes, which is too slow) to graphs in which there is a vertex for every pixel, and edges connect nearby pixels with a capacity corresponding to how likely the pixels are to belong to same object in the image.
Unfortunately, the sparsest cut problem is NP-hard. Rounding (4), however, it is possible to achieve a -factor approximation.
We very briefly describe what the approximation algorithm looks like.
First, we need the following result:
Lemma 3 For every input , and every feasible solution of (4), it is possible to find in polynomial time a subset of vertices, such that if we define
then we have
The proof is rather complicated, and we will skip it.
Then we have the following fact:
Lemma 4 For every input , every feasible solution of (4), and every subset of vertices, if we define , we have
Proof: It is enough to show that we have, for every ,
Let be the vertex such that and be the vertex such that . (They need not be different.) Then, from the triangle inequality, we get
Proof: We sort the vertices in ascending value of , so that we have an ordering of the vertices such that
We are going to consider all the cuts of the form , and we will show that at least one of them has sparsity at most
Since does not change if we scale by a multiplicative constant, we will assume without loss of generality that .
Let us pick a threshold uniformly at random in the interval , and define the set . Now note that, for every pair of vertices , the probability that is cut by is precisely , and so
and so there must exist an in our sample space (which consists of sets of the form ) such that
This is enough to have an -approximate algorithm for the sparsest cut problem.
On input the graphs , the capacities and the demands , we solve the linear program (4), and find an optimal solution of cost . Then we use Lemma 3 to find a set such that, if we define , we have (using also Lemma 4)
Finally, we use the algorithm of Lemma 5 to find a cut whose sparsity is at most , which is at most times the sparsity of the optimal cut. This proves the main result of this lecture.
Theorem 6 There is a polynomial time -approximate algorithm for the sparsest cut problem.