In which we define a multi-commodity flow problem, and we see that its dual is the relaxation of a useful graph partitioning problem. The relaxation can be rounded to yield an approximate graph partitioning algorithm.
1. Generalizations of the Maximum Flow Problem
An advantage of writing the maximum flow problem as a linear program, as we did in the past lecture, is that we can consider variations of the maximum flow problem in which we add extra constraints on the flow and, as long as the extra constraints are linear, we are guaranteed that we still have a polynomial time solvable problem. (Because we can still write the problem as a linear program, and we can solve linear programming in polynomial time.)
Certain variants of maximum flow are also easily reducible to the standard maximum flow problem, and so they are solvable using the combinatorial algorithms that we have discussed.
Example 1 (Vertex Capacities) An interesting variant of the maximum flow problem is the one in which, in addition to having a capacity
for every edge, we also have a capacity
for every vertex, and a flow
is feasible only if, in addition to the conservation constraints and the edge capacity constraints, it also satisfies the vertex capacity constraints
It is easy to see that the problem can be reduced to the standard maximum flow problem, by splitting every vertex
into two vertices
and
, adding one edge
of capacity
, and then converting every edge
) to an edge
and every edge
to an edge
. It is easy to show that solving the (standard) maximum flow problem on the new network is equivalent to solving the maximum flow with vertex capacity constraints in the original network.
Example 2 (Multiple Sources and Sinks and “Sum” Cost Function) Several important variants of the maximum flow problems involve multiple source-sink pairs
, rather than just one source and one sink. Assuming that the “stuff” that the sources want to send to the sinks is of the same type, the problem is to find multiple feasible flows
,
,
, where
is a feasible flow from the source
to the sink
, and such that the capacity constraints
are satisfied. Such a flow is called a “multi-commodity” flow.
How do we measure how “good” is a multicommodity flow? A simple measure is to consider the sum of the costs
In such a case, we can do a reduction to the standard maximum flow problem by adding a “super-source” node
, connected with edges of infinite capacity to the sources
, and a “super-sink” node
, to which all sinks
are connected to, via infinite capacity edges. It is easy to see that the maximum flow from
to
is the same as the maximum sum of flows in a feasible multicommodity flow in the original network.
In many applications, looking at the sum of the costs of the various flows is not a “fair” measure. For example, if the underlying network is a communication network, and
are pairs of nodes that need to communicate, a solution that provides
of bandwidth between
and
and no bandwidth between
and
is not a very good solution compared, for example, with a solution that provides
of bandwidth each between
and
and between
and
. (Especially so from the point of view of
and
.) There are various reasonable measures of the quality of a multicommodity flow which are more fair, for example we may be interested in maximizing the median flow, or the minimum flow. A rather general problem, which can be used to find multicommodity flows maximizing various cost measures is the following.
Definition 1 (Multicommodity Feasibility Problem) Given in input a network
with capacities
for each
, and given a collection of (not necessarily disjoint) pairs
,
,
, each having a demand
, find a feasible multicommodity flow
such that
or determine that no such multicommodity flow exists.
A more general version, which is defined as an optimization problem, is as follows.
Definition 2 (Maximizing Fractional Demands) Given in input a network
with capacities
for each
, and given a collection of (not necessarily disjoint) pairs
,
,
, each having a demand
, find a feasible multicommodity flow
such that
and such that
is maximized.
Note that the vertices need not be distinct. For example, in the case of a communication network, we could have a broadcast problem in which a node
wants to send data to all other nodes, in which case the source-sink pairs are all of the form
for
. It is useful to think of the pairs of vertices that require communication as defining a weighted graph, with the weights given by the demands. We will call
the graph of demands. (In the broadcast example,
would be a star graph.)
The Fractional Multicommodity Flow Problem can be easily formulated as a linear program.
As for the standard maximum flow problem, it is also possible to give a formulation that involves an exponential number of variables, but for which it is easier to derive the dual.
In the following formulation, is the set of all paths from
to
in
, and we have a variable
for each path in
, for each
, corresponding to how many units of flow from
to
are routed through the path
.
Note that if contains only one edge
, and
, then we have the standard maximum flow problem.
2. The Dual of the Fractional Multicommodity Flow Problem
The dual of (2) has one variable for each
, and a one variable
for each
. It is as follows:
Thinking a bit about (3) makes us realize that, in an optimal solution, without loss of generality is be the shortest path from
to
in the graph weighted by the
. Indeed, the constraints force
to be at most the length of the shortest
-weighted path from
to
, and, if some
is strictly smaller than the length of the shortest path, we can make it equal to the length of the shortest path without sacrificing feasibility and without changing the cost of the solution. The other observation is that, in an optimal solution, we have
, because, in a solution in which
, we can divide all the
and all the
by
, and obtain a solution that is still feasible and has smaller cost. This means that the following linear program is equivalent to (3). We have a variable
for every pair of vertices in
:
3. The Sparsest Cut Problem
From now on, we restrict ourselves to the case in which the graphs and
are undirected. In such a case, we have a variable
for each unordered pair
. The constraints
can be equivalently restated as the triangle inequalities
This means that we are requiring to be non-negative, symmetric and to satisfy the triangle inequality, and so it is a metric over
. (Technically, it is a semimetric because we can have distinct vertices at distance zero, and
is not defined for all pairs, but only for pairs in
, although we could extend it to all pairs by computing all-pairs shortest paths based on the weights
for
.)
These observations give us one more alternative formulation:
Now, finally, we can see that the above formulation is the linear programming relaxation of a cut problem.
If is a subset of vertices, we say that a pair
is cut by
if
and
, or vice versa.
Given an instance of the multicommodity flow problem, we say that a subset of vertices is a cut if it cuts at least one of the pairs in
. The sparsest cut (also called quotient cut) problem is to find a cut
that minimizes the ratio
which is called the sparsity of the cut .
Note that, if contains only one pair
, and
, then we have exactly the standard minimum cut problem.
Suppose that, in our multicommodity problem, there is a fractional flow of cost . Then, for each pair
that is cut by
, the
units of flow from
to
must pass through edges of
that are also cut by
. Overall,
units of flow must pass through those edges, whose overall capacity is at most
, so we must have
which means that the sparsity of must be at least
. This means that sparsity of every cut is at least the fractional cost of any flow. (This is not surprising because we derived the sparsest cut problem from the dual of the flow problem, but there is a very simple direct reason why the above bound holds.)
Now it would be very nice if we had an exact rounding algorithm to find the optimum of the sparsest cut problem.
For a given graph with capacities
, if we define
to be a clique and
for all
, then solving the sparsest cut problem on
and
becomes the problem of finding a set
that minimizes
and optimizing such a cost function tends to favor finding sets that are large and that have few edges coming out. This is useful in a number of contexts. In clustering problems, if the capacities represent similarity, a sparsest cut algorithm will pick out sets of vertices that are mostly similar to each other, but dissimilar to the other vertices, that is, a cluster. Very effective image segmentation algorithms are based on applying sparsest cut approximation algorithms (but not the one we are describing in these notes, which is too slow) to graphs in which there is a vertex for every pixel, and edges connect nearby pixels with a capacity corresponding to how likely the pixels are to belong to same object in the image.
Unfortunately, the sparsest cut problem is NP-hard. Rounding (4), however, it is possible to achieve a -factor approximation.
We very briefly describe what the approximation algorithm looks like.
First, we need the following result:
Lemma 3 For every input
, and every feasible solution
of (4), it is possible to find in polynomial time a subset
of vertices, such that if we define
then we have
The proof is rather complicated, and we will skip it.
Then we have the following fact:
Lemma 4 For every input
, every feasible solution
of (4), and every subset
of vertices, if we define
, we have
Proof: It is enough to show that we have, for every ,
Let be the vertex such that
and
be the vertex such that
. (They need not be different.) Then, from the triangle inequality, we get
and
and so
Lemma 5 For every input
, and every function
, we can find in polynomial time a cut
such that
Proof: We sort the vertices in ascending value of , so that we have an ordering
of the vertices such that
We are going to consider all the cuts of the form , and we will show that at least one of them has sparsity at most
Since does not change if we scale
by a multiplicative constant, we will assume without loss of generality that
.
Let us pick a threshold uniformly at random in the interval
, and define the set
. Now note that, for every pair of vertices
, the probability that
is cut by
is precisely
, and so
and
so that
and so there must exist an in our sample space (which consists of sets of the form
) such that
that is,
This is enough to have an -approximate algorithm for the sparsest cut problem.
On input the graphs , the capacities
and the demands
, we solve the linear program (4), and find an optimal solution
of cost
. Then we use Lemma 3 to find a set
such that, if we define
, we have (using also Lemma 4)
Finally, we use the algorithm of Lemma 5 to find a cut whose sparsity is at most
, which is at most
times the sparsity of the optimal cut. This proves the main result of this lecture.
Theorem 6 There is a polynomial time
-approximate algorithm for the sparsest cut problem.
Reblogged this on boyguide and commented:
I have been wondering, hovering through out the net with one question in mind; what is Multicommodity Flow Optimization? This blog is almost close to the answer.