In which we introduce the maximum flow problem.
1. Flows in Networks
Today we start talking about the Maximum Flow problem. As a motivating example, suppose that we have a communication network, in which certain pairs of nodes are linked by connections; each connection has a limit to the rate at which data can be sent. Given two nodes on the network, what is the maximum rate at which one can send data to the other, assuming no other pair of nodes are attempting to communicate?
For example, consider the following network, and suppose that needs to send data at a rate of to . Is this possible?
The answer is yes: can split its stream of data, and send to and to . The node relays the stream of data that it receives from to , while node splits the stream that it receives into two parts: it relays of data to , and send the remaining to . Overall, receives of data, which it relays to .
Can send more data, such as, say, to ? The answer is no. Consider the two links and in the network, and suppose you removed them from the network. Then there would be no way reach from , and no communication would be possible. This means that all the data that sends to must pass through one of those links. Between the two of them, the two links and support at most the transmission of of data, and so that is the maximum rate at which can send data to .
The networks that we will work with could model other settings in which “stuff” has to go from one place to another, subject to capacity constraints in links. For example the network could model a public transit system, or a highway system. In some cases, we will be interested in instances of the problem which are constructed to model other combinatorial problems.
In some such applications, it makes sense to consider networks in which the capacity of a link depends on the direction, so that the capacity of the link could be different from the capacity of the link . Formally, an instance of the maximum flow problem is defined as follows.
Definition 1 A network is a directed graph , in which
- a vertex and a vertex are specified as being the source node and the sink node, respectively.
- every directed edge has a positive capacity associated to it. If both the edges and belong to , we allow their capacities to be different.
Sometimes we will write expressions that include capacities between pairs of vertices that are not connected by an edge. In that case the convention is that the capacity is zero.
A flow in a network is a specification of how to route “stuff” from to so that no link is used beyond its capacity, and so that every link, except the sender and the receiver , relays out “stuff” at exactly the same rate at which it receives from other vertices. In the motivating example of a communication network, if nodes where sending out less data than they receive, there would be data loss, and they cannot send out more data than they receive because they are simply forwarding data. Formally, we have the following definition.
Definition 2 A flow in an network is an assignment of a non-negative number to every edge such that
- For every edge , ;
- For every vertex ,
where we follow the convention that if . (This convention is useful otherwise the second condition would have to be something like
The cost of the flow (the throughput of the communication, in the communication network example) is
In the maximum flow problem, given a network we want to find a flow of maximum cost.
For example, here is an example of a network:
And the following is a flow in the network (a label on an edge means that the flow is and the capacity is ).
Is the flow optimal? We are only sending 3 units of flow from to , while we see that we can send 2 units along the path, and another 2 units along the path, for a total of 4, so the above solution is not optimal. (Is the solution we just discussed, of cost 4, optimal?)
In general, how do we reason about the optimality of a flow? This is an important question, because when we talked about approximation algorithms for minimization problems we noted that, in each case, we were able to reason about the approximation quality of an algorithm only because we had ways to prove lower bounds to the optimum. Here that we are dealing with a maximization problem, if we are going to reason about the quality of solutions provided by our algorithms, we need ways to prove upper bounds to the optimum.
When we considered the first example of the lecture, we noticed that if we look at any set of edges whose removal disconnects the receiver from the sender, then all the flow from the sender to the receiver must pass through those edges, and so their total capacity is an upper bound to the cost of any flow, including the optimal flow. This motivates the following definition.
Definition 3 (Cut) A cut in a network , is partition of the set of vertices into two sets, such that and . We will usually identify a cut with the set that contains . The capacity of a cut is the quantity
The motivation for the definition is the following: let be a cut in a network, that is, a set of vertices that contains and does not contain . Consider the set of edges . If we remove those edges from the network, then it is not possible to go from any vertex in to any vertex outside and, in particular, it is not possible to go from to . This means that all the flow from to must pass through those edges, and so the total capacity of those edges (that, is ) is an upper bound to the cost of any feasible flow.
Even though what we just said is rather self-evident, let us give a rigorous proof, because this will help familiarize ourselves with the techniques used to prove rigorous results about flows. (Later, we will need to prove statements that are far from obvious.)
That is, the cost of the flow is at most the capacity of the cut.
We will derive the lemma from Fact 5 below.
If is a network, is a flow, and is a cut, define the net flow out of to be
that is, the total flow from to minus the total flow from to . Then we have:
Proof: Consider the expression
on the one hand, we have
because for every edge such that at least one of is in we see that appears twice in the expression for , once with a sign and once with a sign, so all terms cancel. On the other hand, we have
because of the definition of flow, and so
To prove Lemma 4, consider any network , any flow and any cut . We have
So we have proved Lemma 4, and we have a way to “certify” the optimality of a flow, if we are able to find a flow and a cut such that the cost of the flow is equal to the capacity of the cut.
Consider now the complementary question: how do we see if a given flow in a network can be improved? That is, what is a clear sign that a given flow is not optimal? If we see a path from to such that all the edges are used at less than full capacity, then it is clear that we can push extra flow along that path and that the solution is not optimal. Can there be other cases in which a given solution is not optimal? Indeed there can. Going back to the last example that we considered, we had a flow of cost 3, which was not optimal (because a flow of cost 4 existed), but if we look at the three possible paths from to , that is, , , and , they each involve an edge used at full capacity.
However, let us reason in the following way: suppose that, in addition to the edges shown in the last picture, there were also an edge of capacity 1 from to . Then we would have had the path in which every edge has one unit of residual capacity, and we could have pushed an extra unit of flow along that path. In the resulting solution, sends one unit flow to , and sends one unit of flow to , a situation that is perfectly equivalent to and not sending each other anything, so that the extra edge from to is not needed after all. In general, if we are sending units of flow from to , then we are effectively increasing the capacity of the edge from to , because we can “simulate” the effect of sending flow from to by simply sending less flow from to . These observations motivate the following definition:
Definition 6 (Residual Network) Let be a network, and be a flow. Then the residual network of with respect to is a network in which the edge has capacity
The idea is that, in the residual network, the capacity of an edge measures how much more flow can be pushed along that edge in addition to the flow , without violating the capacity constraints. The edge starts with capacity , and units of that capacity are taken by the current flow; in addition, we have additional units of “virtual capacity” that come from the fact that we can reduce the flow from to .
An augmenting path in a network is a path from to in which every edge has positive capacity in the residual network. For example, in our last picture, the path is an augmenting path.
The Ford-Fulkerson algorithm is a simple greedy algorithm that starts from an empty flow and, as long as it can find an augmenting path, improves the current solution using the path.
- Input: network
- compute the capacities of the residual network
- while there is a path from to such that all edges in the path have positive residual capacity
- let be the smallest of the residual capacities of the edges of
- let be a flow that pushes units of flow along , that is, if , and otherwise
- , that is,
- for every pair of vertices such that and are both positive, let and let and
- recompute the capacities of the residual network according to the new flow
At every step, the algorithm increases the cost of the current solution by a positive amount and, the algorithm converges in finite time to a solution that cannot be improved via an augmenting path. Note the “clean-up” step after the flow is increased, which makes sure that flow pushed along a “virtual edge” in the residual network is realized by reducing the actual flow in the opposite direction. The following theorem shows that the Ford-Fulkerson algorithm is optimal, and it proves the important fact that whenever a cut is optimal, its optimality can always be proved by exhibiting a cut whose capacity is equal to the cost of the flow.
Theorem 7 (Max Flow-Min Cut) The following three conditions are equivalent for a flow in a network:
- There is a cut whose capacity is equal to the cost of
- The flow is optimal
- There is no augmenting path for the flow
Proof: We have already proved that (1) implies (2), and it is clear that (2) implies (3), so the point of the theorem is to prove that (3) implies (1).
Let be a flow such that there is no augmenting path in the residual network. Take to be the set of vertices reachable from (including ) via edges that have positive capacity in the residual network. Then:
- by definition
- otherwise we would have an augmenting path.
So is a cut. Now observe that for every two vertices and , the capacity of the edge in the residual network must be zero, otherwise we would be able to reach from in the residual network via positive-capacity edges, but means that no such path can exist.
Recall that the residual capacity of the edge is defined as
and that and that , so that the only way for the residual capacity to be zero is to have
Now just observe that the cost of the flow is
and so the capacity of the cut is indeed equal to the cost of the flow.
Remark 1 Suppose that we had defined the residual network as a network in which the capacity of the edge is , without the extra “virtual capacity” coming from the flow from to , and suppose that we defined an augmenting path to be a path from to in which each capacity in the residual network (according to this definition) is positive. Then we have seen before an example of a flow that has no augmenting path according to this definition, but that is not optimal. Where does the proof of the Max-Flow Min-Cut theorem break down if we use the definition of residual capacity?