In which we introduce the maximum flow problem.
1. Flows in Networks
Today we start talking about the Maximum Flow problem. As a motivating example, suppose that we have a communication network, in which certain pairs of nodes are linked by connections; each connection has a limit to the rate at which data can be sent. Given two nodes on the network, what is the maximum rate at which one can send data to the other, assuming no other pair of nodes are attempting to communicate?
For example, consider the following network, and suppose that needs to send data at a rate of
to
. Is this possible?
The answer is yes: can split its stream of data, and send
to
and
to
. The node
relays the
stream of data that it receives from
to
, while node
splits the
stream that it receives into two parts: it relays
of data to
, and send the remaining
to
. Overall,
receives
of data, which it relays to
.
Can send more data, such as, say,
to
? The answer is no. Consider the two links
and
in the network, and suppose you removed them from the network. Then there would be no way reach
from
, and no communication would be possible. This means that all the data that
sends to
must pass through one of those links. Between the two of them, the two links
and
support at most the transmission of
of data, and so that is the maximum rate at which
can send data to
.
The networks that we will work with could model other settings in which “stuff” has to go from one place to another, subject to capacity constraints in links. For example the network could model a public transit system, or a highway system. In some cases, we will be interested in instances of the problem which are constructed to model other combinatorial problems.
In some such applications, it makes sense to consider networks in which the capacity of a link depends on the direction, so that the capacity of the link could be different from the capacity of the link
. Formally, an instance of the maximum flow problem is defined as follows.
Definition 1 A network is a directed graph
, in which
- a vertex
and a vertex
are specified as being the source node and the sink node, respectively.
- every directed edge
has a positive capacity
associated to it. If both the edges
and
belong to
, we allow their capacities to be different.
Sometimes we will write expressions that include capacities between pairs of vertices that are not connected by an edge. In that case the convention is that the capacity is zero.
A flow in a network is a specification of how to route “stuff” from to
so that no link is used beyond its capacity, and so that every link, except the sender
and the receiver
, relays out “stuff” at exactly the same rate at which it receives from other vertices. In the motivating example of a communication network, if nodes where sending out less data than they receive, there would be data loss, and they cannot send out more data than they receive because they are simply forwarding data. Formally, we have the following definition.
Definition 2 A flow in an network
is an assignment of a non-negative number
to every edge
such that
- For every edge
,
;
- For every vertex
,
![]()
where we follow the convention that
if
. (This convention is useful otherwise the second condition would have to be something like
![]()
The cost of the flow (the throughput of the communication, in the communication network example) is
In the maximum flow problem, given a network we want to find a flow of maximum cost.
For example, here is an example of a network:
And the following is a flow in the network (a label on an edge
means that the flow
is
and the capacity
is
).
Is the flow optimal? We are only sending 3 units of flow from to
, while we see that we can send 2 units along the
path, and another 2 units along the
path, for a total of 4, so the above solution is not optimal. (Is the solution we just discussed, of cost 4, optimal?)
In general, how do we reason about the optimality of a flow? This is an important question, because when we talked about approximation algorithms for minimization problems we noted that, in each case, we were able to reason about the approximation quality of an algorithm only because we had ways to prove lower bounds to the optimum. Here that we are dealing with a maximization problem, if we are going to reason about the quality of solutions provided by our algorithms, we need ways to prove upper bounds to the optimum.
When we considered the first example of the lecture, we noticed that if we look at any set of edges whose removal disconnects the receiver from the sender, then all the flow from the sender to the receiver must pass through those edges, and so their total capacity is an upper bound to the cost of any flow, including the optimal flow. This motivates the following definition.
Definition 3 (Cut) A cut in a network
, is partition
of the set of vertices
into two sets, such that
and
. We will usually identify a cut with the set
that contains
. The capacity of a cut is the quantity
The motivation for the definition is the following: let be a cut in a network, that is, a set of vertices that contains
and does not contain
. Consider the set of edges
. If we remove those edges from the network, then it is not possible to go from any vertex in
to any vertex outside
and, in particular, it is not possible to go from
to
. This means that all the flow from
to
must pass through those edges, and so the total capacity of those edges (that, is
) is an upper bound to the cost of any feasible flow.
Even though what we just said is rather self-evident, let us give a rigorous proof, because this will help familiarize ourselves with the techniques used to prove rigorous results about flows. (Later, we will need to prove statements that are far from obvious.)
Lemma 4 For every network
, any flow
in the network, and any cut
,
That is, the cost of the flow is at most the capacity of the cut.
We will derive the lemma from Fact 5 below.
If is a network,
is a flow, and
is a cut, define the net flow out of
to be
that is, the total flow from to
minus the total flow from
to
. Then we have:
Fact 5 For every network
, every flow
, and every cut
, the net flow out of
is the same as the cost of the flow:
Proof: Consider the expression
on the one hand, we have
because for every edge such that at least one of
is in
we see that
appears twice in the expression for
, once with a
sign and once with a
sign, so all terms cancel. On the other hand, we have
because of the definition of flow, and so
To prove Lemma 4, consider any network , any flow
and any cut
. We have
So we have proved Lemma 4, and we have a way to “certify” the optimality of a flow, if we are able to find a flow and a cut such that the cost of the flow is equal to the capacity of the cut.
Consider now the complementary question: how do we see if a given flow in a network can be improved? That is, what is a clear sign that a given flow is not optimal? If we see a path from to
such that all the edges are used at less than full capacity, then it is clear that we can push extra flow along that path and that the solution is not optimal. Can there be other cases in which a given solution is not optimal? Indeed there can. Going back to the last example that we considered, we had a flow of cost 3, which was not optimal (because a flow of cost 4 existed), but if we look at the three possible paths from
to
, that is,
,
, and
, they each involve an edge used at full capacity.
However, let us reason in the following way: suppose that, in addition to the edges shown in the last picture, there were also an edge of capacity 1 from to
. Then we would have had the path
in which every edge has one unit of residual capacity, and we could have pushed an extra unit of flow along that path. In the resulting solution,
sends one unit flow to
, and
sends one unit of flow to
, a situation that is perfectly equivalent to
and
not sending each other anything, so that the extra edge from
to
is not needed after all. In general, if we are sending
units of flow from
to
, then we are effectively increasing the capacity of the edge from
to
, because we can “simulate” the effect of sending flow from
to
by simply sending less flow from
to
. These observations motivate the following definition:
Definition 6 (Residual Network) Let
be a network, and
be a flow. Then the residual network of
with respect to
is a network in which the edge
has capacity
The idea is that, in the residual network, the capacity of an edge measures how much more flow can be pushed along that edge in addition to the flow , without violating the capacity constraints. The edge
starts with capacity
, and
units of that capacity are taken by the current flow; in addition, we have
additional units of “virtual capacity” that come from the fact that we can reduce the flow from
to
.
An augmenting path in a network is a path from to
in which every edge has positive capacity in the residual network. For example, in our last picture, the path
is an augmenting path.
The Ford-Fulkerson algorithm is a simple greedy algorithm that starts from an empty flow and, as long as it can find an augmenting path, improves the current solution using the path.
Algorithm Ford-Fulkerson
- Input: network
-
- compute the capacities
of the residual network
- while there is a path
from
to
such that all edges in the path have positive residual capacity
- let
be the smallest of the residual capacities of the edges of
- let
be a flow that pushes
units of flow along
, that is,
if
, and
otherwise
-
, that is,
- for every pair of vertices such that
and
are both positive, let
and let
and
- recompute the capacities
of the residual network according to the new flow
- let
- return
At every step, the algorithm increases the cost of the current solution by a positive amount and, the algorithm converges in finite time to a solution that cannot be improved via an augmenting path. Note the “clean-up” step after the flow is increased, which makes sure that flow pushed along a “virtual edge” in the residual network is realized by reducing the actual flow in the opposite direction. The following theorem shows that the Ford-Fulkerson algorithm is optimal, and it proves the important fact that whenever a cut is optimal, its optimality can always be proved by exhibiting a cut whose capacity is equal to the cost of the flow.
Theorem 7 (Max Flow-Min Cut) The following three conditions are equivalent for a flow
in a network:
- There is a cut whose capacity is equal to the cost of
![]()
- The flow
is optimal
- There is no augmenting path for the flow
![]()
Proof: We have already proved that (1) implies (2), and it is clear that (2) implies (3), so the point of the theorem is to prove that (3) implies (1).
Let be a flow such that there is no augmenting path in the residual network. Take
to be the set of vertices reachable from
(including
) via edges that have positive capacity in the residual network. Then:
-
by definition
-
otherwise we would have an augmenting path.
So is a cut. Now observe that for every two vertices
and
, the capacity of the edge
in the residual network must be zero, otherwise we would be able to reach
from
in the residual network via positive-capacity edges, but
means that no such path can exist.
Recall that the residual capacity of the edge is defined as
and that and that
, so that the only way for the residual capacity to be zero is to have
Now just observe that the cost of the flow is
and so the capacity of the cut is indeed equal to the cost of the flow.
Remark 1 Suppose that we had defined the residual network as a network in which the capacity of the edge
is
, without the extra “virtual capacity” coming from the flow from
to
, and suppose that we defined an augmenting path to be a path from
to
in which each capacity in the residual network (according to this definition) is positive. Then we have seen before an example of a flow that has no augmenting path according to this definition, but that is not optimal. Where does the proof of the Max-Flow Min-Cut theorem break down if we use the
definition of residual capacity?
Pingback: My Computer Science Research : Maximizing Single-Commodity Unsplittable Network Flow – This Is What I Know