*In which we show how to solve the maximum matching problem and the minimum vertex cover problem in bipartite graphs.*

In this lecture we show applications of the theory of (and of algorithms for) the maximum flow problem to the design of algorithms for problems in bipartite graphs.

A bipartite graph is an undirected graph such that the set of vertices can be partitioned into two subsets and such that every edge in has one endpoint in and one endpoint in .

For example, the 3-cube is bipartite, as can be seen by putting in all the vertices whose label has an even number of ones and in all the vertices whose label has an odd number of ones.

There is a simple linear time algorithm that checks if a graph is bipartite and, if so, finds a partition of into sets and such that all edges go between and : run DFS and find a spanning forest, that is, a spanning tree of the graph in each connected component. Construct sets and in the following way. In each tree, put the root in , and then put in all the vertices that, in the tree, have odd distance from the root; put in all the vertices that, in the tree, have even distance from the root. If the resulting partition is not valid, that is, if there is some edge both whose endpoints are in or both whose endpoints are in , then there is some tree in which two vertices , are connected by an edge, even though they are both at even distance or both at odd distance from the root ; in such a case, the cycle that goes from to along the tree, then follows the edge and then goes from to along the three is an odd-length cycle, and it is easy to prove that in a bipartite graph there is no odd cycle. Hence the algorithm either returns a valid bipartition or a certificate that the graph is not bipartite.

Several optimization problems become simpler in bipartite graphs. The problem of finding a *maximum matching* in a graph is solvable in polynomial time in general graphs, but it has a very simple algorithm in bipartite graphs, that we shall see shortly. (The algorithm for general graphs is beautiful but rather complicated.) The algorithm is based on a reduction to the maximum flow problem. The reduction has other applications, because it makes the machinery of the max flow – min cut theorem applicable to reason about matchings. We are going to see a very simple proof of Hall’s theorem, a classical result in graph theorem, which uses the max flow – min cut theorem.

As another application, we are going to show how to solve optimally the minimum vertex cover problem in bipartite graphs using a minimum cut computation, and the relation between flows and matchings. In general graphs, the minimum vertex cover problem is NP-complete.

The problem of finding a *maximum matching* in a graph, that is, a matching with the largest number of edges, often arises in assignment problems, in which tasks are assigned to agents, and almost always the underlying graph is bipartite, so it is of interest to have simpler and/or faster algorithms for maximum matchings for the special case in which the input graph is bipartite.

We will describe a way to r*reduce* the maximum matching problem in bipartite graphs to the maximum flow problem, that is, a way to show that a given bipartite graph can be transformed into a network such that, after finding a maximum flow in the network, we can easily reconstruct a maximum matching in the original graph.

**1. Maximum Matching in Bipartite Graphs **

Recall that, in an undirected graph , a *matching* is a subset of edges that have no endpoint in common. In a bipartite graph with bipartition , the edges of the matching, like all other edges, have one endpoint in and one endpoint in .

Consider the following algorithm.

- Input: undirected bipartite graph , partition of into sets
- Construct a network as follows:
- the vertex set is , where and are two new vertices;
- contains a directed edge for every , a directed edge for every edge , where and , and a directed edge for every ;
- each edge has capacity 1;

- find a maximum flow in the network, making sure that all flows are either zero or one
- return

The running time of the algorithm is the time needed to solve the maximum flow on the network plus an extra amount of work to construct the network and to extract the solution from the flow. In the constructed network, the maximum flow is at most , and so, using the Ford-Fulkerson algorithm, we have running time . The fastest algorithm for maximum matching in bipartite graphs, which applies the push-relabel algorithm to the network, has running time . It is also possible to solve the problem in time , where is the time that it takes to multiply two matrices. (This approach does not use flows.) Using the currently best known matrix multiplication algorithm, the running time is about , which is better than in dense graphs. The algorithm based on push-relabel is always better in practice.

Remark 1 (Integral Flows)It is important in the reduction that we find a flow in which all flows are either zero or one. In a network in which all capacities are zero or one, all the algorithms that we have seen in class will find an optimal solution in which all flows are either zero or one. More generally, on input a network with integer capacities, all the algorithms that we have seen in class will find a maximum flow in which all are integers. It is important to keep in mind, however, that, even though in a network with zero/one capacities there always exists an optimal integral flow, there can also be optimal flows that are not integral.

We want to show that the algorithm is correct that is that: (1) the algorithm outputs a matching and (2) that there cannot be any larger matching than the one found by the algorithm.

Claim 1The algorithm always outputs a matching, whose size is equal to the cost of the maximal flow of .

*Proof:* Consider the flow found by the algorithm. For every vertex , the conservation constraint for and the capacity constraint on the edge imply:

and so at most one of the edges of can be incident on .

Similarly, for every we have

and so at most one of the edges in can be incident on .

Remark 2Note that the previous proof does not work if the flow is not integral

Claim 2The size of the largest matching in is at most the cost of the maximum flow in .

*Proof:* Let be a largest matching in . We can define a feasible flow in in the following way: for every edge , set . Set all the other flows to zero. We have defined a feasible flow, because every flow is either zero or one, and it is one only on edges of , so the capacity constraints are satisfied, and the conservation constraints are also satisfied, because for every vertex that is not matched in there is zero incoming flow and zero outgoing flow, while for the matched vertices there is one unit of incoming flow and one unit of outgoing flow. The cost of the flow is the number of vertices in that are matched, which is equal to .

This means that there exists a feasible flow whose cost is equal to , and so the cost of a maximum flow is greater than or equal to .

So we have established that our algorithm is correct and optimal.

**2. Perfect Matchings in Bipartite Graphs **

A *perfect* matching is a matching with edges. In a bipartite graph, a perfect matching can exist only if , and we can think of it as defining a bijective mapping between and .

For a subset , let us call the *neighborhood* of , that is, the set of vertices that are connected to vertices in by an edge in . Clearly, if there is a perfect matching in a bipartite graph with bipartition , then we must have , because the edges of the perfect matching match each vertex in to a distinct vertex in , and this is impossible if .

A classical result in graph theory, Hall’s Theorem, is that this is the only case in which a perfect matching does not exist.

Theorem 1 (Hall)A bipartite graph with bipartition such that has a perfect matching if and only if for every we have .

The theorem precedes the theory of flows and cuts in network, and the original proof was constructive and a bit complicated. We can get a very simple non-constructive proof from the max flow – min cut theorem.

*Proof:* We have already seen one direction of the theorem. It remains to prove that if for every , then has a perfect matching.

Equivalently, we will prove that if does not have a perfect matching, then there must be a set such that .

Let us construct the network as in the algorithm above, an let us call . If does not have a perfect matching, then it means that the size of the maximum matching in is , and so the size of the maximum flow in is , and so must have a cut of capacity . Let us call the cut .

Let us call the left vertices in , and the remaining left vertices, and similarly and .

In the network , all edges have capacity one, so the capacity of the cut is the number of edges that go from to the complement of , that is

where is the number of edges from to the complement of , is the number of edges from into , and is the number of edges in with one endpoint in and one endpoint in .

This means that we have

and, recalling that ,

We can also see that

because the neighborhood of can at most include vertices in . Overall, we have

and so we have found a set on the left that is bigger than its neighborhood.

**3. Vertex Cover in Bipartite Graphs **

The work that we have done on matching in bipartite graphs also gives us a very simple polynomial time algorithm for vertex cover.

- Input: undirected bipartite graph , partition of into sets
- Construct a network as before
- Find a minimum-capacity cut in the network
- Define , , ,
- Let be the set of vertices in that have neighbors in
- output

We want to show that the algorithm outputs a vertex cover, and that the size of the output set is indeed the size of the minimum vertex cover.

Claim 3The output of the algorithm is a vertex cover

*Proof:* The set covers all edges that have an endpoint either in or , because includes all of and all or . Regarding the remaining edges, that is, those that have endpoint in and the other endpoint in , all such edges are covered by .

Claim 4There is no vertex cover of size smaller than .

*Proof:* Let be the capacity of the cut. Then is equal to

and so

but is equal to the capacity of the minimum cut in , which is equal to the cost of the maximum flow in which, by what we proved in the previous section, is equal to the size of the maximum matching in . This means that has a matching of size , and so every vertex cover must have size .

## 6 comments

Comments feed for this article

August 12, 2011 at 6:23 pm

adamgreat blog. very clear explanations. thanks!

September 20, 2011 at 1:05 pm

JinI am somewhat confused about the set B in section 3. Other discussions of this topic I’ve read state that C := L2 U R1, implying that B is always empty. I think I understand why B must be empty:

If a vertex v in R2 has a neighbor u in L1, then the edge e connecting u,v must be in the maximum matching set M, or not in M. If it is not in M, then there is excess capacity on e (flow is 0) and since u is in the cut set S, v would also be in S, so v cannot be in R2. If e is in M, then there is flow from s to u, so the only way u can be in L1 (member of S) is for it to be reachable in the residual graph from an edge in M to a vertex in R1. But there can be only one such edge for any vertex, so that edge must be e, so v must be in R1, and cannot be in R2. In other words, all cut edges cannot be in E.

Is this correct?

September 20, 2011 at 1:50 pm

JinActually, now that I think on it further, this obviously only applies to minimum s-t cuts found using Ford-Fulkerson or the like. It is trivial to think up examples with minimum cuts that take place in E.

October 6, 2011 at 2:38 pm

lindzeynThis statement is incorrect:

“The fastest algorithm for maximum matching in bipartite graphs, which applies the push-relabel algorithm to the network, has running time .”

The Hopcroft-Karp method is ; however, for dense graphs, the matrix multiplication based algorithm that you presented is better.

November 30, 2012 at 4:02 pm

Ai need algorithms to find the minimum cost for bipartite graph

December 13, 2012 at 3:35 pm

jeanHi guy,

i want to know, how we can use the advice for the online bipartite matching problem. that mean how much information of the future we need to improve the competitive ratio of the greedy.

Thankx