In which we describe a randomized algorithm for finding the minimum cut in an undirected graph.
1. Global Min-Cut and Edge-Connectivity
Definition 1 (Edge connectivity) We say that an undirected graph is $k$-edge-connected if one needs to remove at least $k$ edges in order to disconnect the graph. Equivalently, an undirected graph is $k$-edge-connected if the removal of any subset of $k-1$ edges leaves the graph connected.
Note that the definition is given in such a way that if a graph is, for example, $3$-edge-connected, then it is also $2$-edge-connected and $1$-edge-connected. Being $1$-edge-connected is the same as being connected.
For example, take two triangles and join them by two disjoint edges: the resulting graph is connected and $2$-edge-connected, but it is not $3$-edge-connected, because removing the two edges that join the triangles disconnects the graph.
As another example, consider the 3-cube, the graph whose $8$ vertices are the binary strings of length $3$, with an edge between two strings whenever they differ in exactly one position.
The 3-cube is clearly not 4-edge-connected, because we can disconnect any vertex by removing the 3 edges incident on it. It is clearly connected, and it is easy to see that it is 2-edge-connected; for example, it has a Hamiltonian cycle (a simple cycle that goes through all vertices), and so the removal of any one edge still leaves a path that goes through every vertex. Indeed the 3-cube is 3-edge-connected, but at this point it is not clear how to argue it without going through some complicated case analysis.
The edge-connectivity of a graph is the largest $k$ for which the graph is $k$-edge-connected, that is, the minimum $k$ such that it is possible to disconnect the graph by removing $k$ edges.
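For very small graphs the definition can be checked directly. The sketch below is a brute-force illustration, not an efficient method: it tries all edge subsets of increasing size, which takes exponential time in general. The helper names `is_connected` and `edge_connectivity` are hypothetical, and the 3-cube is encoded with vertices $0,\dots,7$ so that two vertices are adjacent exactly when their binary representations differ in one bit.

```python
from itertools import combinations


def is_connected(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is connected."""
    vertices = list(vertices)
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:  # depth-first search from an arbitrary start vertex
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)


def edge_connectivity(vertices, edges):
    """Smallest k such that removing some k edges disconnects the graph (brute force)."""
    vertices = list(vertices)
    if len(vertices) < 2:
        raise ValueError("need at least two vertices")
    if not is_connected(vertices, edges):
        return 0
    for k in range(1, len(edges) + 1):
        for removed in combinations(range(len(edges)), k):
            gone = set(removed)
            remaining = [e for i, e in enumerate(edges) if i not in gone]
            if not is_connected(vertices, remaining):
                return k
    raise AssertionError("unreachable: removing all edges disconnects >= 2 vertices")


# 3-cube on vertices 0..7: u and v are adjacent iff they differ in exactly one bit
CUBE_EDGES = [(u, u ^ (1 << b)) for u in range(8) for b in range(3) if u < u ^ (1 << b)]
# two triangles {0,1,2} and {3,4,5} joined by the edges (0,3) and (1,4)
TWO_TRIANGLES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (0, 3), (1, 4)]

print(edge_connectivity(range(8), CUBE_EDGES))     # 3
print(edge_connectivity(range(6), TWO_TRIANGLES))  # 2
```

The exhaustive search confirms computationally that the 3-cube is 3-edge-connected, without the case analysis; it is only practical because the cube has 12 edges.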
In graphs that represent communication or transportation networks, the edge-connectivity is an important measure of reliability.
Definition 2 (Global Min-Cut) The global min-cut problem is the following: given in input an undirected graph $G = (V,E)$, we want to find a subset $A \subseteq V$ such that $A \neq \emptyset$, $A \neq V$, and the number of edges with one endpoint in $A$ and one endpoint in $V - A$ is minimized.
We will refer to a subset $A \subseteq V$ such that $A \neq \emptyset$ and $A \neq V$ as a cut in the graph, and we will call the number of edges with one endpoint in $A$ and one endpoint in $V - A$ the cost of the cut. We refer to the edges with one endpoint in $A$ and one endpoint in $V - A$ as the edges that cross the cut.
We can see that the Global Min Cut problem and the edge-connectivity problem are in fact the same problem:
- if there is a cut $A$ of cost $k$, then the graph becomes disconnected (in particular, no vertex in $A$ is connected to any vertex in $V - A$) if we remove the $k$ edges that cross the cut, and so the edge-connectivity is at most $k$. This means that the edge-connectivity of a graph is at most the cost of its minimum cut;
- if there is a set of $k$ edges whose removal disconnects the graph, then let $A$ be the set of vertices in one of the resulting connected components. Then $A$ is a cut, and its cost is at most $k$, because every edge that crosses it must be one of the removed edges. This means that the cost of the minimum cut is at most the edge-connectivity.
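The cut side of this equivalence can also be checked by exhaustive search on small graphs. In the sketch below (hypothetical helper names `cut_cost` and `brute_force_min_cut`, exponential time, for illustration only) every cut $A$ is enumerated; since $A$ and $V - A$ have the same cost, one fixed vertex is always placed in $A$.

```python
from itertools import combinations


def cut_cost(A, edges):
    """Number of edges with exactly one endpoint in A (the edges that cross the cut)."""
    return sum(1 for u, v in edges if (u in A) != (v in A))


def brute_force_min_cut(vertices, edges):
    """Return (cost, A) for a minimum-cost cut, trying every cut A exhaustively."""
    vertices = list(vertices)
    first, rest = vertices[0], vertices[1:]
    best = None
    # A and V - A have the same cost, so we may assume that `first` lies in A;
    # A != V means A contains at most len(rest) - 1 of the remaining vertices.
    for size in range(len(rest)):
        for others in combinations(rest, size):
            A = {first, *others}
            cost = cut_cost(A, edges)
            if best is None or cost < best[0]:
                best = (cost, A)
    return best


# 3-cube on vertices 0..7: u and v are adjacent iff they differ in exactly one bit
CUBE_EDGES = [(u, u ^ (1 << b)) for u in range(8) for b in range(3) if u < u ^ (1 << b)]
cost, A = brute_force_min_cut(range(8), CUBE_EDGES)
print(cost)  # 3, the same as the edge-connectivity of the 3-cube
```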
We will discuss two algorithms for finding the edge-connectivity of a graph. One is a simple reduction to the maximum flow problem, and runs in time $O(|V|^2 \cdot |E|)$. The other is a surprisingly simple randomized algorithm based on edge-contractions; the surprising part is the fact that it correctly solves the problem, because it seems to hardly be doing any work. We will discuss a simple implementation of the edge-contraction algorithm whose running time, on dense graphs, already matches that of the reduction to maximum flow. A more refined analysis and implementation gives a running time $O(|V|^2 \cdot (\log |V|)^{O(1)})$.
1.1. Reduction to Maximum Flow
Consider the following algorithm:
- Input: undirected graph $G = (V,E)$
- let $s$ be a vertex in $V$ (the choice does not matter)
- define $c(u,v) = c(v,u) = 1$ for every $\{u,v\} \in E$
- for each $t \in V - \{s\}$
  - solve the min cut problem in the network $(G, s, t, c)$, and let $A_t$ be the cut of minimum capacity
- output the cut $A_t$ of minimum cost
The algorithm uses $|V| - 1$ minimum cut computations in networks, each of which can be solved by a maximum flow computation. Since each network can have a maximum flow of cost at most $|V| - 1$ (the source $s$ has at most $|V| - 1$ incident edges, each of capacity $1$), and all capacities are integers, the Ford-Fulkerson algorithm finds each maximum flow in time $O(|V| \cdot |E|)$, and so the overall running time is $O(|V|^2 \cdot |E|)$.
To see that the algorithm finds the global min cut, let $k$ be the edge-connectivity of the graph, let $E^*$ be a set of $k$ edges whose removal disconnects the graph, and let $A$ be the connected component containing $s$ in the disconnected graph resulting from the removal of the edges in $E^*$. So $A$ is a global minimum cut of cost at most $k$ (indeed, exactly $k$), and it contains $s$.
Since $A \neq V$, in at least one iteration the algorithm picks a vertex $t \notin A$ and constructs the network $(G, s, t, c)$, in which $A$ is a valid cut, of capacity $k$; so when the algorithm finds a minimum capacity cut in that network it must find a cut of capacity at most $k$ (indeed, exactly $k$). This means that, for at least one $t$, the cut $A_t$ is also an optimal global min-cut.
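A minimal sketch of the reduction, assuming vertices numbered $0, \dots, n-1$ and $s = 0$. The function names are hypothetical. Augmenting paths are found by breadth-first search (the Edmonds-Karp choice of path, one way to run Ford-Fulkerson), and for brevity the residual network is an adjacency matrix, so each search costs $O(|V|^2)$ rather than the $O(|E|)$ assumed in the running-time bound.

```python
from collections import deque


def min_st_cut(n, edges, s, t):
    """Minimum s-t cut of an undirected unit-capacity graph on vertices 0..n-1.

    Each undirected edge {u, v} becomes arcs u->v and v->u of capacity 1.
    Returns (capacity, A) where A is the set of vertices reachable from s
    in the final residual network, a minimum-capacity cut containing s.
    """
    residual = [[0] * n for _ in range(n)]
    for u, v in edges:
        residual[u][v] += 1
        residual[v][u] += 1
    flow = 0
    while True:
        # breadth-first search for an augmenting path in the residual network
        parent = [None] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] is None:
            u = queue.popleft()
            for w in range(n):
                if residual[u][w] > 0 and parent[w] is None:
                    parent[w] = u
                    queue.append(w)
        if parent[t] is None:
            # no augmenting path left: the vertices reached from s form a min cut
            return flow, {v for v in range(n) if parent[v] is not None}
        # capacities are integers, so one unit can always be pushed along the path
        w = t
        while w != s:
            u = parent[w]
            residual[u][w] -= 1
            residual[w][u] += 1
            w = u
        flow += 1


def global_min_cut_via_flows(n, edges):
    """Fix s = 0 and keep the cheapest s-t cut over all t != s."""
    s = 0  # the choice of s does not matter
    best = None
    for t in range(n):
        if t == s:
            continue
        value, A = min_st_cut(n, edges, s, t)
        if best is None or value < best[0]:
            best = (value, A)
    return best


CUBE_EDGES = [(u, u ^ (1 << b)) for u in range(8) for b in range(3) if u < u ^ (1 << b)]
print(global_min_cut_via_flows(8, CUBE_EDGES)[0])  # 3
```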
1.2. The Edge-Contraction Algorithm
Our next algorithm is due to David Karger, and it involves a rather surprising application of random choices.
The algorithm uses the operation of edge-contraction, which is an operation defined over multi-graphs, that is, graphs that can have multiple edges between a given pair of vertices or, equivalently, graphs whose edges have a positive integer weight.
If, in an undirected graph $G = (V,E)$, we contract an edge $(u,v)$, the effect is that the edge $(u,v)$ is deleted (together with any other edges parallel to it), and the vertices $u$ and $v$ are removed and replaced by a new vertex, which we may call $\{u,v\}$; all other edges of the graph remain, and all the edges that were incident on $u$ or $v$ become incident on the new vertex $\{u,v\}$. If $u$ had $i$ edges connecting it to a vertex $w$, and $v$ had $j$ edges connecting it to $w$, then in the new graph there will be $i + j$ edges between $w$ and $\{u,v\}$.
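For example, suppose we contract an edge of the 3-cube and then, in the resulting graph, contract a second edge. After the two contractions we can end up with two parallel edges between a pair of “macro-vertices”: this happens, for instance, when the two contracted edges are opposite sides of the same square face of the cube, because the other two sides of that face then both join the two new macro-vertices.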
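A sketch of the contraction operation, under assumptions of our own: a multigraph is stored as a `collections.Counter` mapping each pair of macro-vertices to the number of parallel edges between them, and each macro-vertex is a `frozenset` of original vertices. The 3-cube's vertices are labelled by 3-bit strings. The function names are hypothetical. The example contracts two opposite sides of the face $\{000, 001, 010, 011\}$.

```python
from collections import Counter


def contract(multigraph, x, y):
    """Contract the edge {x, y} of a multigraph and return the new multigraph.

    `multigraph` maps frozenset({a, b}) -> number of parallel edges between a and b,
    where each vertex is a frozenset of original vertices (a "macro-vertex").
    All copies of {x, y} disappear; every other edge touching x or y is redirected
    to the new macro-vertex x | y, and parallel edges add up their multiplicities.
    """
    merged = x | y
    result = Counter()
    for edge, count in multigraph.items():
        if edge == frozenset({x, y}):
            continue  # contracted edges vanish rather than becoming self-loops
        a, b = tuple(edge)
        a = merged if a in (x, y) else a
        b = merged if b in (x, y) else b
        result[frozenset({a, b})] += count
    return result


def vertex(*labels):
    """Macro-vertex made of the given original vertices."""
    return frozenset(labels)


def cube():
    """3-cube: 3-bit strings, adjacent when they differ in exactly one bit."""
    g = Counter()
    for i in range(8):
        for b in range(3):
            j = i ^ (1 << b)
            if i < j:
                g[frozenset({vertex(format(i, "03b")), vertex(format(j, "03b"))})] += 1
    return g


# contract two opposite sides of the face {000, 001, 010, 011}
after_two = contract(contract(cube(), vertex("000"), vertex("001")), vertex("010"), vertex("011"))
print(after_two[frozenset({vertex("000", "001"), vertex("010", "011")})])  # 2
print(sum(after_two.values()))  # 10 of the original 12 edges remain
```

Adding multiplicities in the `Counter` is exactly the $i + j$ rule for parallel edges.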
The basic iteration of Karger’s algorithm is the following:
- while there are more than 2 vertices in the graph
  - pick a random edge and contract it
- output the set of vertices of the original graph that have been contracted into one of the two final macro-vertices.
One important point is that, in the randomized step, we sample uniformly at random among the edges of the multi-set of edges of the current multi-graph. So if there are 6 edges between the vertices $a$ and $b$, and 2 edges between the vertices $c$ and $d$, then a contraction of $(a,b)$ is three times more likely than a contraction of $(c,d)$.
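One run of the basic iteration could be sketched as follows, assuming a connected graph on vertices $0, \dots, n-1$ given as an edge list. The function name is hypothetical. Instead of building the multigraph explicitly, the sketch keeps every original edge that still joins two different macro-vertices. Picking uniformly from that list samples the multi-set of edges of the current multigraph, so six parallel edges are automatically three times as likely as two. Each step costs $O(|E|)$ here, so this is not the fastest implementation.

```python
import random


def karger_basic_iteration(n, edges, rng=random):
    """One run of the contraction algorithm on a connected graph with vertices 0..n-1.

    `edges` lists each edge once per copy. Edges that still join two different
    macro-vertices are kept in `alive`, so a uniform choice from `alive` samples the
    current multi-set of edges: k parallel edges are k times as likely as one.
    Returns (cost, A), where A is one of the two final macro-vertices.
    """
    group = list(range(n))                 # group[v]: macro-vertex containing v
    members = {v: {v} for v in range(n)}   # macro-vertex id -> original vertices
    alive = list(edges)
    while len(members) > 2:
        if not alive:
            raise ValueError("graph is disconnected")
        u, v = rng.choice(alive)
        gu, gv = group[u], group[v]
        if len(members[gu]) < len(members[gv]):
            gu, gv = gv, gu                # relabel the smaller macro-vertex
        for w in members[gv]:
            group[w] = gu
        members[gu] |= members.pop(gv)
        # edges inside the new macro-vertex would be self-loops: discard them
        alive = [(a, b) for a, b in alive if group[a] != group[b]]
    A = members[group[0]]
    return len(alive), A


CUBE_EDGES = [(u, u ^ (1 << b)) for u in range(8) for b in range(3) if u < u ^ (1 << b)]
rng = random.Random(0)
costs = [karger_basic_iteration(8, CUBE_EDGES, rng)[0] for _ in range(200)]
print(min(costs))  # never below 3, since every output is a genuine cut
```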
The algorithm seems to pretty much pick a subset of the vertices at random. How can we hope to find an optimal cut with such a simple approach?
(In the analysis we will assume that the graph is connected. If the graph has two connected components, then the algorithm converges to the optimal min-cut of cost zero. If there are three or more connected components, the algorithm will discover them when it runs out of edges to sample. In the simplified pseudocode above we omitted the code to handle this exception.)
The first observation is that, if we fix for reference an optimal global min cut $A$ of cost $k$, and if it so happens that there is never a step in which we contract one of the edges that connect $A$ with the rest of the graph, then, at the last step, the two macro-vertices will indeed be $A$ and $V - A$, and the algorithm will have correctly discovered an optimal solution. (Every macro-vertex is then contained either in $A$ or in $V - A$, and at the end there are only two of them.)
But how likely is it that the edges of the optimal solution are never chosen to be contracted at any iteration?
The key observation in the analysis is that if we are given in input a (multi-)graph whose edge-connectivity is $k$, then it must be the case that every vertex has degree $\ge k$, where the degree of a vertex in a graph or multigraph is the number of edges that have that vertex as an endpoint. This is because if we had a vertex of degree $< k$, then we could disconnect the graph by removing all the edges incident on that vertex, and this would contradict the $k$-edge-connectivity of the graph.
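But if every vertex has degree $\ge k$, then (counting parallel edges with their multiplicity)
$$|E| = \frac{1}{2} \sum_{v \in V} \deg(v) \ge \frac{k\,|V|}{2}$$
and, since each edge has probability $1/|E|$ of being sampled, the probability that, at the first step, we sample one of the $k$ edges that cross the cut $A$ is only
$$\frac{k}{|E|} \le \frac{2}{|V|}.$$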
What about the second step, and the third step, and so on?
Suppose that we were lucky at the first step and that we did not select any of the edges that cross $A$. Then, after the contraction of the first step, we are left with a graph that has $|V| - 1$ vertices. The next observation is that this new graph still has edge-connectivity $k$. It is at most $k$ because the cut defined by $A$ is still well defined and still has cost $k$. It is at least $k$ because every cut of the contracted graph corresponds to a cut of the original graph with the same cost. Since the edge-connectivity is still $k$, we can repeat the previous reasoning, and conclude that the probability that we select one of the edges that cross $A$ is at most
$$\frac{2}{|V| - 1}.$$
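And now we see how to reason in general. If we did not select any of the edges that cross $A$ at any of the first $t - 1$ steps, then the graph has $|V| - t + 1$ vertices and edge-connectivity $k$, and the probability that we select one of those edges at step $t$ is at most
$$\frac{2}{|V| - t + 1}.$$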
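So what is the probability that we never select any of those edges at any step, thus ending up with the optimal solution $A$? The algorithm performs exactly $|V| - 2$ contractions. If we write $\mathcal{E}_t$ to denote the event that “at step $t$, the algorithm samples an edge which does not cross $A$,” then
$$\begin{aligned} \Pr[\mathcal{E}_1 \wedge \cdots \wedge \mathcal{E}_{|V|-2}] &= \Pr[\mathcal{E}_1] \cdot \Pr[\mathcal{E}_2 \mid \mathcal{E}_1] \cdots \Pr[\mathcal{E}_{|V|-2} \mid \mathcal{E}_1 \wedge \cdots \wedge \mathcal{E}_{|V|-3}] \\ &\ge \left(1 - \frac{2}{|V|}\right) \left(1 - \frac{2}{|V|-1}\right) \cdots \left(1 - \frac{2}{3}\right) \end{aligned}$$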
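If we write $n := |V|$, the product in the last line is
$$\frac{n-2}{n} \cdot \frac{n-3}{n-1} \cdot \frac{n-4}{n-2} \cdots \frac{2}{4} \cdot \frac{1}{3},$$
which simplifies to
$$\frac{2}{n(n-1)}$$
because the numerators multiply to $(n-2)!$ and the denominators to $n!/2$.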
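The telescoping can be double-checked with exact rational arithmetic. This small sketch (hypothetical function name, using Python's `fractions.Fraction`) multiplies the $n - 2$ factors $1 - \frac{2}{n - t + 1}$ for $t = 1, \dots, n-2$ and prints the result next to $\frac{2}{n(n-1)}$.

```python
from fractions import Fraction


def no_crossing_edge_bound(n):
    """Exact value of prod_{t=1}^{n-2} (1 - 2/(n - t + 1))."""
    p = Fraction(1)
    for t in range(1, n - 1):   # the n - 2 contraction steps
        p *= 1 - Fraction(2, n - t + 1)
    return p


for n in (3, 4, 8, 100):
    print(n, no_crossing_edge_bound(n), Fraction(2, n * (n - 1)))
```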
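Now, suppose that we repeat the basic algorithm $r$ times, with independent random choices, and output the cheapest cut found. Then the probability that it does not find an optimal solution in any of the $r$ attempts is at most
$$\left(1 - \frac{2}{n(n-1)}\right)^r.$$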
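So, for example, if we repeat the basic iteration $100 n^2$ times, then the probability that we do not find an optimal solution is at most
$$\left(1 - \frac{2}{n(n-1)}\right)^{100 n^2} \le e^{-\frac{200 n^2}{n(n-1)}} \le e^{-200}$$
(where we used the fact that $1 - x \le e^{-x}$), which is an extremely small probability.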
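The arithmetic behind this bound can be checked numerically. In the sketch below, the function name is hypothetical and $r = 100 n^2$ is the example repetition count. It evaluates $\left(1 - \frac{2}{n(n-1)}\right)^r$ for a few values of $n$. It also prints the natural logarithm of the bound, computed with `math.log1p` for accuracy, which should never exceed $-200$.

```python
import math


def failure_bound(n, r):
    """(1 - 2/(n(n-1)))**r: bound on the chance that r independent runs all miss a fixed min cut."""
    return (1 - 2 / (n * (n - 1))) ** r


for n in (3, 10, 100, 1000):
    r = 100 * n * n
    log_bound = r * math.log1p(-2 / (n * (n - 1)))  # natural log of the bound
    print(n, r, round(log_bound, 2), failure_bound(n, r) <= math.exp(-200))
```

The exponent approaches $-200$ from below as $n$ grows, matching the factor $\frac{n}{n-1}$ in the derivation.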
One iteration of Karger’s algorithm can be implemented in time $O(|V|^2)$, so, repeating it $100 |V|^2$ times, overall we have an algorithm of running time $O(|V|^4)$ which has probability at least $1 - e^{-200}$ of finding an optimal solution.
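The notes do not say how the $O(|V|^2)$ bound per iteration is obtained. One standard way, assumed here, is to store the multigraph as an adjacency matrix of edge multiplicities. To sample an edge, pick a vertex $u$ with probability proportional to its degree, then a neighbour $v$ with probability proportional to the number of $u$-$v$ edges. Each edge $\{u,v\}$ is then chosen with probability $\frac{\mathrm{mult}(u,v)}{|E|}$, and both the sampling and the contraction take $O(|V|)$ time per step. The names below are hypothetical, and weighted sampling uses `random.Random.choices`.

```python
import random


def contract_once_matrix(adj, rng):
    """One run of the contraction algorithm in O(n^2) time (adj is modified in place).

    adj[u][v] is the number of parallel edges between macro-vertices u and v.
    Returns (cost, A): the number of edges between the two final macro-vertices
    and the original vertices merged into one of them.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    alive = list(range(n))
    members = [{v} for v in range(n)]
    while len(alive) > 2:
        # choose u with probability deg[u] / (2|E|), then v with probability adj[u][v] / deg[u]:
        # edge {u, v} is then picked with probability 2 * adj[u][v] / (2|E|) = adj[u][v] / |E|
        u = rng.choices(alive, weights=[deg[w] for w in alive])[0]
        v = rng.choices(alive, weights=[adj[u][w] for w in alive])[0]
        # merge v into u in O(n): edges to v are redirected to u, edges u-v disappear
        for w in alive:
            if w != u and w != v:
                adj[u][w] += adj[v][w]
                adj[w][u] = adj[u][w]
        deg[u] += deg[v] - 2 * adj[u][v]
        adj[u][v] = adj[v][u] = 0
        members[u] |= members[v]
        alive.remove(v)
    a, b = alive
    return adj[a][b], members[a]


def karger_min_cut(n, edges, repetitions=None, seed=None):
    """Repeat the contraction algorithm independently and keep the cheapest cut found."""
    rng = random.Random(seed)
    if repetitions is None:
        repetitions = 100 * n * n  # assumed default repetition count
    best = None
    for _ in range(repetitions):
        adj = [[0] * n for _ in range(n)]
        for u, v in edges:
            adj[u][v] += 1
            adj[v][u] += 1
        cost, A = contract_once_matrix(adj, rng)
        if best is None or cost < best[0]:
            best = (cost, A)
    return best


CUBE_EDGES = [(u, u ^ (1 << b)) for u in range(8) for b in range(3) if u < u ^ (1 << b)]
cost, A = karger_min_cut(8, CUBE_EDGES, seed=0)
print(cost, sorted(A))
```

The algorithm is Monte Carlo: the output is always a genuine cut, but only with high probability a minimum one. This is why the sketch keeps the cheapest of many independent runs rather than trusting any single run.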