In which we prove the equivalence of three versions of the Traveling Salesman Problem, we provide a 2-approximate algorithm, we review the notion of Eulerian cycle, we think of the TSP as the problem of finding a minimum-cost connected Eulerian graph, and we revisit the 2-approximate algorithm from this perspective.
1. Variations of the Traveling Salesman Problem
Recall that an input of the Traveling Salesman Problem is a set of points and a non-negative, symmetric, distance function such that for every .
The goal is to find a cycle that reaches every vertex and that has minimal total length .
There are different versions of this problem depending on whether we require to satisfy the triangle inequality or not, and whether we allow the loop to pass through the same point more than once.
- General TSP without repetitions (General TSP-NR): we allow arbitrary symmetric distance functions, and we require the solution to be a cycle that contains every point exactly once;
- General TSP with repetitions (General TSP-R): we allow arbitrary symmetric distance functions, and we allow all cycles as an admissible solution, even those that contain some point more than once;
- Metric TSP without repetitions (Metric TSP-NR): we only allow inputs in which the distance function satisfies the triangle inequality, that is
and we only allow solutions in which each point is reached exactly once;
- Metric TSP with repetitions (Metric TSP-R): we only allow inputs in which the distance function satisfies the triangle inequality, and we allow all cycles as admissible solutions, even those that contain some point more than once.
If we allow arbitrary distance functions, and we require the solution to be a cycle that reaches every point exactly once, then we have a problem for which no kind of efficient approximation is possible.
Fact 1 If then there is no polynomial-time constant-factor approximation algorithm for General TSP-NR.
More generally, if there is a function such that can be computable in time polynomial in (for example, ), and a polynomial time algorithm that, on input an instance of General TSP-NR with points finds a solution of cost at most times the optimum, then .
The other three versions, however, are completely equivalent from the point of view of approximation and, as we will see, can be efficiently approximated reasonably well.
Lemma 2 For every , there is a polynomial time -approximate algorithm for Metric TSP-NR if and only if there is a polynomial time -approximate approximation algorithm for Metric TSP-R. In particular:
- If is a Metric TSP input, then the cost of the optimum is the same whether or not we allow repetitions.
- Every -approximate algorithm for Metric TSP-NR is also a -approximate algorithm for Metric TSP-R.
- Every -approximate algorithm for Metric TSP-R can be turned into a -approximate algorithm for Metric TSP-NR after adding a linear-time post-processing step.
Proof: Let be the cost of an optimal solution for among all solutions with or without repetitions, and be the cost of an optimal solution for among all solutions without repetitions. Then clearly in the former case we are minimizing over a larger set of possible solutions, and so
Consider now any cycle, possibly with repetitions, . Create a new cycle from by removing from all the repeated occurrences of any vertex. (With the special case of which is repeated at the end.) For example, the cycle becomes . Because of the triangle inequality, the total length of is at most the total length of , and is a cycle with no repetitions. If we apply the above process by taking to be the optimal solution of allowing repetitions, we see that
and so we have proved our first claim that .
Regarding the second claim, suppose that we have a -approximate algorithm for Metric TSP-NR. Then, given an input the algorithm finds a cycle with no repetitions such that
but is also an admissible solution for the problem Metric TSP-R, and
and so our algorithm is also a -approximate algorithm for .
To prove the third claim, suppose we have a -approximate algorithm for Metric TSP-R. Then, given an input the algorithm finds a cycle , possibly with repetitions, such that
Now, convert to a solution that has no repetitions and such that as described above, and output the solution . We have just described a -approximate algorithm for Metric TSP-NR, because
Lemma 3 For every , there is a polynomial time -approximate algorithm for Metric TSP-NR if and only if there is a polynomial time -approximate approximation algorithm for General TSP-R. In particular:
- Every -approximate algorithm for General TSP-R is also a -approximate algorithm for Metric TSP-R.
- Every -approximate algorithm for Metric TSP-R can be turned into a -approximate algorithm for General TSP-R after adding polynomial-time pre-processing and post-processing steps.
Proof: The first claim just follows from the fact that General and Metric TSP-R are the same problem, except that in the general problem we allow a larger set of admissible inputs.
The proof of the second claim is similar to the proof that an approximation algorithm for Metric Steiner Tree can be converted to an approximation algorithm for General Steiner Tree.
Consider the following algorithm: on input an instance of General TSP-R, we compute the new distance function such that is the length of a shortest path from to in a weighted graph that has vertex set and weights . Note that is a distance function that satisfies the triangle inequality. We also compute the shortest path between any pair .
We then pass the input to our -approximate algorithm for Metric TSP-R, and find a cycle such that
Note that, for every pair of points , we have and so this implies that
Finally, we construct a cycle by replacing each transition in by the shortest path from to according to . Because of the definition of , we have
and, combining the inequalities we have proved so far,
meaning that the algorithm that we have described is a -approximate algorithm for General TSP-R.
2. A 2-approximate Algorithm
When discussing approximation algorithms for the Minimum Steiner Tree problem in the last lecture, we proved (without stating it explicitly) the following result.
where and .
The proof is simply to consider a DFS visit of the tree, starting at the root; we define to be the order in which vertices are visited, counting both the beginning of the recursive call in which they are reached, and also the time when we return at a vertex after the end of a recursive call originated at the vertex.
For example, revisiting an example from the last lecture, from the tree
We get the cycle , in which every point is visited at least once (indeed, a number of times equal to its degree in the tree) and every edge is traversed precisely twice.
Theorem 5 There is a polynomial-time 2-approximate algorithm for General TSP-R. (And hence for Metric TSP-NR and Metric TSP-R.)
Proof: The algorithm is very simple: on input a we find a minimum spanning tree of the weighted graph with vertex set and weights , and then we find the cycle of cost as in Lemma 4.
It remains to prove that , which will then imply that we have found a solution whose cost is , and that our algorithm is 2-approximate.
Let be an optimal solution for , and consider the set of edges which are used in the cycle, then is a connected graph; take any spanning tree of the graph , then
because uses a subset of the edges of . On the other hand, is a spanning tree, and so
So we have
from which it follows that our algorithm achieves a 2-approximation.
3. Eulerian Cycles
In this section we review the definition of Eulerian cycle. In the next section, we will use this notion to give a new view of the 2-approximate algorithm of the previous section, and we willnote that this new perspective suggests a potentially better algorithm, that we will analyze in the next lecture.
In this section, it will be convenient to work with multi-graphs instead of graphs. In an undirected multi-graph , is a multi-set of pairs of vertices, that is, the same pair can appear more than once in . Graphically, we can represent a multi-graph in which there are edges between and by drawing parallel edges between and . The degree of a vertex in a multigraph is the number of edges of the graph that have that vertex as an endpoint, counting multiplicities.
Definition 6 (Eulerian Cycle) An Eulerian cycle in a multi-graph is a cycle such that the number of edges in is equal to the number of times is used in the cycle.
In a standard graph, a Eulerian cycle is a cycle that uses every edge of the graph exactly once.
Theorem 7 A multi-graph has an Eulerian cycle if and only if every vertex has even degree and the vertices of positive degree are connected. Furthermore, there is a polynomial time algorithm that, on input a connected graph in which every vertex has even degree, outputs an Eulerian cycle.
Proof: If is Eulerian, then the cycle gives a way to go from every vertex to every other vertex, except the vertices of zero degree. Furthermore, if a vertex appears times in the cycle, then there are edges involving in the cycle (because, each time is reached, there is an edge used to reach and one to leave from ); since the cycle contains all the edges of the graph, it follows that has degree , thus all vertices have even degree. This shows that if a graph contains an Eulerian cycle, then every vertex has even degree and all the vertices of non-zero degree are connected.
For the other direction, we will prove by induction a slightly stronger statement, that is we will prove that if is a graph in which every vertex has even degree, then every connected component of with more than one vertex has a Eulerian cycle. We will proceed by induction on the number of edges.
If there are zero edges, then every connected component has only one vertex and so there is nothing to prove. This is the base case of the induction.
If we have a graph with a non-empty set of edges and in which every vertex has even degree, then let , \ldots, be the connected components of that have at least two vertices. If , then every connected component has strictly fewer vertices than , and so we can apply the inductive hypothesis and find Eulerian cycles in each of ,\ldots,.
It remains to consider the case in which the set of vertices of positive degree of are all in the same connected component. Let be the restriction of to the vertices of . Since every vertex of has degree , there must be a cycle in . This is because if a connected graph with vertices has no cycles, then it is a tree, and so it has edges; but in a graph in which there are vertices and every vertex has degree , the number of edges is at least . Let be a simple cycle (that is, a cycle with no vertices repeated) in , and let be the graph obtained from by removing the edges of . Since we have removed two edges from every vertex, we have that is still a graph in which every vertex has even degree. Since has fewer edges than we can apply the induction hypothesis, and find a Eulerian cycle in each non-trivial connected component (a connected component is trivial if it contains only an isolated vertex of degree zero) of . We can then patch together these Eulerian cycles with as follows: we traverse , starting from any vertex; the first time we reach one of the non-trivial connected components of , we stop traversing , and we traverse the Eulerian cycle of the component, then continue on , until we reach for the first time one of the non-trivial connected components of that we haven’t traversed yet, and so on. This describes a Eulerian path into all of
Finally, we note that this inductive argument can be converted into a recursive algorithm. The main computation is to find the connected components of a graph, which can be done in linear time, and to find a cycle in a given graph, which can also be done in linear time using a DFS. Hence the algorithm runs in polynomial time.
4. Eulerian Cycles and TSP Approximation
Let be an instance of General TSP-R. Suppose that is a connected multi-graph with vertex set that admits an Eulerian cycle . Then the Eulerian cycle is also an admissible solution for the TSP problem, and its cost is . Conversely, if we take any cycle which is a TSP-R solution for the input , and we let be the multiset of edges used by the cycle (if the cycle uses the same edge more than once, we put as many copies of the edge in as the number of times it appears in the cycle), then we obtain a graph which is connected and which admits an Eulerian cycle.
In other words, we can think of the General TSP-R as the following problem: given a set of points and a symmetric distance function , find the multi-set of edges such that the graph is connected and Eulerian, and such that is minimized.
The approach that we took in our 2-approximate algorithm was to start from a spanning tree, which is guaranteed to be connected, and then take every edge of the spanning tree twice, which guarantees that every vertex has even degree, and hence that an Eulerian cycle exists. The reader should verify that if we take a tree, double all the edges, and then apply to the resulting multigraph the algorithm of Theorem 7, we get the same cycle as the one obtained by following the order in which the vertices of the tree are traversed in a DFS, as in the proof of Lemma 4.
From this point of view, the 2-approximate algorithm seems rather wasteful: once we have a spanning tree, our goal is to add edges so that we obtain an Eulerian graph in which every vertex has even degree. Doubling every edge certainly works, but it is a rather “brute force” approach: for example if a vertex has degree 11 in the tree, we are going to add another 11 edges incident on that vertex, while we could have “fixed” the degree of that vertex by just adding one more edge. We will see next time that there is a way to implement this intuition and to improve the factor of 2 approximation.