In which we show the equivalence of metric and general Steiner tree, give a 2-approximate algorithm for both problems, and begin to talk about TSP
1. Approximating the Metric Steiner Tree Problem
In the previous lecture we defined the Metric Steiner Tree Problem as follows: the input is a set of points, where
is a set of required points and
is a set of optional points, and a symmetric distance function
that associates a non-negative distance
to every pair of points. We restrict the problem to the case in which
satisfies the triangle inequality, that is,
In such a case, is called a (semi-)metric, hence the name of the problem.
The goal is to find a tree , where
is any set
of points that includes all of the required points, and possibly some of the optional points, such that the cost
of the tree is minimized.
In the previous lecture we suggested the following algorithm: completely disregard the optional vertices in , and find a minimum spanning tree of the weighted graph that has vertex set
and edge weights defined by
.
Now we prove that this algorithm is 2-approximate, that is, it finds a solution whose cost is at most twice the optimal cost.
To do so, we prove the following.
Lemma 1 Let
be an instance of the metric Steiner tree problem, and
be a Steiner tree with
.
Then there is a tree
which spans the vertices in
and only the vertices in
such that
In particular, applying the Lemma to the optimal Steiner tree we see that there is a spanning tree of whose cost is at most twice the cost of the optimal Steiner tree. This also means that the minimal spanning tree of
also has cost at most twice the cost of the optimal Steiner tree.
\bigskip
Proof: } Consider a DFS traversal of , that is, a sequence
listing the vertices of in the order in which they are considered during a DFS, including each time we return to a vertex at the end of each recursive call. The sequence describes a cycle over the elements of
whose total length
is precisely
, because the cycle uses each edge of the tree precisely twice.
Let now be the sequence obtained from
by removing the vertices in
and keeping only the first occurrent of each vertex in
.
Then is a path that includes all the vertices of
, and no other vertex, and its cost
is at most the cost of the cycle
(here we are using the triangle inequality), and so it is at most
.
But now note that , being a path, is also a tree, and so we can take
to be tree
where
is the edge set
.
For example, if we have an instance in which ,
, and the distance function
assigns distance 1 to the points connected by an edge in the graph below, and distance 2 otherwise

Then the following is a Steiner tree for our input whose cost is 8:

We use the argument in the proof of the Lemma to show that there is a Spanning tree of of cost at most 16. (In fact we will do better.)
The order in which we visit vertices in a DFS of is
. If we consider it as a loop that starts at
and goes back to
after touching all vertices, some vertices more than once, then the loop has cost 16, because it uses every edge exactly twice.
Now we note that if we take the DFS traversal, and we skip all the optional vertices and all the vertices previously visited, we obtain an order in which to visit all the required vertices, and no other vertex. In the example the order is .

Because this path was obtained by “shortcutting” a path of cost at most twice the cost of , and because we have the triangle inequality, the path that we find has also cost at most twice that of
. In our example, the cost is just 10. Since a path is, in particular, a tree, we have found a spanning tree of
whose cost is at most twice the cost of
.
The factor of 2 in the lemma cannot be improved, because there are instances of the Metric Steiner Tree problem in which the cost of the minimum spanning tree of is, in fact, arbitrarily close to twice the cost of the minimum steiner tree.
Consider an instance in which ,
,
for
, and
for all
. That is, consider an instance in which the required points are all at distance two from each other, but they are all at distance one from the unique optional point. Then the minimum Steiner tree has
as a root and the nodes
as leaves, and it has cost
, but the minimum spanning tree of
has cost
, because it is a tree with
nodes and
edges, and each edge is of cost 2.
2. Metric versus General Steiner Tree
The General Steiner Tree problem is like the Metric Steiner Tree problem, but we allow arbitrary distance functions.
In this case, it is not true any more that a minimum spanning tree of gives a good approximation: consider the case in which
,
,
,
and
. Then the minimum spanning tree of
has cost
while the minimum Steiner tree has cost
.
We can show, however, that our 2-approximation algorithm for Metric Steiner Tree can be turned, with some care, into a 2-approximation algorithm for General Steiner Tree.
Lemma 2 For every
, if there is a polynomial time
-approximate algorithm for Metric Steiner Tree, then there is a polynomial time
-approximate algorithm for General Steiner Tree.
Proof: Suppose that we have a polynomial-time -approximate algorithm
for Metric Steiner Tree and that we are given in input an instance
of General Steiner Tree. We show how to find, in polynomial time, a
-approximate solution for
.
For every two points , let
be the length of a shortest path from
to
in the weighted graph of vertex set
of weights
. Note that
is a distance function that satisfies the triangle inequality, because for every three points
it must be the case that the length of the shortest path from
to
cannot be any more than the length of the shortest path from
to
plus the length of the shortest path from
to
.
This means that is an instance of Metric Steiner Tree, and we can apply algorithm
to it, and find a tree
of cost
Now notice that, for every pair of points, , and so if
is the optimal tree of our original input
we have
So putting all together we have
Now, from , construct a graph
by replacing each edge
by the shortest path from
to
according to
. By our construction we have
Note also that is a connected graph.
The reason why we have an inequality instead of an equality is that certain edges of might belong to more than one shortest path, so they are counted only once on the left-hand side.
Finally, take a minimum spanning tree of
according to the weights
. Now
is a valid Steiner tree, and we have
3. The Traveling Salesman Problem
In the Traveling Salesman Problem (abbreviated TSP) we are given a set of points and a symmetric distance function
. The goal is to find a cycle that reaches all points in
and whose total length is as short as possible.
For example, a shuttle driver that picks up seven people at SFO and needs to take them to their home and then go back to the airport faces a TSP instance in which includes eight points (the seven home addresses and the airport), and the distance function is the driving time between two places. A DHL van driver who has to make a series of delivery and then go back to the warehouse has a similar problem. Indeed TSP is a basic model for several concrete problems, and it one of the most well studied problems in combinatorial optimization.
We can distinguish four versions of TSP. One distinction is whether we work with the Metric version, in which we restrict ourselves to inputs in which satisfies the triangle inequality, or with the General version in which
can be arbitrary. Another distinction is whether we want to cycle to reach every point at least once, or whether we are only looking for solutions in which every point is reached exactly once.
It can be proved that, unless , no approximation is possible for the General TSP without repetitions.
Next time we will show that the other three versions (Metric TSP without repetitions, General TSP with repetitions, and Metric TSP with repetitions), are all equivalent from the point of view of approximation.

Leave a comment
Comments feed for this article