I've spent most of my adult life making programs better: easier to use, more powerful, smaller, and cheaper. The most satisfying improvements tend to be the most quantifiable; it feels great to replace thousands of lines of code with dozens of lines. And the most quantifiable improvement of all is a program speedup. After you spend hours or days working on your program, you can state precisely how much faster it runs. This column describes some of the largest speedups I have seen. We'll start with a simple program for an important problem in network design. We'll then apply four concrete speedups, each involving just a few lines of code. The result is a speedup factor of about 30,000 for typical problem sizes. Along the way, we'll see that hardware changes over the past decade or so give us another speedup factor of about 1,000. And finally, we'll go hog-wild and see speedups of, well, billions and billions.
The Problem
You face this problem when you schedule the half-dozen errands you need to run on a Saturday morning: proceeding from task to task to task with the least hassle. Mathematicians call it the "Traveling Salesman Problem": find the shortest tour that visits n cities. This elegant problem has fascinated mathematicians for more than a century, yet it also has numerous practical applications. As this column went to press, the Web site http://www.fgi.net/LincolnLegalPapers/natsig.htm showed the tour Abraham Lincoln rode in 1850 through 14 county seats on the Eighth Judicial Circuit in Illinois. SONET rings (and any other rings, for that matter) can be viewed as Traveling Salesman tours. Ethernets without hubs call for finding good (but not necessarily optimal) Traveling Salesman paths. (A tour of n cities has n edges, while the path uses only n-1 edges by not returning to the start city.) Geometric Traveling Salesman algorithms are used to schedule mechanical devices such as drills, plotters, and solderers. Here is the shortest tour through nine points in the plane.
Our programs will assume that the n cities are numbered 0 through n-1, as in the diagram above. We determine the distance between two cities with the function dist1(i, j). (The number 1 in the name refers to Algorithm 1; later algorithms will substitute cheaper distance functions.) Distances have the type

typedef double Dist;

We will examine possible variations shortly. We can represent a tour by a permutation of the integer city numbers. In our diagram, for instance, the optimal tour could be represented by the permutation 327815640, or by 17 equivalent permutations including 187230465. A program will represent the permutation by the array p:

int p[MAXN];

The vector p always contains a permutation of the integers 0 through n-1.
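Concretely, the geometric setup might be sketched as follows, together with the typedef and the array p above. This is a reconstruction consistent with the text, not the column's listing: the coordinate arrays x and y, the bound MAXN, and the exact form of dist1 are assumptions.

#include <math.h>

#define MAXN 20            /* assumed bound on the number of cities */

int n;                     /* number of cities */
double x[MAXN], y[MAXN];   /* assumed: city i sits at the point (x[i], y[i]) */

Dist dist1(int i, int j)   /* Euclidean distance between cities i and j */
{   double dx = x[i] - x[j], dy = y[i] - y[j];
    return sqrt(dx*dx + dy*dy);
}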
The single element 0 has just one permutation. The two elements 0 and 1 have two permutations: 01 and 10. The three elements 0, 1, and 2 have six permutations: 012, 021, 102, 120, 201, and 210. In general, n elements have n! (pronounced "n factorial") permutations, where
n! = n × (n-1) × ... × 3 × 2 × 1

This function grows faster than exponentially: 10! ≈ 3.63 × 10^6, 20! ≈ 2.43 × 10^18, and 30! ≈ 2.65 × 10^32. We will soon see practical interpretations of these unimaginably huge quantities.
Algorithm 1: The Easiest Program
Our first program is thorough if not speedy. It will look at all n! permutations in the vector p, calculate the cost of each, and store the cheapest tour. We initialize Algorithm 1 with this code:
void solve1()

This function establishes the initial permutation and starts the recursive search; a complete sketch appears below. The n! permutations are constructed recursively by the function

void search1(int m)

When the permutation is complete, its cost is computed and compared with the best tour seen so far.

Exercise 1. Trace the contents of the vector p as search1 runs on a small example.

The function

void swap(int i, int j)

exchanges two elements of p. The function

void check1()

computes the cost of the completed tour. When a shortest tour is found, its length and permutation are stored in these variables:

Dist minsum;
int minp[MAXN];

This code compares the current sum with the best so far and saves the current tour if it is better:

void save(Dist sum)
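Putting the pieces together, here is one plausible realization of Algorithm 1, building on the declarations above. It is a sketch consistent with the description, not the column's listing; in particular the base case of search1 and the "infinite" initial value of minsum are guesses.

void swap(int i, int j)    /* exchange two elements of p */
{   int t = p[i]; p[i] = p[j]; p[j] = t;
}

void save(Dist sum)        /* record a new best tour */
{   int i;
    minsum = sum;
    for (i = 0; i < n; i++)
        minp[i] = p[i];
}

void check1()              /* p holds a complete permutation */
{   int i;
    Dist sum = dist1(p[0], p[n-1]);    /* edge that closes the tour */
    for (i = 1; i < n; i++)
        sum += dist1(p[i-1], p[i]);
    if (sum < minsum)
        save(sum);
}

void search1(int m)        /* permute p[0..m-1]; p[m..n-1] stays fixed */
{   int i;
    if (m == 1)
        check1();
    else
        for (i = 0; i < m; i++) {
            swap(i, m-1);  /* place a candidate city at position m-1 */
            search1(m-1);
            swap(i, m-1);  /* restore p */
        }
}

void solve1()
{   int i;
    for (i = 0; i < n; i++)    /* start from the identity permutation */
        p[i] = i;
    minsum = 1e30;             /* larger than any possible tour */
    search1(n);
}

Calling solve1() leaves the best tour in minp and its length in minsum.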
sum) Algorithm 1 takes about 40 lines of C code. On a 200MHz Mips R4400 processor, it finds an eight-city tour in 0.6 seconds, a nine-city tour in 6.3 seconds, and a ten-city tour in 70 seconds. At this rate, it would take a few hours to solve a twelve-city problem, and almost a year to solve a fifteen-city problem--time for some speedups! Exercise 2. Modify the C code to compute the shortest path through n cities (that is, do not include the cost of returning from the last city to the first city).
Algorithm 2: A Fixed Point
Our first speedup
will decrease the number of permutations we examine from n!
to (n-1)!, for a speedup factor of n. Algorithm 1
inspects all n! permutations. Algorithm 2 will fix one city
at the high end of the p array then inspect the resulting (n-1)!
permutations. This algorithm will find the optimal tour (twice,
in fact: once forwards, once backwards), but it avoids unnecessary
work. The code for

void solve2()

fixes city n-1 at the high end of p and then searches the remaining n-1 positions; a sketch appears below.

The diagram on page 59 shows the run time of the algorithms in seconds on the 200MHz Mips R4400. The inputs consist of n points uniformly distributed over a square. We've seen only Algorithms 1 and 2 so far; we'll see the rest shortly. I stopped timing each algorithm when its run time exceeded a minute (or when I suspected as much by extrapolation). As predicted, Algorithm 2 is a factor of n faster than Algorithm 1. This lets us solve a problem one size larger in roughly the same amount of time.

Exercise 3. How would you apply this speedup to the problem of computing paths (rather than tours) described in Exercise 2?
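Under the same assumptions as the Algorithm 1 sketch, the fixed-point version is a one-line change to the driver:

void solve2()
{   int i;
    for (i = 0; i < n; i++)    /* city n-1 ends up fixed in p[n-1] */
        p[i] = i;
    minsum = 1e30;
    search1(n-1);              /* permute only positions 0..n-2 */
}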
Algorithm 3: Sum Times Not
As Algorithm
2 visits each of the (n-1)! permutations, it computes the
cost of the tour from scratch. We can avoid that redundant computation
by keeping a new parameter for the sum of the edges included so
far in the partial tour. At each recursive call, the function

void search3(int m, Dist sum)

adds the cost of one new edge to the running sum. We initialize the cost in the first call

search3(n-1, ZERO);

The function

void check3(Dist sum)

then needs only to add the edges that complete the tour. A sketch of both functions appears below.

Exercise 4. Mathematically analyze the cost of Algorithm 3 and compare your model with the table of run times.
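Here is a minimal sketch of Algorithm 3, continuing the assumed code above. ZERO is presumably a zero constant of type Dist; here the parameter sum holds the cost of the path edges already fixed at the high end of p.

#define ZERO ((Dist) 0)    /* assumed definition */

void check3(Dist sum)      /* sum covers the n-1 path edges */
{   sum += dist1(p[0], p[n-1]);    /* add the edge that closes the tour */
    if (sum < minsum)
        save(sum);
}

void search3(int m, Dist sum)
{   int i;
    if (m == 1)
        check3(sum + dist1(p[0], p[1]));
    else
        for (i = 0; i < m; i++) {
            swap(i, m-1);
            search3(m-1, sum + dist1(p[m-1], p[m]));
            swap(i, m-1);
        }
}

The cost of the shared high end of the permutation is now computed once per subtree rather than once per complete tour.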
Algorithm 4: Cheaper Distances
When the cities are geometric points, each distance function involves an expensive square-root operation. Algorithm 3 achieves a speedup by reducing the number of distance calculations. Algorithm 4 achieves further speedup by calculating all possible distances in advance, then replacing calls to the expensive distance function with a cheap table lookup. We will store the distances in the array:
Dist distarr[MAXN][MAXN];
void initdist4()

The function initdist4 fills the array distarr before the search begins; a sketch of its body appears below. In functions that compute tour costs, we then replace each call to the expensive distance function with the cheap macro

#define dist4(i, j) (distarr[i][j])

Exercise 5. Use a cost model like the ones described in the April 1997 column ("Cost Models For Sorting," p. 65) to predict the result of this speedup. In our program, this gives a speedup factor of five.
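The precomputation might look like this (a sketch; the loop bounds assume the n cities have already been read in):

void initdist4()
{   int i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            distarr[i][j] = dist1(i, j);   /* each distance computed once */
}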
A Detour: Even Cheaper Distances?
A venerable
speedup replaces floating-point operations with integer operations.
For this problem, we preserve the numeric precision by multiplying
the floating-point distances by a scaling factor, such as 1,000
for 16-bit integers. The first six lines in Figure 2a were generated by the same 16-bit object code; the final line was compiled with a 32-bit compiler. On old machines, floating-point operations were much more expensive than integer operations, and converting to integers would have provided an order-of-magnitude speedup. On modern machines, though, the operation costs don't vary much, so it is not worth scaling floating-point values to integers. This table does provide a simple benchmark for programs that perform control flow and addition. This little experiment indicates that since the mid-1980s, integer hardware has sped up by a factor of about 200, while floating-point hardware has sped up by a factor of about 2,000.
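The column doesn't show the scaled-integer code, but the idea can be sketched as follows; the name idist, the use of long, and the rounding are assumptions:

#define SCALE 1000             /* scaling factor suggested in the text */

long idist(int i, int j)       /* dist1 scaled and rounded to an integer */
{   return (long)(dist1(i, j) * SCALE + 0.5);
}

Tour costs then sum in integer arithmetic; as the measurements above show, on modern hardware this buys little.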
Algorithm 5: Pruning The Search
"Don't keep
doing what doesn't work" is good advice for programs as well as
for people. If the sum we have computed so far already exceeds the
best sum to date, then we should waste no more time examining this
permutation. Adding new cities will not shorten the tour. We can
therefore add a single line of code at the top of the search function:

if (sum > minsum) return;

Algorithms 1 through 4 have a running time almost independent of their input graphs. Algorithm 5, though, is very sensitive to its input. Figure 2a presents data for inputs distributed uniformly over a square. Figure 2b shows the run time of Algorithm 5 for inputs drawn uniformly from a square, for geometric inputs distributed randomly around the perimeter of a circle, and for random symmetric distance matrices. This pruned search is fairly efficient on uniform data and even more efficient on the two other kinds of inputs. I spent about half an hour coding Algorithm 1 and several hours later that night hacking my way to Algorithm 5. At the price of a few dozen lines of code, Algorithm 5 is significantly faster than Algorithm 1. While the initial program would take almost a year to solve a problem of size n=15, the final code can do the job in about 20 minutes.
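For concreteness, here is how the pruned search might look when combined with the table lookups of Algorithm 4 (again a sketch under the assumptions above):

void search5(int m, Dist sum)
{   int i;
    if (sum > minsum)          /* prune: sound because edge lengths are non-negative */
        return;
    if (m == 1)
        check3(sum + dist4(p[0], p[1]));
    else
        for (i = 0; i < m; i++) {
            swap(i, m-1);
            search5(m-1, sum + dist4(p[m-1], p[m]));
            swap(i, m-1);
        }
}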
And Faster Yet
In the late 1970s, Christos Papadimitriou of Berkeley kindly warned me that "the Traveling Salesman Problem is not just a problem, it's an addiction." After a pleasant night's coding, the bug bit me again, and I spent large chunks of the next week writing TSP code. The result was Algorithms 6, 7, and 8, whose run times are summarized in Figure 2c. The run times are for finding tours among the first n cities of a single set of 30 points, distributed uniformly over a square. Exercise 6. Explain why adding a point sometimes decreases the run time.
Algorithm 8 can solve a problem of size n=30 in a couple of minutes. The diagram below, for instance, shows a set of 30 points and an optimal tour through them.
Algorithm 1 would require about 160 billion billion [sic] years to compute this tour, or about ten billion times the age of the universe. Don't hold your breath. This column does not have enough space for details, so we'll just skim the final three programs. Algorithm 6 uses the minimum spanning tree (described in the October 1996 column, "A Network Design Algorithm," p. 89) as a lower bound on the length of the tour through the points not yet fixed in the permutation. We therefore prune the search with code like
if (sum + mstdist() > minsum) return;

This took a few dozen lines of code. When I observed that the same minimum spanning trees were frequently recomputed, I stored their values in a hash table; the result was Algorithm 7, at the cost of a couple dozen lines of code. Algorithm 8 orders the nodes to search nearest cities first (using about a dozen lines of code). This made only a slight speedup over Algorithm 7 on points distributed uniformly over a square, but it made a huge difference on other kinds of data sets. Figure 3 shows the run time of Algorithm 8 on the three types of inputs described earlier.
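The column skips the code for mstdist, so the following Prim-style sketch is only one plausible reading: it bounds from below the cost of connecting the cities not yet fixed in the permutation (assumed here to be p[0..m-1], with the level m passed explicitly rather than read from a global, as the no-argument call above suggests the original did).

Dist mstdist(int m)        /* assumed: MST cost over the free cities p[0..m-1] */
{   int i, j, next;
    Dist d[MAXN], total = ZERO;
    if (m <= 1)
        return ZERO;
    for (i = 1; i < m; i++)        /* start the tree at free city p[0] */
        d[i] = dist4(p[0], p[i]);
    for (j = 1; j < m; j++) {
        next = -1;                 /* pick the free city nearest the tree */
        for (i = 1; i < m; i++)
            if (d[i] >= ZERO && (next == -1 || d[i] < d[next]))
                next = i;
        total += d[next];
        d[next] = (Dist) -1;       /* mark p[next] as in the tree */
        for (i = 1; i < m; i++)    /* relax distances through p[next] */
            if (d[i] >= ZERO && dist4(p[next], p[i]) < d[i])
                d[i] = dist4(p[next], p[i]);
    }
    return total;
}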
Principles
Speedups. The past decade has seen incredible speedups in computer hardware: two or three orders of magnitude. Really monumental speedups, though, come from algorithms.

Algorithmic Tricks. We started with the simple recursive Algorithm 1. Avoiding redundant computations led to Algorithms 2, 3, 4, and 7. Pruning the search led to Algorithms 5, 6, and 8; the resulting search algorithms are typically efficient but unpredictable.

Faster And Faster, But Not Fast Enough. Algorithm 8 is a lot faster than Algorithm 1, but it still is exponential. We'll never be able to use it to solve really large problems.
Further Reading
I have only
scratched the surface; one could write a book on the Traveling Salesman
Problem. In fact, four have. In 1985, John Wiley & Sons (New
York) published The Traveling Salesman Problem by E.L. Lawler,
J.K. Lenstra, A.H.G. Rinnooy Kan, and D.B. Shmoys. It is an invaluable
reference for this fundamental problem.
Solutions To Selected Exercises
2. In the function check1, initialize the sum with

Dist sum = 0;

rather than with the cost of the edge from the last city back to the first.

3. Because the endpoints have special significance, we cannot immediately apply this speedup to computing paths.

6. Although the cost generally grows as the problem size increases, adding a new point sometimes allows more effective pruning.

Jon Bentley is a Member of Technical Staff in the Computing Science Research Center at Bell Labs.
[Figure 2: run-time tables. (a) Run times of the algorithms on uniform inputs; (b) Algorithm 5 on three input distributions; (c) Algorithms 6, 7, and 8.]
This article first appeared in the June 1997 issue of UNIX Review.