The Pubs Galore site states: "We have the largest list of pubs in the UK kept up to date by our dedicated members." The list is indeed extensive, totaling more than 80,000 locations. Sadly, many of the listed pubs are now permanently closed, but, on April 16, 2018, the site reports that "Pubs Galore currently has 49546 open pubs listed". That's a lot of pubs.
We collected our data set from Pubs Galore on January 12, 2017. The full set contained 49,704 stops. From this set, we had to remove the following 17 pubs, since we were unable to obtain Google Maps walking distances to and from these locations (in early 2017).
Ten of these eliminated pubs are located in the Isles of Scilly, southwest of Cornwall. If you have a sailboat, be sure to pick them up after your pint in The Cable Station in Penzance.
We were left with 49,687 locations, covering England (40,970, up from the 22,044 English pubs included in the tour from 2016), the Isle of Man (98, up from 19), Northern Ireland (981, up from 240), Scotland (4,580, up from 1,186), and Wales (3,058, up from 1,238). With this much data, there are bound to be a few errors, such as unmarked pub closings and updates that have occurred since January 2017. But the Pubs Galore team has created a very accurate data set and we are in their debt for providing the material for an ultimate UK pub crawl.
A good way to view the data is to have a look at a Google Maps drawing giving a marker for every pub location. That's a lot of markers to crowd onto a map of the UK, but if you zoom in you can see the distribution of points. There are sparse regions in the forest parks of northern England and in large parts of Scotland, but the big picture is that there are a lot of pubs in the UK.
If you are unable to view the interactive map, here is a large screen shot showing the pub locations, color coded by country.
Having the list of pub names adds spice to the TSP challenge, but when it comes to the mathematics, all we need are the latitude and longitude coordinates for the stops. This is the information that is used to query Google Maps for walking distances and for computing the geometric distance between pairs of pubs.
Here is a link to a text file with the latitude and longitude (in decimal form) for all 49,687 pub locations. We must remark that 468 of the points in the data set are duplicates, where Pubs Galore has two pubs listed for the same location (possibly sharing a building site). We kept these in the data set to have the full list of pubs, but, for the traveling salesman problem, this means we have really only solved a smaller 49,219-point problem.
In the pubs problem, we assume the distance to walk from the The Fiddler's Elbow to The Bald Face Stag is the same as the distance to walk back from The Stag to The Elbow. This is quite reasonable for walking, although for inner-city travel it sometimes does not hold for driving (due to one-way streets).
We also make the assumption that the walking distance between two pubs is no shorter than the geometric distance to fly from one pub to the other. To be precise, we define the distance from pub A to pub B to be the maximum of the geometric distance between the latitude and longitude coordinates for A and B and the walking distance from A to B provided by Google Maps.
In our computation, the geometric distance between two pubs is specified by an approximation to the great circle distance on the Earth (treating the Earth as a ball). This is a variation of the TSPLIB GEO-norm, adjusted to handle decimal degrees and scaled to provide the distance in meters rather than in kilometers. The distance function is called the GEOM-norm; a computer code to compute the GEOM distance between two points can be found on the World TSP page (the GEOM distance is the great circle distance rounded to the nearest integer).
The UK49687 geometric data set, in TSPLIB format, is given here. This data set allows the geometric distances to be computed, for example, by the Concorde TSP code.
To fully specify the problem we solved, we need also to report the Google distances we used, since Google's algorithms adjust their routes over time, taking into consideration new construction and better data sets for road networks.
Here is a gzipped list of the 2,214,453 pairs of points for which we queried Google Maps. Each line gives the indices of two points, using the order specified in the TSPLIB data set, where the points are numbered from 0 up to 49,686. Following the pair of numbers, on each line, is the reported walking distance in meters.
To reproduce the UK49687 problem, use the maximum of the GEOM distance and the walking distance for each of the 2,214,453 pairs in the file, and use the GEOM distance for each of the remaining 1,232,159,688 pairs. We did not need to query Google for these additional pairs, since our final tour does not use any of these direct legs. (Since the defined distance is the maximum, having the Google distance can only increase the length of any tour that uses one of the 1,232,159,688 pairs.)
Please be warned that the travel data was obtained in early 2017. If you need to hit a few UK pubs, be sure to check out Google Maps for up-to-date distances and directions.
A short mathematical way to express the optimal tour is as a permutation of the integers from 0 up to 49,686, specifying the indices of the pubs in the tour order. Here is the optimal tour in this format.
If you are in the hunt for the 49,687¥, you must use the data sets I've described above to specify the length of the legs in the tour. Send your route to me (bico@uwaterloo.ca) in the format of a list of indices (from 0 up to 49,686) as in the file uk49687_optimal.txt. To win the prize, the tour will need to measure length less than 63,739,687 as calculated on my Linux servers.
Please note that the GEOM distance between each a pair of pubs is an integer number of meters, not a fractional value. The computer code for computing the GEOM distance is given on the World TSP page. And, again, the distance for the 2,214,453 pairs of pubs for which we obtained Google walking routes is the maximum of the GEOM distance and the distance recorded in the pairs file.