Take a look up on a clear night. It's comforting to see a host of stars in familar positions. A map. But it's 2-dimensional, positioning stars as points on a sphere centered at Earth. Great for celestical navigation, but for 3-dimensional travel we need also the distance out to each star, allowing us to place it in 3-space with xyz coordinates.
Such distance measurements are tricky. We can't just send out a ship and check its odometer. Fortunately, for 2,057,050 stars in the TGAS (Tycho-Gaia Astrometric Solution) subcollection, parallax observations in Gaia DR1 permit astronomers to build (at least rough) estimates. The process involves certain assumptions on the distribution of stars in the galaxy. This is all well beyond my knowledge of astronomy, but it's layed out in the following highly-cited research paper.
We use these Astraatmadija and Bailer-Jones distances to compute coordinate positions for each of the TGAS stars. If you are interested in details of our data extraction, you can find them here.
The resulting TGAS data set is plenty large, but it misses many of the brightest stars, including those closest to Earth; see the discussion at the end of Section 5.1 of the Gaia release paper and also K. Jardine's blog post TGAS Limitations.
For this reason, we added 22,424 stars from the Hipparcos catalog to the TGAS list. These are Hipparcos stars recorded in the HYG database that are not placed on the large outer sphere (of radius 100,000 parsecs). This additional collection includes the sun, Proxima Centuri, and other crowd favorites.
For stars in both the TGAS and HYG data sets, geometric coordinates can differ (due to astronomical data gathering and to assumptions made on galatic distributions). In our TSP work we use the TGAS estimates.
From the full set, we removed 4 stars with outlier positions
• Tycho2 ID 829-566-2 (distance 99901.6 parsecs)
• Hipparcos ID 47696 (distance 18420.5 parsecs)
• Hipparcos ID 50305 (distance 12067.0 parsecs)
• Tycho2 ID 6431-1773-2 (distance 9009.3 parsecs)
The resulting TGAS+HYG database has 2,079,470 stars. That is one short of our 2,079,471 total. The additional point is a duplicate of our sun, since it was convienent to have the first and last stop in our tour as separate points in the collection.
We scaled the coordinates to units of 1/10th parsecs. The resulting xyz triples are listed in the following two gzipped files, one star per line.
• gaia2079471.xyz.txt.gz, list of the coordinates,
• gaia2079471.tsp.gz, the coordinates in TSPLIB format.
To match the points in the TSP data to the original astronomy databases, in the following gzipped file we list for each entry the corresponding Hipparcos or Tycho-2 ID, together with the xyz-coordinates used in the TSP model.
Our tour for this TSP instance is given in each of the following three gzipped files.
• gaia2079471_tour.txt.gz, a list of integers from 1 up to 2,079,471, giving the order the stars appear in the tour,
• gaia2079471_order.txt.gz, the points for gaia2079471 permuted in the tour order,
• gaia2079471.288843524.tour.gz, the tour in TSPLIB format.
To create an example of the TSP, we need to specify precisely the point-to-point distances. For this we adopt the standard TSPLIB norm for 3D Euclidean data. This norm takes the straight-line distance between two points and rounds the resulting value to the nearest integer. In our case, the star-to-star distance is therefore measured to the nearest 1/10th parsec. Here is a simplified version of the computer code used in Concorde for the distance calculation.
The 1/10th parsec accuracy is an overkill when measured against the expected precision of the Astraatmadija and Bailer-Jones placement of stars in 3-space. But it's great for the complexity of the data set. We are, after all, shooting for a 3D TSP challenge, and openly admit over-engineering any physical realization of the galatic tour. (In fact, it would be interesting to increase the precision to the nearest 1/1000th parsec, to see the difference this might make in the solution process.)
A weak point is that our precision is too coarse to distingish indvidual binary stars, which can be much closer than 1/10th parsec. The distance between such pairs in our TSPLIB calculation is therefore zero, so they are effectively a single point. There are 42 such 0-distance pairs, as well as the duplicate copy of our sun. So, to be fair, the real size of the TSP instance is 2,079,428 stars rather than 2,079,471. Still plenty big.
The average length of a star-to-star link in our tour is 13.89 parsecs. The lengths vary from the 43 links having 0 distance, to 6,129 having length less than 1 parsec, and 43 coming in at over 1000 parsecs. The longest star-to-star link has length 3947.1 parsecs.
Stars -- Zoom, pan, and rotate the point set to see its 3D structure. Stars are represented by twinkling points.
Light -- In this version, stars are represented as square particles, resulting in an image that is easier to render (in case you have trouble with the twinkling version).