Prerequisites:
1. Knowing what the Traveling Salesman Problem (TSP) is. The wiki article is a good place to start. The symmetric form of the problem is the one where the distance from one city to another is the same in both directions.
2. A decent understanding of what Kohonen/Self-Organizing Maps are. This blog post will point you in the right direction to learn more.
After I wrote my earlier blog post on implementing SOMs with TensorFlow, someone on Reddit mentioned that there are ways to solve TSPs using them. When I did some research, it turned out there have been a few attempts to do so. Here's one, and here's another. So in my spare time, I decided to have a crack at it myself.
I will describe my approach (along with the code) in a short while, but let me give a disclaimer first. This post in no way claims to have found the 'best' way to solve TSPs (Ant Colony Optimization algorithms, or ACOs, are pretty much the state of the art). It just describes my attempt and the results I obtained (which were interesting, given the methodology implemented).
I obviously make a few assumptions, and here they are:
1. Every city has a corresponding vector available, denoting its coordinates/location in a 2D/3D space. This source presents many TSP benchmark datasets in this format. I also tried assigning every city a vector where every dimension is its distance from each of the cities (including itself). This gives pretty sub-par results, so I decided not to pursue it. Maybe you could use a dimensionality reduction technique like PCA on these 'distance' vectors and get good results (I sketch what that might look like right after this list).
2. Because of the above point, my method is pretty much restricted to solving the symmetric form of a TSP (at least the way I have done it).
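Just to make that PCA idea from point 1 concrete, here is a rough sketch of what it would look like. I never pursued this, so take it purely as an illustration; the coordinates here are made up, and the component count is arbitrary.

import numpy as np
from sklearn.decomposition import PCA

#Made-up coordinates for n = 50 cities, just for illustration
cities = np.random.rand(50, 2)

#Each city's 'distance' vector: its distances to all cities
#(including itself), shape (n, n)
dist_vects = np.linalg.norm(
    cities[:, None, :] - cities[None, :, :], axis=-1)

#Reduce the n-dimensional distance vectors to a few dimensions
#before using them as city vectors for the SOM
reduced = PCA(n_components=3).fit_transform(dist_vects)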
The Method
Suppose the total number of neurons is $N$, and the total number of cities in question is $n$. After some trial and error, I set $N = 3n$. It can be higher/lower as well. The lower the number, the less optimal the understanding of the solution space. And as you go higher, after a certain point there seems to be no improvement in the output (but the execution time increases many-fold).
The individual neurons are provided locations around a circle, as follows:

$$x_i = \left( \cos\frac{2\pi i}{N},\ \sin\frac{2\pi i}{N} \right)$$

where $x_i$ is the location vector for the $i$th neuron in a 2-D space, and $i$ goes from $0$ to $N - 1$. It is important not to confuse this space with the space in which the cities lie (even though both are 2-dimensional). Let's call one the Neuron-Space, and the other the City-Space.
Because of the circular nature of this SOM, we inherently have a closed-loop structure to our framework (as is needed by the TSP solution), where neurons in proximity of each other in Neuron-Space lie nearby in City-Space. Also, note how every neuron essentially corresponds to an angle ($\theta_i = 2\pi i / N$) around the circle. This point will be useful soon.
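In code, the neuron locations are generated exactly like this (mirroring the _neuron_locations generator in the implementation further below):

import numpy as np

def neuron_locations(n_centroids):
    #The i-th neuron sits at angle 2*pi*i/N on the unit circle
    #in Neuron-Space
    for i in range(n_centroids):
        theta = 2*np.pi*i/float(n_centroids)
        yield np.array([np.cos(theta), np.sin(theta)])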
The initialization of the neuron vectors in City-Space is done using the mean and standard deviation of the dimensions of the city vectors in City-Space. This is because I observed that the scale of all the dimensions in the city vectors is pretty much the same. If it isn't, you might have to resort to building a mean vector and covariance matrix. For training, I use $\alpha = 0.3$ and $\sigma = 0.3$ as parameters for the SOM, both decaying linearly with the iteration number. Using a constant initial value of $\sigma$ ensures that the same proportion of neurons around the Best-Matching Unit gets affected every time, irrespective of the total number of neurons being used.
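Concretely, the per-iteration values are just the initial constants scaled down linearly with the iteration number (this mirrors the corresponding ops in the implementation below):

def som_params(iter_no, n_iterations, alpha=0.3, sigma=0.3):
    #Both the learning rate and the neighbourhood width decay
    #linearly to zero over the course of training
    decay = 1.0 - iter_no/float(n_iterations)
    return alpha*decay, sigma*decay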
Once the SOM is trained, we can say we have a rough 'skeleton' for a good route in the City-Space. But how do you convert this skeleton to the appropriate route? By using those angles I just talked about. Consider a city $c$, and let's say the $k$ nearest neurons to it are $n_1, n_2, \ldots, n_k$ (my implementation uses $k = 2$). We can then construct a vector for every city in Neuron-Space, using this simple formula:

$$v_c = \sum_{j=1}^{k} \frac{1}{d(c, n_j)} \, x_{n_j}$$

where $d(c, n_j)$ is the distance between city $c$ and neuron $n_j$ in City-Space, and $x_{n_j}$ is the location of neuron $n_j$ in Neuron-Space. Using this vector and some pretty simple (inverse) trigonometry, you can assign the city an angle around the circle on which the SOM neurons themselves lie in Neuron-Space. Now, all you need to do to come up with an approximately good solution is sort the cities with respect to the angles they are assigned. Voila!
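If you'd rather avoid the quadrant-by-quadrant arcsin/arccos case analysis my implementation below does by hand, np.arctan2 gives the same angle in a single call. A minimal sketch (the function name and the k parameter are mine, not part of the code below):

import numpy as np

def city_angle(city_vect, centroid_vects, centroid_locations, k=2):
    #Distances from this city to every neuron, in City-Space
    dists = np.linalg.norm(np.array(centroid_vects) - city_vect, axis=1)
    nearest = np.argsort(dists)[:k]
    #Inverse-distance-weighted sum of the k nearest neurons'
    #locations in Neuron-Space
    total_vect = sum(np.array(centroid_locations[c])/dists[c]
                     for c in nearest)
    #arctan2 handles all four quadrants; the modulo maps the
    #result into [0, 2*pi)
    return np.arctan2(total_vect[1], total_vect[0]) % (2*np.pi)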
Of course, the solution this algorithm gives you will most likely be in the neighbourhood of a slightly better one, so I run a standard TSP local-search heuristic like 3-opt over the solution obtained from my algorithm. What 3-opt essentially does is change the solution obtained from some other algorithm ever so slightly, in some pre-defined ways, in the hope of hitting upon a better solution. 3-opt works better than 2-opt for this algorithm; I didn't try higher-order k-opt searches.
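For contrast, here is what the simpler 2-opt version of that idea looks like; it only ever reverses one segment of the route. (A sketch, reusing the _distance tour-length helper defined in the implementation below.)

def _2_opt(route, distance_matrix):
    #Reverse a segment of the route, and keep the change whenever
    #it shortens the total tour
    current_dist = _distance(route, distance_matrix)
    for i in range(len(route) - 1):
        for j in range(i + 2, len(route) - 1):
            temp = route[:]
            temp[i+1:j+1] = reversed(route[i+1:j+1])
            dist = _distance(temp, distance_matrix)
            if dist < current_dist:
                route, current_dist = temp, dist
    return route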
One interesting possibility with this algorithm is that it could provide good solutions for different sets of target cities without retraining the SOM, as long as they lie in the same City-Space. As long as the set of cities you try to solve for follows (at least approximately) the trends shown by the dataset used for training, all you would need to do is compute their respective angles (which does not involve re-training) and then apply 3-opt. In short, you could use the same SOM, once trained, to solve for multiple subsets of cities.
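Here's a rough sketch of what that reuse would look like, assuming the city_angle helper sketched earlier and the _3_opt helper from the implementation below. The trained solver doesn't currently expose its centroid data publicly, so treat those two arguments as hypothetical.

import numpy as np

def solve_subset(subset_vects, centroid_vects, centroid_locations):
    #No retraining: assign each city an angle around the trained
    #circular SOM, sort by angle, then polish with 3-opt
    angles = [city_angle(v, centroid_vects, centroid_locations)
              for v in subset_vects]
    order = sorted(range(len(subset_vects)), key=lambda i: angles[i])
    dist_matrix = [[np.linalg.norm(a - b) for b in subset_vects]
                   for a in subset_vects]
    return _3_opt(order, dist_matrix)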
The Implementation
This code builds on my TensorFlow-based SOM code. It's pretty rough around the edges, especially the 3-opt search. It was meant just as a proof of concept, so feel free to hack around with it if you like.
import tensorflow as tf
import numpy as np


def _distance(tsp_solution, distance_matrix):
    """
    Total distance for a solution.
    """
    total_distance = 0
    for i in range(len(tsp_solution)-1):
        total_distance += (distance_matrix[tsp_solution[i]][
            tsp_solution[i + 1]])
    total_distance += (distance_matrix[tsp_solution[0]][
        tsp_solution[-1]])
    return total_distance


def _3_opt(route, distance_matrix):
    """
    Uses 3-opt to find a better solution in the neighbourhood.
    """

    current_dist = _distance(route, distance_matrix)

    #Find three connections to break
    for i in range(len(route) - 1):
        for j in range(i + 2, len(route) - 1):
            for k in range(j + 2, len(route) - 1):

                #Now reconnect the graph in every other way possible,
                #and see if any gives a better solution

                #Way 1
                temp = route[:]
                temp[j+1:k+1] = reversed(route[j+1:k+1])
                dist = _distance(temp, distance_matrix)
                if dist < current_dist:
                    route = temp
                    current_dist = dist

                #Way 2
                temp = route[:]
                temp[i+1:j+1] = reversed(route[i+1:j+1])
                dist = _distance(temp, distance_matrix)
                if dist < current_dist:
                    route = temp
                    current_dist = dist

                #Way 3
                temp = route[:]
                temp[i+1:j+1], temp[j+1:k+1] = (
                    reversed(route[i+1:j+1]),
                    reversed(route[j+1:k+1]))
                dist = _distance(temp, distance_matrix)
                if dist < current_dist:
                    route = temp
                    current_dist = dist

                #Way 4
                temp = (route[:i+1] + route[j+1:k+1] +
                        route[i+1:j+1] + route[k+1:])
                dist = _distance(temp, distance_matrix)
                if dist < current_dist:
                    route = temp
                    current_dist = dist

                #Way 5
                temp = route[:i+1] + route[j+1:k+1]
                temp += reversed(route[i+1:j+1])
                temp += route[k+1:]
                dist = _distance(temp, distance_matrix)
                if dist < current_dist:
                    route = temp
                    current_dist = dist

                #Way 6
                temp = route[:i+1]
                temp += reversed(route[j+1:k+1])
                temp += route[i+1:j+1]
                temp += route[k+1:]
                dist = _distance(temp, distance_matrix)
                if dist < current_dist:
                    route = temp
                    current_dist = dist

                #Way 7
                temp = route[:i+1]
                temp += reversed(route[j+1:k+1])
                temp += reversed(route[i+1:j+1])
                temp += route[k+1:]
                dist = _distance(temp, distance_matrix)
                if dist < current_dist:
                    route = temp
                    current_dist = dist

    return route


class TSPSolver(object):
    """
    Uses a Circular Self-Organizing Map with a Gaussian neighbourhood
    function and linearly decreasing learning rate, to solve TSP.
    """

    #To check if the SOM has been trained
    _trained = False

    def __init__(self, input_vects, n_iterations=200):
        """
        Initializes all necessary components of the TensorFlow Graph.

        'input_vects' should be a list of 1-D NumPy vectors denoting
        location vectors for the cities.
        'n_iterations' is an integer denoting the number of iterations
        that would be run while training the SOM.
        """

        #Assign required variables first
        self._n = len(input_vects)
        dim = len(input_vects[0])
        self._n_centroids = self._n*3
        alpha = 0.3
        sigma = 0.3
        self._n_iterations = abs(int(n_iterations))

        #For initialization of the centroid(neuron) vectors
        init_values = []
        for vect in input_vects:
            init_values.extend(list(vect))
        init_mean = np.mean(init_values)
        init_dev = np.sqrt(np.var(init_values))

        ##INITIALIZE GRAPH
        self._graph = tf.Graph()

        ##POPULATE GRAPH WITH NECESSARY COMPONENTS
        with self._graph.as_default():

            ##VARIABLES AND CONSTANT OPS FOR DATA STORAGE

            #Randomly initialized weightage vectors for all neurons,
            #stored together as a matrix Variable of size
            #[self._n_centroids, dim]
            self._weightage_vects = tf.Variable(tf.random_normal(
                [self._n_centroids, dim], init_mean, init_dev))

            #Matrix of size [self._n_centroids, 2] for SOM grid
            #locations of neurons
            self._location_vects = tf.constant(np.array(
                list(self._neuron_locations(self._n_centroids))))

            ##PLACEHOLDERS FOR TRAINING INPUTS
            #We need to assign them as attributes to self, since they
            #will be fed in during training

            #The training vector
            self._vect_input = tf.placeholder("float", [dim])
            #Iteration number
            self._iter_input = tf.placeholder("float")

            ##CONSTRUCT TRAINING OP PIECE BY PIECE
            #Only the final, 'root' training op needs to be assigned
            #as an attribute to self, since all the rest will be
            #executed automatically during training

            #To compute the Best Matching Unit given a vector
            #Basically calculates the Euclidean distance between every
            #neuron's weightage vector and the input, and returns the
            #index of the neuron which gives the least value
            bmu_index = tf.argmin(tf.sqrt(tf.reduce_sum(
                tf.pow(tf.sub(self._weightage_vects, tf.pack(
                    [self._vect_input for i in range(
                        self._n_centroids)])), 2), 1)), 0)

            #This will extract the location of the BMU based on the
            #BMU's index
            slice_input = tf.pad(tf.reshape(bmu_index, [1]),
                                 np.array([[0, 1]]))
            bmu_loc = tf.reshape(tf.slice(
                self._location_vects, slice_input,
                tf.constant(np.array([1, 2]))), [2])

            #To compute the alpha and sigma values based on iteration
            #number
            learning_rate_op = tf.sub(1.0, tf.div(
                self._iter_input, self._n_iterations))
            _alpha_op = tf.mul(alpha, learning_rate_op)
            _sigma_op = tf.mul(sigma, learning_rate_op)

            #Construct the op that will generate a vector with
            #learning rates for all neurons, based on iteration
            #number and location wrt BMU
            bmu_distance_squares = tf.reduce_sum(tf.pow(tf.sub(
                self._location_vects, tf.pack(
                    [bmu_loc for i in range(
                        self._n_centroids)])), 2), 1)
            neighbourhood_func = tf.exp(tf.neg(tf.div(tf.cast(
                bmu_distance_squares, "float32"),
                tf.pow(_sigma_op, 2))))
            learning_rate_op = tf.mul(_alpha_op, neighbourhood_func)

            #Finally, the op that will use learning_rate_op to update
            #the weightage vectors of all neurons based on a
            #particular input
            learning_rate_multiplier = tf.pack([tf.tile(tf.slice(
                learning_rate_op, np.array([i]), np.array([1])),
                [dim]) for i in range(self._n_centroids)])
            weightage_delta = tf.mul(
                learning_rate_multiplier,
                tf.sub(tf.pack([self._vect_input for i in range(
                    self._n_centroids)]),
                    self._weightage_vects))
            new_weightages_op = tf.add(self._weightage_vects,
                                       weightage_delta)
            self._training_op = tf.assign(self._weightage_vects,
                                          new_weightages_op)

            ##INITIALIZE SESSION
            self._sess = tf.Session()

            ##INITIALIZE VARIABLES
            init_op = tf.initialize_all_variables()
            self._sess.run(init_op)

        #Train the Solver
        self.train(input_vects)

    @property
    def solution(self):
        """
        Returns the TSP solution order as a list, and the total
        Euclidean distance with respect to the input vectors.
        """
        if not self._trained:
            raise ValueError("SOM not trained yet")
        return self._sol, self._sol_dist

    @classmethod
    def _neuron_locations(cls, n_centroids):
        """
        Yields one by one the 2-D locations of the individual neurons
        in the SOM.
        """
        for i in range(n_centroids):
            yield np.array([np.cos(i*2*np.pi/float(n_centroids)),
                            np.sin(i*2*np.pi/float(n_centroids))])

    def train(self, input_vects):
        """
        Trains the SOM.

        'input_vects' should be a list of 1-D NumPy arrays with
        dimensionality as provided during initialization of this SOM.
        The points in the solution are enumerated as they are ordered
        in 'input_vects'.
        Current weightage vectors for all neurons (initially random)
        are taken as starting conditions for training.
        """

        #Training iterations
        for iter_no in range(self._n_iterations):
            #Train with each vector one by one
            for input_vect in input_vects:
                self._sess.run(self._training_op,
                               feed_dict={self._vect_input: input_vect,
                                          self._iter_input: iter_no})

        self._input_vects = input_vects
        self._compute_solution()
        self._trained = True

    def _compute_solution(self):
        """
        Computes the solution to the TSP with respect to the
        input vectors.
        """

        #TODO: All nearest-neighbour searches could be sped up
        #significantly with a KD-Tree/Ball Tree

        #Get the centroid data
        centroid_vects = list(self._sess.run(self._weightage_vects))
        centroid_locations = list(self._sess.run(self._location_vects))

        #Distance matrix mapping input point number to list of
        #distances from centroids
        distances = {}
        for point in range(self._n):
            distances[point] = []
            for centroid in range(self._n_centroids):
                distances[point].append(
                    np.linalg.norm(centroid_vects[centroid] -
                                   self._input_vects[point]))

        #Distance matrix mapping input point number to list of
        #distances from other input points
        point_distances = {}
        for point in range(self._n):
            point_distances[point] = []
            for other_point in range(self._n):
                point_distances[point].append(
                    np.linalg.norm(self._input_vects[other_point] -
                                   self._input_vects[point]))

        #Compute angle with respect to each city (point)
        point_angles = {}
        for point in range(self._n):
            total_vect = 0
            cents = [j for j in range(self._n_centroids)]
            cents.sort(key=lambda x: distances[point][x])
            for centroid in cents[:2]:
                total_vect += (1.0/(distances[point][centroid]) *
                               centroid_locations[centroid])
            total_vect = total_vect/np.linalg.norm(total_vect)
            if total_vect[0] > 0 and total_vect[1] > 0:
                angle = np.arcsin(total_vect[1])
            elif total_vect[0] < 0 and total_vect[1] > 0:
                angle = np.arccos(total_vect[0])
            elif total_vect[0] < 0 and total_vect[1] < 0:
                angle = np.pi - np.arcsin(total_vect[1])
            else:
                angle = 2*np.pi - np.arccos(total_vect[0])
            point_angles[point] = angle

        #Find the rough solution by sorting cities by angle
        tsp_solution = [i for i in range(self._n)]
        tsp_solution.sort(key=lambda x: point_angles[x])

        #Polish it with the 3-opt local search
        tsp_solution = _3_opt(tsp_solution, point_distances)

        #Compute the total distance for the solution
        total_distance = _distance(tsp_solution, point_distances)

        self._sol = tsp_solution
        self._sol_dist = total_distance
Results
Each of these problem datasets can be found at the source here. All the presented results are over 10 runs. The maximum runtime was experienced with kroA100, at around one minute.
berlin52: Best Error: 0.0%, Average Error: 4.1%.
kroA100: Best Error: 1.6%, Average Error: 2.7%.
eil76: Best Error: 2.0%, Average Error: 4.4%.
These results aren't 'close' to those provided by ACOs, but they are practically feasible and show good promise (oh, now I just sound like I'm writing a research paper). Plus, the algorithm isn't a time-hogger either, judging by the runtimes on my laptop.
Anyways. Thanks for reading! Do drop a comment if you think of an improvement over this work.
Great article! Think about implementing the Lin-Kernighan heuristic instead of a higher-order k-opt.
Great work! Thank you so much for this.
Love this! Laughed when you joked about the research paper part 🙂 In reality, I think this is a great model since it is simple/intuitive, you can fit new nodes to your map easily, and you can very efficiently split the circle into meaningful clusters like quarter circles.
Thanks! That was the original idea, to get a sort of ‘skeleton’ route that can be changed if you wanna visit _most_ cities, but not all of them. It won’t be the optimal solution, but a good one nonetheless.