This article is the day-13 entry of the Furukawa Lab Advent Calendar. It was written by a student of Furukawa Lab as part of his studies, so some of the content may be imprecise or worded a little unusually.
Hello everyone. Do you know the traveling salesman problem? It is a classic combinatorial optimization problem, and in this article I would like to write about a sad incident I ran into when I tried to solve it properly. To give away the conclusion first: beware of rounding errors. This happened back when I was a programming beginner, so please read with generous expectations.
First, let me explain the traveling salesman problem. It is "the problem of finding the shortest tour that visits every city exactly once and returns to the starting point." For example, given the four cities A, B, C, and D as in the figure below, the shortest tour is as shown. With this few cities, the layout is simple and the shortest tour can be found immediately, but as the number of cities grows even a little, the shortest tour can no longer be seen at a glance.
In fact, the number of possible visiting orders is (number of cities − 1)!, so with 30 or 40 cities there are an astronomical number of orders, and checking them all is hopeless. Therefore, rather than exhaustive search, I tried to solve this combinatorial optimization problem heuristically, aiming for a good (if not provably optimal) solution.
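To get a quick sense of the scale, here is a check with Python's standard `math.factorial` (this snippet is mine, added for illustration):

```python
import math

# Number of distinct visiting orders for an N-city tour: (N - 1)!
for n in (10, 20, 30, 40):
    print(n, math.factorial(n - 1))
# 10 ->  362880
# 20 ->  ~1.2e17
# 30 ->  ~8.8e30
# 40 ->  ~2.0e46
```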
The most orthodox general-purpose method for solving combinatorial optimization problems is simulated annealing (the SA method; in Japanese, 焼きなまし法). Simulated annealing works with an energy function $E$. In this case, the energy is the total distance of the tour through all the cities and back to the starting point. We repeatedly change the visiting order of the cities so that this total distance decreases.
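In code, the energy is simply the tour length. A minimal sketch (the names `tour_length`, `cities`, and `order` are my own, not from the original program):

```python
import numpy as np

def tour_length(cities, order):
    """Total length of the closed tour that visits cities[order] in sequence."""
    p = cities[order]  # (N, 2) coordinates in visiting order
    # distance between consecutive cities, including the return leg to the start
    return np.sum(np.linalg.norm(p - np.roll(p, -1, axis=0), axis=1))
```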
Sometimes a swap makes the total distance longer than before. Normally we would want to reject such a swap, but in the SA method it is still accepted, with the probability given by the following formula.
p=\exp \left\{-\frac{1}{T}\left(E_{t+1}-E_{t}\right)\right\}
Here $T$ is a parameter called the temperature, which acts as noise: the larger $T$ is, the more readily a lengthening swap is accepted. If the swap makes the distance smaller, it is accepted with probability 1. This acceptance rule is called the Metropolis method.
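The acceptance rule fits in a few lines; a sketch under the same assumptions as above:

```python
import numpy as np

def metropolis_accept(E_old, E_new, T, rng):
    """Accept downhill moves always; uphill moves with probability exp(-(E_new - E_old) / T)."""
    if E_new <= E_old:
        return True  # tour got shorter: accept with probability 1
    return rng.random() < np.exp(-(E_new - E_old) / T)
```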
To summarize the flow so far:
1. Pick two cities at random and swap their positions in the visiting order.
2. Compute the resulting change in total distance $E_{t+1}-E_{t}$.
3. If the tour got shorter, accept the swap; if it got longer, accept it with the probability $p$ above.
Furthermore, in the SA method the temperature $T$ is gradually lowered as the steps progress, according to the following formula.
T_{t+1}=\alpha T_{t}
Here $\alpha$, with $0 < \alpha < 1$, controls how quickly the temperature drops.
Lowering the temperature this way means swaps are accepted readily while the step count is small, and only rarely in the late steps, that is, toward the end of the run. Early on, the search can therefore escape even if it falls into a local optimum, while at the end it settles down and refines the solution. The name comes from blacksmithing, where steel is strengthened by starting from a high temperature and cooling it gradually, which is why this is called simulated annealing. The method is very versatile for combinatorial optimization problems.
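Putting the pieces together, a minimal sketch of the whole loop (reusing the hypothetical `tour_length` and `metropolis_accept` above; the initial temperature `T0` and cooling rate `alpha` are values I picked for illustration, not the article's originals):

```python
def solve_tsp_sa(cities, n_steps=10000, T0=10.0, alpha=0.999, seed=0):
    rng = np.random.default_rng(seed)
    N = len(cities)
    order = rng.permutation(N)             # random initial visiting order
    E = tour_length(cities, order)
    T = T0
    for _ in range(n_steps):
        i, j = rng.integers(0, N, size=2)  # indices drawn safely in 0..N-1
        new_order = order.copy()
        new_order[i], new_order[j] = new_order[j], new_order[i]
        E_new = tour_length(cities, new_order)
        if metropolis_accept(E, E_new, T, rng):
            order, E = new_order, E_new
        T = alpha * T                      # geometric cooling: T_{t+1} = alpha * T_t
    return order, E
```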
Now we come to what I actually wanted to talk about. I set out to solve the traveling salesman problem with the simulated annealing described above. The algorithm itself is simple, so I had an implementation running right away, and I decided to test it on a very simple city layout. The simplest layout is probably a circle.
From here on, assume this circular city layout. As a quick review, the simulated annealing procedure is the one summarized above, starting from a random initial visiting order like the one shown below.
You can see at a glance that this is not the shortest route. Next, we pick the cities to swap. This is also done with random numbers. At the time I was using C, and I generated an integer index by multiplying a uniform random number in $0 < x < 1$ by the number of cities $N$ and casting to int, which should give an int-type random number in $0 \le x \le N-1$. This was the seed of the trouble.
I ran the simulation using this home-made integer random number. At first the step count was about 100 (600 swaps), and it worked well enough. But one day an incident occurred. On a whim I tried 10,000 steps (60,000 swaps), ran the simulation, and got this result. Something was clearly wrong: the total distance printed was smaller than the theoretical value I had computed by hand from the coordinates. Naturally so, because **the city had disappeared**. Worse, the result was not reproducible: sometimes one city vanished, sometimes two, sometimes none. If you are used to programming, "non-reproducible results = suspect the random numbers" may come to mind immediately, but as a beginner I was completely stuck. By the end I was entertaining Skynet-style delusions: "has the algorithm judged that the city must disappear in order to get the shortest distance?"
As some of you may have noticed, the punch line was that, due to rounding error, the generator for a uniform random number in $0 < x < 1$ could output exactly 1. When 1 came out, the code tried to swap with the $N$-th city. C arrays are 0-indexed, so the valid entries run from 0 to N−1 and the $N$-th element does not exist; the swap read garbage from out-of-bounds memory, and as a result the city disappeared.
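The failure mode is easy to reproduce. Here is a sketch in Python rather than the original C, with `bad_uniform` as a hypothetical stand-in for my generator whose rounding could yield exactly 1.0:

```python
N = 10  # number of cities

def bad_uniform():
    # Stand-in for a "uniform in (0, 1)" generator that, due to rounding,
    # can return exactly 1.0.
    return 1.0

index = int(bad_uniform() * N)
print(index)  # 10 -- but valid array indices are 0..N-1
# In C, city[N] silently reads out-of-bounds memory, so the swap "disappears" a city;
# Python would at least raise an IndexError here.
```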
From here on is a completely different story: recently I tried the traveling salesman problem again, so I would like to share those results too. This time I solved it using a SOM (self-organizing map). Put very simply, a SOM estimates a mapping $\mathbf{f}: \mathbb{R}^{M} \rightarrow \mathbb{R}^{D}$ between observed data $\mathbf{Y}=(\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_N) \in \mathbb{R}^{D \times N}$ and latent variables $\mathbf{X}=(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_K) \in \mathbb{R}^{M \times K}$.
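For reference, one round of the batch-SOM update that the code below iterates can be written as follows (my notation: $\zeta_k$ are the fixed node positions in latent space, $\mathbf{u}_k$ the reference vectors, $k_n^{*}$ the winning node of data point $\mathbf{y}_n$, and $\sigma$ the neighborhood radius):
k_{n}^{*}=\underset{k}{\operatorname{argmin}}\left\|\mathbf{y}_{n}-\mathbf{u}_{k}\right\|^{2}, \quad h_{kn}=\exp \left\{-\frac{\left\|\zeta_{k}-\zeta_{k_{n}^{*}}\right\|^{2}}{2 \sigma^{2}}\right\}, \quad \mathbf{u}_{k}=\frac{\sum_{n} h_{kn} \mathbf{y}_{n}}{\sum_{n} h_{kn}}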
For the detailed algorithm there is a document written by a professor in our laboratory, so please refer to that. For example, giving it 3D data and a 2D latent space produces the following result.
The figure on the left shows the observed data and the learned mapping; the figure on the right shows the latent space.
Giving it 2D data and a 1D latent space produces the following result. An image of the final state is shown below.
By the way, the data are the coordinates of 48 cities in the United States, the benchmark known as att48. The result is tantalizingly close; it feels like it should work. The bottleneck is that the start point and the end point are not connected. If they were connected, we would be done. So let us give latent variables that connect the start and the end: that is, we place the latent variables on a circle in the latent space. Below are the source code and execution result of a SOM with circularly placed latent variables.
```python
import matplotlib.pyplot as plt
import matplotlib.animation as anm
import numpy as np
%matplotlib nbagg

fig = plt.figure(1, figsize=(6, 4))
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)


class som():
    def __init__(self, N, K, sigmin, sigmax, r, w, zeta):
        self.N = N            # number of data points (cities)
        self.K = K            # number of nodes
        self.sigmin = sigmin  # minimum neighborhood radius
        self.sigmax = sigmax  # initial neighborhood radius
        self.r = r            # number of steps over which sigma shrinks
        self.w = w            # observed data (city coordinates), shape (N, 2)
        self.zeta = zeta      # node positions on the circular latent space, shape (K, 2)
        self.hist = []        # history of reference vectors
        self.hist2 = []       # history of winner nodes
        self.hist3 = []       # history of sigma
        self.sigma = 0

    def fit(self, T):
        for i in range(T):
            if i == 0:
                # random initial assignment of a winner node to each city
                self.kn = np.random.randint(self.K, size=self.N)
            # linearly anneal the neighborhood radius from sigmax down to sigmin
            self.sigma = max(self.sigmax - (i / self.r) * (self.sigmax - self.sigmin),
                             self.sigmin)
            # neighborhood weights between every node and each city's winner node
            self.h_kn = np.exp(-1 / (2 * self.sigma ** 2)
                               * np.sum((self.zeta[self.kn][None, :, :]
                                         - self.zeta[:, None, :]) ** 2, axis=2))
            self.g_k = np.sum(self.h_kn, axis=1)
            # batch update: each reference vector is a weighted mean of the data
            self.y_test = (1 / self.g_k[:, None]) * self.h_kn @ self.w
            # reassign each city to its nearest reference vector
            self.kn = np.argmin(np.sum((self.w[:, None, :]
                                        - self.y_test[None, :, :]) ** 2, axis=2), axis=1)
            self.hist.append(np.reshape(self.y_test, (self.K, 2)))
            self.hist2.append(self.kn)
            self.hist3.append(self.sigma)
        return self.hist, self.hist2, self.hist3


N = 48         # number of data points (cities)
D = 2          # dimensionality of the data
K = 300        # number of nodes
T = 100        # number of learning steps
sigmax = 5.0   # initial value of sigma
sigmin = 0.01  # minimum value of sigma
r = T - 5      # decay length of sigma

# place the latent variables on a circle so that the tour closes
rad = np.linspace(-np.pi, np.pi, K)
zeta = np.zeros((K, 2))
zeta[:, 0] = np.sin(rad) + 1
zeta[:, 1] = np.cos(rad) + 1

train_data = np.loadtxt('att48.txt')
w = np.reshape(train_data, (N, D))

SOM = som(N, K, sigmin, sigmax, r, w, zeta)
kekka, k, sigma = SOM.fit(T)
kekka = np.array(kekka)
k = np.array(k)
sigma = np.array(sigma)

# total length of the closed tour traced by the final reference vectors
distance = 0
for i in range(K):
    j = 0 if i == K - 1 else i + 1  # wrap around to close the tour
    distance += np.sqrt((kekka[T - 1, i, 0] - kekka[T - 1, j, 0]) ** 2
                        + (kekka[T - 1, i, 1] - kekka[T - 1, j, 1]) ** 2)
print(distance)


def update(i, zeta, w):
    ax1.cla()
    ax2.cla()
    ax1.scatter(w[:, 0], w[:, 1], s=100, color='r')                  # cities
    ax1.scatter(kekka[i, :, 0], kekka[i, :, 1], color='k', marker='*')  # nodes
    ax1.plot(kekka[i, :, 0], kekka[i, :, 1], color='k')              # tour
    ax1.set_xlabel("X_axis")
    ax1.set_ylabel("Y_axis")
    ax2.scatter(zeta[k[i, :]][:, 0], zeta[k[i, :]][:, 1], c=w[:, 0])  # winners
    ax2.set_title('latentspace')
    ax2.set_xlabel("X_axis")
    ax2.set_ylabel("Y_axis")


ani = anm.FuncAnimation(fig, update, fargs=(zeta, w),
                        interval=100, frames=T - 1, repeat=False)
ani.save("RSom_TSP_Node300.gif", writer='imagemagick')
```
Since the final frame of the GIF vanishes in an instant, I have also prepared a still image. Here it is.
It looks quite good. The computation is fast, too, so I think this is a pretty decent approach. Adding noise so that the search can escape local optima should make it even better.