Introduction

Since this is Qiita's first post, please give us all the details such as not only the content but also the format and writing style.

[Implementing the genetic algorithm (GA) in python to solve the traveling salesman problem (TSP)](http://samuiui.com/2019/10/27/ Implementing the genetic algorithm (ga) in python Patrol /)

I actually tried. I thought that if I got used to coding, I wouldn't have to mention it one by one. I will record what I am interested in as much as possible.

[Here]((http://samuiui.com/2019/10/27/python implements genetic algorithm (ga) and patrols /)) is used as a reference (& quote).

Suddenly

Even if you copy and paste the code of [link above]((http://samuiui.com/2019/10/27/python implements genetic algorithm (ga) and patrols /)) Sometimes it didn't work. That is natural, for example

Things like this

import numpy as np
import random
import matplotlib.pyplot as plt

I have to write. Of course, I shouldn't mention such a rudimentary ... There was a misspelling ... maybe I'm doing it too ...

Let's get into the main subject.

What is a genetic algorithm (GA)?

Prosperous offspring of those with excellent genes. In the genetic algorithm, a good solution is prospered in the same way.

Descendants prosperity

A dominant individual for every element gives birth to many offspring, Inferior individuals can give birth to only a few offspring. The same thing happens with the movement from child generation to grandchild generation. By repeating this process, sophisticated children who are adapted to the environment are born.

mutation

As the crossing continues, only similar individuals will be found. Our genes ensure diversity by causing occasional mutations.

What is the Traveling Salesman Problem (TSP)?

The traveling salesman problem is It's a kind of problem that a salesman should go around n business destinations in what order. Theoretically, there are n! Street routes, so brute force is difficult.

This time I would like to solve GA using this TSP.

flow

[Source]((http://samuiui.com/2019/10/27/Implementing a genetic algorithm (ga) in python and patrols /))

Initial individual generation

As an early individual, you need to create a city and a gene.

city

Creates n cities with both x and y coordinates in the range 0-1. Input is the number of cities, output is the matrix of (number of cities) x 2 Implementation example: generate_rand_cities

#Generate a city
def generate_rand_cities(num_cities):
    positions = np.zeros((num_cities, 2))
    for i in range(num_cities):
        positions[i, 0] = random.random()
        positions[i, 1] = random.random()
    return positions
#As a test
generate_rand_cities(10)

About random (random.random)

random.random()

Randomly generates floating point numbers between 0.0 and 1.0

gene

Give the gene information about the order in which it travels through cities. The length of the gene matches the number of cities. For example, when the number of cities is 6, [3, 5, 1, 0, 4, 2] [0, 4, 2, 1, 5, 3] Such individuals are possible.

Because genes make multiple individuals in one generation Genes are represented by the matrix (number of genes) x (number of cities).

The input is the number of cities and the number of individuals in one generation, and the output is the matrix of (number of genes) x (number of cities).

Implementation example: generate_init_genes

#Gene generation
def generate_init_genes(num_individual, num_cities):
    genes = np.zeros((num_individual, num_cities), dtype=np.int16)
    for i in range(num_individual):
        genes[i,] = random.sample(range(num_cities), k=num_cities)
    return genes
#As a test
generate_init_genes(15,10)

About random (random.sample)

Random.sample is taken out randomly without duplication. For example

l = [0, 1, 2, 3, 4]
random.sample(l, 3)

Will output something like [4, 2, 1] or [1, 0, 3].

Evaluation

In this task, genes showing shorter pathways need to be highly evaluated. Therefore, first, the length of the pathway corresponding to the gene is obtained. After that, it is compared and evaluated within the generation.

Pathways corresponding to genes

The reference source calculated the distance between cities for each gene. Make a table about the distance between cities and refer to it according to the order of genes.

Distance table between cities

Implementation example

#Table about distances between cities
def generate_dist_table(positions):
    path_table = np.zeros((len(positions), len(positions)))
    for i in range(len(positions)):
        for j in range(len(positions)):
            path_table[i,j] = np.linalg.norm(positions[i]-positions[j])
    return path_table

#As a test
num_cities = 4
positions = generate_rand_cities(4)
generate_dist_table(positions)

The distance between two cities is calculated twice. I can still scrape here ...

The length of the pathway corresponding to the gene

Based on this table, the length of the pathway corresponding to the gene is obtained.


#Refer to the table to find the sum of the pathways corresponding to the genes.
def sum_path2(path_table, gene):
    sum = 0.
    for i in range(len(gene)-1):
        sum += path_table[int(gene[i]),int(gene[i+1])]
    return sum

#As a test
cities = generate_rand_cities(10)
genes = generate_init_genes(15,10)
dist_table = generate_dist_table(cities)
for i in range(len(genes)):
    print(sum_path2(dist_table, genes[i]))

In order to evaluate genes collectively for each generation, set a function that outputs genes for one generation collectively.

#One-generation calculation for the sum of routes
def genes_path2(path_table, genes):
    pathlength_vec = np.zeros(len(genes))
    for i in range(len(genes)):
        pathlength_vec[i] = sum_path2(path_table, genes[i])
    return pathlength_vec

#As a test
cities = generate_rand_cities(10)
genes = generate_init_genes(15,10)
dist_table = generate_dist_table(cities)
genes_path2(dist_table, genes)

roulette

The length of the pathway corresponding to the gene was found. This time, the shorter the route, the higher the evaluation. The shorter the route, the higher the probability of being selected as the one that leaves offspring. Perform normalization.

#roulette
def generate_roulette(fitness_vec):
    total = np.sum(fitness_vec)
    roulette = np.zeros(len(fitness_vec))
    for i in range(len(fitness_vec)):
        roulette[i] = fitness_vec[i]/total
    return roulette

#As a test
fitness = np.array([20,50,30])
generate_roulette(fitness)

On the contrary

array([0.2, 0.5, 0.3])

Will be.

Evaluate genes

The length of the pathway corresponding to the gene is already known. This time, the shorter the route, the higher the evaluation. The shorter the route, the higher the probability of being selected as the one that leaves offspring. Therefore, this time, the reciprocal of the path length is used.

#Roulette execution example
cities = generate_rand_cities(20)
genes = generate_init_genes(21, 20)
dist_table = generate_dist_table(cities)
paths = genes_path2(dist_table, genes)
inverse_path = 1/paths
print("path length: "+str(paths))
print("roulette table: "+str(generate_roulette(inverse_path)))

Since the total of the roulette table is 1, it can be interpreted as the probability that the corresponding gene will be selected. This is used to select genes.

Selection (selection)

#Select individuals to cross based on roulette
def roulette_choice(fitness_vec):
    roulette = generate_roulette(fitness_vec)
    choiced = np.random.choice(len(roulette), 2, replace=True, p=roulette)
    return choiced

#As a test
cities = generate_rand_cities(20)
genes = generate_init_genes(21, 20)
dist_table = generate_dist_table(cities)
fitness_vec = 1 / genes_path2(dist_table, genes)
roulette_choice(fitness_vec)

Create a roulette table with generate_roulette. Select two genes from the roulette based on the roulette table.

About random (random.choice)

random.choice has duplicates and randomly selects a value. The range of numbers selected by the first argument, The second argument is the number to choose, Specify whether there is a duplicate with replace (True with duplicate) The probability that each is selected is specified by p.

Crossing

It is now possible to select two relatively excellent individuals using roulette_choice. Multiply the two individuals. There are multiple crossover methods depending on the shape of the gene and also for the same gene shape. For example, there seems to be order crossing and circular crossing → Application of GA to order problems This time we performed a partial crossover. Please refer to the image below for the method of partial crossing. [Source]((http://samuiui.com/2019/10/27/Implementing a genetic algorithm (ga) in python and patrols /))

Implementation of partial crossover

#Implementation of partial crossover
def partial_crossover(parent1, parent2):
    num = len(parent1)
    cross_point = random.randrange(1,num-1)   #Break
    child1 = parent1
    child2 = parent2
    for i in range(num - cross_point):
        #Value just to the right of the break
        target_index = cross_point + i
        target_value1 = parent1[target_index]
        target_value2 = parent2[target_index]
        exchange_index1 = np.where(parent1 == target_value2)
        exchange_index2 = np.where(parent2 == target_value1)
        #Exchange
        child1[target_index] = target_value2
        child2[target_index] = target_value1
        child1[exchange_index1] = target_value1
        child2[exchange_index2] = target_value2
    return child1, child2

#As a test
genes = generate_init_genes(2, 10)
print("parent1: "+str(genes[0]))
print("parent2: "+str(genes[1]))
child = partial_crossover(genes[0], genes[1])
print("child1:  "+str(child[0]))
print("child2:  "+str(child[1]))

mutation

There are various methods for mutation. This time we will do what is called a translocation. Swap two randomly selected places.

def translocation_mutation2(genes, p_value):
    mutated_genes = genes
    for i in range(len(genes)):
        mutation_flg = np.random.choice(2, 1, p = [1-p_value, p_value])
        if mutation_flg == 1:
            mutation_value = np.random.choice(genes[i], 2, replace = False)
            mutation_position1 = np.where(genes[i] == mutation_value[0])
            mutation_position2 = np.where(genes[i] == mutation_value[1])
            mutated_genes[i][mutation_position1] = mutation_value[1]
            mutated_genes[i][mutation_position2] = mutation_value[0]
    return mutated_genes

#As a test
genes = generate_init_genes(5, 10)
print("before")
print(genes)
print("after")
print(translocation_mutation2(genes, 0.7))

mutation_flag has a probability of p_value of 1. Therefore, the gene mutates with a probability of p_value.

Visualization

Although it is not written in the flowchart, it is visualized using matplotlib to check the transition of GA.

City location

#Visualization of city location
def show_cities(cities):
    for i in range(len(cities)):
        plt.scatter(cities[i][0], cities[i][1])

#As a test
cities = generate_rand_cities(10)
show_cities(cities)

Route visualization

def show_route(cities, gene):
    for i in range(len(gene)-1):
        if i == 0:
            plt.text(cities[int(gene[i])][0], cities[int(gene[i])][1], "start")
        else:
            plt.text(cities[int(gene[i])][0], cities[int(gene[i])][1], str(i))
        plt.plot([cities[int(gene[i])][0], cities[int(gene[i+1])][0]], 
                 [cities[int(gene[i])][1], cities[int(gene[i+1])][1]])
    plt.text(cities[int(gene[i+1])][0], cities[int(gene[i+1])][1], "goal")

#As a test
cities = generate_rand_cities(10)
show_cities(cities)
genes = generate_init_genes(10,10)
show_route(cities,genes[0])

Integration

Implement a program that solves TSP with GA by using the functions created so far.

Parameters

#Parameters
num_cities = 10
individuals = 21
generation = 10000
elite = 9
p_mutation = 0.01

Initialize

#Initialize
cities = generate_rand_cities(num_cities)
genes = generate_init_genes(individuals, num_cities)
dist_table = generate_dist_table(cities)
show_cities(cities)
show_route(cities, genes[0])
plt.show()

You should be able to see this

Execution department

With excellent elite of the parent generation Individual-elite children born by crossing between parent generations are responsible for the child generation. Mutations occur in the offspring to form the final offspring.

#Execution department
top_individual=[]  #List of best fitness for each generation
max_fit = 0  #Highest fitness ever
fitness_vec = np.reciprocal(genes_path2(dist_table, genes))
for i in range(generation):
    children = np.zeros(np.shape(genes))
    for j in range(int((individuals-elite)/2+1)):
        parents_indices = roulette_choice(fitness_vec)
        children[2*j], children[2*j+1] = partial_crossover(genes[parents_indices[0]], genes[parents_indices[1]])

    for j in range(individuals-elite, individuals):
        children[j] = genes[np.argsort(genes_path2(dist_table, genes))[j-individuals+elite]]

    children = translocation_mutation2(children, p_mutation)
    top_individual.append(max(fitness_vec))
    genes = children
    fitness_vec = np.reciprocal(genes_path2(dist_table, genes))
    if max(fitness_vec) > max_fit:  #After recording the highest fitness ever, display the route
        max_fit = max(fitness_vec)
        show_cities(cities)
        show_route(cities, genes[np.argmax(fitness_vec)])
        plt.text(0.05, 0.0, "generation: " + str(i) + "  distance: " +str(sum_path2(dist_table,genes[np.argmax(fitness_vec)])))
        plt.show()

top_individual is a list of the best fitness of each generation.

The last one displayed It will be the one that goes around in the order that the route becomes shorter like. Also,

plt.plot(top_individual)

You can check the transition of fitness as the generation changes.

About numpy (memo)

np.reciprocal () → Gives the reciprocal of each element np.argmax () → Gives the index of the largest of each element np.argsort () → Index order when each element is sorted

Summary

As a default setting Number of cities, Gene population, Number of evolutionary generations, Number of elite, Mutation probability, Was given.

Obviously there is a better route if you increase the number of cities, but it doesn't evolve into that route. Also, it takes time to calculate There is still room for improvement. As a possible point

How to cross

I don't know anything because I don't know other ways of crossing. There may be a better crossing method. Also, I want to connect it with mutation.

Mutation probability

It is thought that efficiency will be improved if the probability of mutation changes depending on the similarity of parents.

Number of elites picked up

This time I decided on the initial settings. However, as with the probability of mutation, it may be better to change this according to the similarity of the parents. As a test, if the number of elites is reduced, the fitness does not increase and it vibrates. It is thought that fitness will not increase in the early stages because bad elements will be picked up by crossing.

Number of generations

The number of generations was also given by default, but there is room for improvement. The number of generations is directly related to the burden of calculation. If the end condition is set from the rate of change in fitness, it will not be necessary to specify the number of generations.

Gene population

Similarly, the number of gene individuals is involved in the calculation. If the number of individuals is small, it will be necessary to increase the generation of evolution. If you set it in a suitable ratio for the number of cities given, you will not need to enter it.

Outlook

In the end, just enter the city and it will guide you to the right route. Therefore, it seems necessary to investigate the crossover method, how to give the probability of mutation, and suitable values for the number of individuals and generations of genes. I will do my best.

random summary

-Randomly select one element: random.choice () -Randomly select multiple elements (no duplication): random.sample () -Randomly select multiple elements (with duplicates): random.choices () -Randomly generate floating point numbers from 0 to 1: random.random () From Using random with python

I tried "Implementing a genetic algorithm (GA) in python to solve the traveling salesman problem (TSP)"

Introduction

[Implementing the genetic algorithm (GA) in python to solve the traveling salesman problem (TSP)](http://samuiui.com/2019/10/27/ Implementing the genetic algorithm (ga) in python Patrol /)

Suddenly

What is a genetic algorithm (GA)?

Descendants prosperity

mutation

What is the Traveling Salesman Problem (TSP)?

flow

Initial individual generation

city

About random (random.random)

gene

About random (random.sample)

Evaluation

Pathways corresponding to genes

Distance table between cities

The length of the pathway corresponding to the gene

roulette

Evaluate genes

Selection (selection)

About random (random.choice)

Crossing

Implementation of partial crossover

mutation

Visualization

City location

Route visualization

Integration

Parameters

Initialize

Execution department

About numpy (memo)

Summary

How to cross

Mutation probability

Number of elites picked up

Number of generations

Gene population

Outlook

random summary