Feature selection by genetic algorithm

Introduction

Genetic algorithms work on bit strings such as 011101010... Could this be used for feature selection? When I looked into it, I found that someone had already done exactly that. I reorganized the approach in my own way, using that earlier code as a reference. This is yet another rehash of an idea others have already covered.

References

- Selection of features using a genetic algorithm (https://horomary.hatenablog.com/entry/2019/03/10/190919)
- Slides and implementation of the One Max problem that were helpful when getting started with genetic algorithms (https://tech.mof-mof.co.jp/blog/ga-one-max-problem.html)

Procedure

Implementation of genetic algorithm

For details on genetic algorithms, see the articles listed in the references. This article uses the OneMax problem as an example. In the OneMax problem, an array such as [0,1,0,1,1,1,0, ...] is given as the initial value, and the goal is to turn every element into 1. I will solve it with a genetic algorithm.

The code is based on this GitHub repository (https://github.com/Azunyan1111/OneMax). For the names of terms and variables that appear in genetic algorithms, I referred to https://tech.mof-mof.co.jp/blog/ga-one-max-problem.html. In brief, a scalar value that takes 0 or 1 is a gene, and a sequence [0,0,1,1,0,1, ...] of genes is a chromosome. The entity that carries a chromosome is an individual. By design, an individual has a chromosome as an attribute. A set of individuals is called a population. The selection scheme is elite selection, not roulette selection.
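The difference between the two selection schemes mentioned above can be sketched as follows. This is a minimal illustration rather than the article's implementation: the function names `elite_selection` and `roulette_selection` and the sample population are hypothetical, and roulette selection is shown only for contrast.

```python
import random


def elite_selection(scored, num):
    """Keep the `num` highest-scoring (chromosome, score) pairs."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:num]


def roulette_selection(scored, num, rng):
    """Draw `num` pairs with probability proportional to score (scores must be >= 0)."""
    weights = [score for _, score in scored]
    return rng.choices(scored, weights=weights, k=num)


# Hypothetical population of (chromosome, score) pairs.
scored = [([1, 1, 1], 1.0), ([1, 0, 1], 0.67), ([0, 0, 1], 0.33), ([0, 0, 0], 0.0)]

elites = elite_selection(scored, 2)
sampled = roulette_selection(scored, 2, random.Random(0))
print(elites)   # always the two best pairs
print(sampled)  # depends on the random seed
```

Elite selection is deterministic and converges quickly, while roulette selection keeps some chance for weaker individuals to reproduce.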

The elements required for a genetic algorithm are implemented below:

"""
Written by Azunyan https://github.com/Azunyan1111/OneMax
"""
"""
Modified Copyright 2020 ground0state All Rights Reserved.
"""
import random
import time


class Individual():
    """An individual.

    Parameters
    ----------
    chromosome : list of {0 or 1}
        Chromosome.

    evaluation : float
        Evaluation value.
    """
    chromosome = None
    evaluation = None

    def __init__(self, chromosome, evaluation):
        self.chromosome = chromosome
        self.evaluation = evaluation


def create_individual(length):
    """Create an individual with a random chromosome of the given length.

    Parameters
    ----------
    length : int
        Chromosome length.

    Returns
    -------
    individual : Individual
        The generated individual.
    """
    individual = Individual([random.randint(0, 1) for i in range(length)], 0)
    return individual


def evaluate_individual(individual):
    """Evaluation function.

    Parameters
    ----------
    individual : Individual
        Individual to evaluate.

    Returns
    -------
    evaluation : float
        Evaluation value (fraction of genes equal to 1).
    """
    evaluation = sum(individual.chromosome) / len(individual.chromosome)
    return evaluation


def extract_elites(population, num):
    """Selection function.

    Parameters
    ----------
    population : list of Individual
        Population.
    num : int
        Number of individuals to select.

    Returns
    -------
    elites : list of Individual
        Selected individuals.
    """
    # Sort the current generation by evaluation value in descending order
    sort_result = sorted(population, reverse=True, key=lambda individual: individual.evaluation)

    # Keep the top `num` individuals
    elites = sort_result[:num]
    return elites


def crossover(individual1, individual2, chromosome_length):
    """Crossover function.

    Performs two-point crossover.

    Parameters
    ----------
    individual1 : Individual
        First parent.
    individual2 : Individual
        Second parent.
    chromosome_length : int
        Chromosome length.

    Returns
    -------
    offsprings : list of Individual
        Two offspring.
    """

    # Choose the two cut points
    cross_one = random.randint(0, chromosome_length)
    cross_second = random.randint(cross_one, chromosome_length)

    # Take the chromosomes of both parents
    one = individual1.chromosome
    second = individual2.chromosome

    # Swap the segment between the two cut points
    progeny_one = one[:cross_one] + second[cross_one:cross_second] + one[cross_second:]
    progeny_second = second[:cross_one] + one[cross_one:cross_second] + second[cross_second:]

    # Wrap the new chromosomes in individuals (not yet evaluated)
    offsprings = [Individual(progeny_one, 0), Individual(progeny_second, 0)]
    return offsprings


def create_next_generation(population, elites, offsprings):
    """Perform the generation change.

    Parameters
    ----------
    population : list of Individual
        Current generation population.
    elites : list of Individual
        Elites of the current generation.
    offsprings : list of Individual
        Offspring of the current generation.

    Returns
    -------
    next_generation_population : list of Individual
        Next generation population.
    """
    # Sort the current generation by evaluation value in ascending order
    next_generation_population = sorted(population, reverse=False, key=lambda individual: individual.evaluation)

    # Drop as many of the lowest-ranked individuals as elites and offspring will be added
    next_generation_population = next_generation_population[len(elites)+len(offsprings):]

    # Add the elites and the offspring to the next generation
    next_generation_population.extend(elites)
    next_generation_population.extend(offsprings)
    return next_generation_population


def mutation(population, individual_mutation_probability, gene_mutation_probability):
    """Mutation function.

    Parameters
    ----------
    population : list of Individual
        Population.
    individual_mutation_probability : float in [0, 1]
        Probability that an individual mutates.
    gene_mutation_probability : float in [0, 1]
        Probability that each gene of a mutating individual is resampled.

    Returns
    -------
    new_population : list of Individual
        Mutated population.
    """
    new_population = []
    for individual in population:
        # Each individual mutates with a certain probability
        if individual_mutation_probability > random.random():
            new_chromosome = []
            for gene in individual.chromosome:
                # Each gene of a mutating individual is resampled with a certain probability
                if gene_mutation_probability > random.random():
                    new_chromosome.append(random.randint(0, 1))
                else:
                    new_chromosome.append(gene)

            # Build a new individual so that the elites selected earlier are not modified in place
            new_population.append(Individual(new_chromosome, 0))
        else:
            new_population.append(individual)

    return new_population
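
To make the two-point crossover above concrete, here is a small worked example. It is a sketch with fixed cut points chosen for illustration (the article's `crossover` draws them at random), and the helper name `two_point_crossover` is hypothetical.

```python
def two_point_crossover(parent_a, parent_b, cut1, cut2):
    """Swap the segment [cut1:cut2] between two equal-length chromosomes."""
    child_a = parent_a[:cut1] + parent_b[cut1:cut2] + parent_a[cut2:]
    child_b = parent_b[:cut1] + parent_a[cut1:cut2] + parent_b[cut2:]
    return child_a, child_b


parent_a = [1, 1, 1, 1, 1, 1]
parent_b = [0, 0, 0, 0, 0, 0]

# Fixed cut points for illustration only.
child_a, child_b = two_point_crossover(parent_a, parent_b, 2, 4)
print(child_a)  # [1, 1, 0, 0, 1, 1]
print(child_b)  # [0, 0, 1, 1, 0, 0]
```

When both cut points coincide, the swapped segment is empty and the children are copies of their parents.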

Run the algorithm with these classes and functions as follows.

#Chromosome length
CHROMOSOME_LENGTH = 13

#The size of the population
POPULATION_SIZE = 30

#Elite chromosome selection number
PICK_OUT_SIZE = 5

#Individual mutation probability
INDIVIDUAL_MUTATION_PROBABILITY = 0.3

#Gene mutation probability
GENE_MUTATION_PROBABILITY = 0.1

#Number of generations to repeat
ITERATION = 10


#Initialize the population of the current generation
current_generation_population = [create_individual(CHROMOSOME_LENGTH) for i in range(POPULATION_SIZE)]

for count in range(ITERATION):
    #Start time of each loop
    start = time.time()
    
    #Evaluate individuals in the current generation population
    for individual in current_generation_population:
        individual.evaluation = evaluate_individual(individual)

    #Select an elite individual
    elites = extract_elites(current_generation_population, PICK_OUT_SIZE)
    
    #Cross the elite genes and store them in the list
    offsprings = []
    for i in range(0, PICK_OUT_SIZE-1):
        offsprings.extend(crossover(elites[i], elites[i+1], CHROMOSOME_LENGTH))
        
    #Create next-generation populations from current generations, elite populations, and offspring populations
    next_generation_population = create_next_generation(current_generation_population, elites, offsprings)
    
    #Mutate all individuals in the next-generation population.
    next_generation_population = mutation(next_generation_population,
                                          INDIVIDUAL_MUTATION_PROBABILITY,
                                          GENE_MUTATION_PROBABILITY)

    #Evolutionary computation of one generation is completed. Move on to evaluation

    #Arrange the evaluation values of each individual.
    fits = [individual.evaluation for individual in current_generation_population]

    #Evaluate evolutionary results
    min_val = min(fits)
    max_val = max(fits)
    avg_val = sum(fits) / len(fits)

    #Outputs the evolution results of the current generation
    print("-----No.{}Generational results-----".format(count+1))
    print("  Min:{}".format(min_val))
    print("  Max:{}".format(max_val))
    print("  Avg:{}".format(avg_val))

    #Swap the current generation with the next generation
    current_generation_population = next_generation_population
    
    #measurement of time
    elapsed_time = time.time() - start
    print ("  {}/{} elapsed_time:{:.2f}".format(count+1, ITERATION, elapsed_time) + "[sec]")

#Final result output
print("")  #new line
print("The best individual is{}".format(elites[0].chromosome))

The output looks like this:

-----1st generation results-----
  Min:0.23076923076923078
  Max:0.8461538461538461
  Avg:0.5384615384615384
  1/10 elapsed_time:0.00[sec]
-----2nd generation results-----
  Min:0.46153846153846156
  Max:0.8461538461538461
  Avg:0.6692307692307694
  2/10 elapsed_time:0.00[sec]
-----3rd generation results-----
  Min:0.6923076923076923
  Max:0.9230769230769231
  Avg:0.761538461538462
  3/10 elapsed_time:0.00[sec]
-----4th generation results-----
  Min:0.6923076923076923
  Max:0.9230769230769231
  Avg:0.8102564102564106
  4/10 elapsed_time:0.00[sec]
-----5th generation results-----
  Min:0.6923076923076923
  Max:0.9230769230769231
  Avg:0.8512820512820515
  5/10 elapsed_time:0.00[sec]
-----6th generation results-----
  Min:0.7692307692307693
  Max:0.9230769230769231
  Avg:0.848717948717949
  6/10 elapsed_time:0.00[sec]
-----7th generation results-----
  Min:0.7692307692307693
  Max:0.9230769230769231
  Avg:0.8948717948717951
  7/10 elapsed_time:0.00[sec]
-----8th generation results-----
  Min:0.6153846153846154
  Max:0.9230769230769231
  Avg:0.8974358974358977
  8/10 elapsed_time:0.00[sec]
-----9th generation results-----
  Min:0.7692307692307693
  Max:0.9230769230769231
  Avg:0.9000000000000002
  9/10 elapsed_time:0.00[sec]
-----10th generation results-----
  Min:0.8461538461538461
  Max:1.0
  Avg:0.9102564102564105
  10/10 elapsed_time:0.00[sec]

The best individual is[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

I obtained an individual whose elements are all 1.
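
The way the population size stays constant from one generation to the next can be checked with a little arithmetic. The sketch below is an illustration under the loop structure shown above (adjacent elites are crossed, each pair yielding two children); the function name `generation_budget` and the explicit error are my own additions.

```python
def generation_budget(population_size, pick_out_size):
    """Return (elites, offspring, survivors) counts for one generation."""
    elites = pick_out_size
    # Adjacent elite pairs (i, i+1) are crossed, and each pair yields two children.
    offspring = 2 * (pick_out_size - 1)
    # The lowest-ranked individuals are dropped to make room for both groups.
    survivors = population_size - elites - offspring
    if survivors < 0:
        raise ValueError("elites + offspring exceed the population size")
    return elites, offspring, survivors


print(generation_budget(30, 5))  # (5, 8, 17)
```

With a population of 30 and 5 elites, 13 individuals are dropped and 13 are added. Note that the 17 survivors are the top-ranked individuals, so the elites appear in the next generation twice: once among the survivors and once in the re-added elite list.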

Feature selection by genetic algorithm

I prepared the data and model with reference to the feature-selection article listed in the references.

The data is prepared with the following code.

import pandas as pd
import numpy as np
from sklearn.datasets import load_boston
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import StandardScaler


# Load the dataset
# Note: load_boston was removed in scikit-learn 1.2, so this code needs an older version
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = boston.target

# Add polynomial features
# Note: newer scikit-learn versions replace get_feature_names with get_feature_names_out
poly = PolynomialFeatures(2)
poly.fit(X)
X_poly = pd.DataFrame(poly.transform(X), columns=poly.get_feature_names(input_features=X.columns))

# Standardize
sc = StandardScaler()
X_sc = pd.DataFrame(sc.fit_transform(X), columns=X.columns)
X_poly_sc = pd.DataFrame(sc.fit_transform(X_poly), columns=X_poly.columns)

X_poly_sc is a version of the data whose features were expanded with PolynomialFeatures (from 13 to 105 columns) so that the method can also be tested on a case with many features.
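
The number of columns produced by a degree-2 polynomial expansion can be worked out by hand. The sketch below counts the monomials of total degree at most `degree`, which is what PolynomialFeatures generates with its default settings (including the constant column); the function name `n_polynomial_features` is hypothetical.

```python
from math import comb


def n_polynomial_features(n_features, degree, include_bias=True):
    """Number of monomials of total degree <= `degree` in `n_features` variables."""
    total = comb(n_features + degree, degree)
    return total if include_bias else total - 1


# 13 original features, degree 2: 1 constant + 13 linear + 91 quadratic terms.
print(n_polynomial_features(13, 2))  # 105
```

This is also the chromosome length needed when running the genetic algorithm on X_poly_sc.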

First, fit a model on the dataset as it is.

from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

#As-is dataset
scores = []
for _ in range(30):
    X_train, X_test, y_train, y_test = train_test_split(X_sc, y, test_size=0.4)
    model = RidgeCV()
    model.fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))

print(np.array(scores).mean())  # 0.70

Next, fit a model with the polynomial features added.

from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split


#When polynomial features are added
scores = []
for _ in range(30):
    X_train, X_test, y_train, y_test = train_test_split(X_poly_sc, y, test_size=0.4)
    model = RidgeCV()
    model.fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))

print(np.array(scores).mean())    # 0.82

Now let's select features for this model using a genetic algorithm. The only part to modify is the `evaluate_individual` function. It takes an individual, converts its chromosome to Boolean values, and uses them to specify which columns to use. The model is then trained and scored on those columns, and the score is returned as the individual's evaluation value.

def evaluate_individual(individual):
    """Evaluation function.

    Parameters
    ----------
    individual : Individual
        Individual to evaluate.

    Returns
    -------
    evaluation : float
        Evaluation value (mean test score over 30 random splits).
    """
    # Use the chromosome as a Boolean mask over the columns
    use_cols = [bool(gene) for gene in individual.chromosome]
    X_temp = X_sc.iloc[:, use_cols]

    # Average the test score over 30 random train/test splits
    scores = []
    for _ in range(30):
        X_train, X_test, y_train, y_test = train_test_split(X_temp, y, test_size=0.4)
        model = RidgeCV()
        model.fit(X_train, y_train)
        scores.append(model.score(X_test, y_test))

    evaluation = float(np.array(scores).mean())
    return evaluation
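
One edge case is worth keeping in mind with this kind of evaluation function: a chromosome of all zeros selects no columns, and scikit-learn estimators generally reject input with zero features. A possible guard is sketched below. The names `safe_evaluate`, `toy_score`, and the fallback value `empty_score` are assumptions for illustration, and `toy_score` merely stands in for the model-fitting step.

```python
def safe_evaluate(chromosome, score_fn, empty_score=0.0):
    """Score a chromosome, returning `empty_score` when no feature is selected."""
    use_cols = [bool(gene) for gene in chromosome]
    if not any(use_cols):
        # No column selected: skip model fitting, which would fail on 0 features.
        return empty_score
    return score_fn(use_cols)


def toy_score(use_cols):
    """Stand-in scorer for illustration: fraction of selected columns."""
    return sum(use_cols) / len(use_cols)


print(safe_evaluate([0, 0, 0], toy_score))  # 0.0
print(safe_evaluate([1, 0, 1], toy_score))
```

With 13 genes such a chromosome is rare, but it can still appear through random initialization or mutation.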

I changed the parameters as follows and ran it.

#Chromosome length
CHROMOSOME_LENGTH = 13

#The size of the population
POPULATION_SIZE = 100

#Elite chromosome selection number
PICK_OUT_SIZE = 20

#Individual mutation probability
INDIVIDUAL_MUTATION_PROBABILITY = 0.3

#Gene mutation probability
GENE_MUTATION_PROBABILITY = 0.1

#Number of generations to repeat
ITERATION = 10

The result is as follows.

-----1st generation results-----
  Min:0.245482696210891
  Max:0.7062246093438559
  Avg:0.5643638813331334
  1/10 elapsed_time:13.21[sec]
-----2nd generation results-----
  Min:0.28765890628509017
  Max:0.7175019664075553
  Avg:0.6611343782899052
  2/10 elapsed_time:14.07[sec]
-----3rd generation results-----
  Min:0.5958052127889627
  Max:0.7343341487237112
  Avg:0.6840346805288029
  3/10 elapsed_time:14.39[sec]
-----4th generation results-----
  Min:0.6011227398695212
  Max:0.7265364514547696
  Avg:0.694531099756538
  4/10 elapsed_time:11.29[sec]
-----5th generation results-----
  Min:0.6314510371602322
  Max:0.7249977461594102
  Avg:0.6938166370760438
  5/10 elapsed_time:11.72[sec]
-----6th generation results-----
  Min:0.6539907671434392
  Max:0.7256998515926862
  Avg:0.7042345770684423
  6/10 elapsed_time:11.44[sec]
-----7th generation results-----
  Min:0.6557998988298114
  Max:0.7273580445493621
  Avg:0.7009249865262361
  7/10 elapsed_time:9.64[sec]
-----8th generation results-----
  Min:0.6530159418050802
  Max:0.7250968150681534
  Avg:0.7044189020700958
  8/10 elapsed_time:9.90[sec]
-----9th generation results-----
  Min:0.6087336519329122
  Max:0.7316442169584539
  Avg:0.7008118423172378
  9/10 elapsed_time:9.64[sec]
-----10th generation results-----
  Min:0.6328245771251623
  Max:0.7244970729879131
  Avg:0.7034862249363725
  10/10 elapsed_time:13.06[sec]

The best individual is[1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]

The best score is about 0.72, compared with about 0.70 when all features are used, so the genetic algorithm found a slightly better feature subset.
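
To see which features the best chromosome above actually keeps, it can be decoded against the column names. The sketch below hard-codes the column order of the Boston housing dataset as loaded by scikit-learn, and uses the chromosome printed in this particular run; other runs will give different results.

```python
from itertools import compress

# Column order of the Boston housing dataset as loaded by scikit-learn.
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# Best chromosome printed in the run above.
best = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]

selected = list(compress(feature_names, best))
dropped = [name for name, gene in zip(feature_names, best) if not gene]
print(selected)
print(dropped)  # ['INDUS', 'AGE']
```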

Wrapping it in a class

I wrapped the algorithm in a class so that it can be reused. It is written as an abstract class; inherit from it and implement `evaluate_individual`.

"""
Written by Azunyan https://github.com/Azunyan1111/OneMax
"""
"""
Modified Copyright 2020 ground0state All Rights Reserved.
"""
import random
import time
from abc import ABCMeta, abstractmethod


class Individual():
    """An individual.

    Parameters
    ----------
    chromosome : list of {0 or 1}
        Chromosome.

    evaluation : float
        Evaluation value.
    """
    chromosome = None
    evaluation = None

    def __init__(self, chromosome, evaluation):
        self.chromosome = chromosome
        self.evaluation = evaluation

        
class GaSolver(metaclass=ABCMeta):
    """Abstract class of the genetic algorithm.

    Subclasses must implement "evaluate_individual", which returns the
    evaluation value for an individual's chromosome.

    Parameters
    ----------
    chromosome_length : int
        Chromosome length.

    population_size : int
        Population size.

    pick_out_size : int
        Number of elite individuals to select.

    individual_mutation_probability : float
        Probability that an individual mutates.

    gene_mutation_probability : float
        Probability that each gene of a mutating individual is resampled.

    iteration : int
        Number of generations to repeat.

    verbose : bool
        Whether to print progress.
    """
    
    def __init__(self, chromosome_length, population_size, pick_out_size,
                 individual_mutation_probability=0.3, gene_mutation_probability=0.1, iteration=1, verbose=True):        
        self.chromosome_length = chromosome_length
        self.population_size = population_size
        self.pick_out_size = pick_out_size
        self.individual_mutation_probability = individual_mutation_probability
        self.gene_mutation_probability = gene_mutation_probability
        self.iteration = iteration
        self.verbose = verbose
        self.history = None
    
    def _create_individual(self, length):
        """Create an individual with a random chromosome of the given length.

        Parameters
        ----------
        length : int
            Chromosome length.

        Returns
        -------
        individual : Individual
            The generated individual.
        """
        individual = Individual([random.randint(0, 1) for i in range(length)], 0)
        return individual

    @abstractmethod
    def evaluate_individual(self, individual, X, y):
        """Evaluation function.

        Parameters
        ----------
        individual : Individual
            Individual to evaluate.
        X : pandas.DataFrame
            Explanatory variables.
        y : pandas.DataFrame
            Objective variable.

        Returns
        -------
        eval : float
            Evaluation value.
        """
        raise NotImplementedError()

    def _extract_elites(self, population, num):
        """Selection function.

        Parameters
        ----------
        population : list of Individual
            Population.
        num : int
            Number of individuals to select.

        Returns
        -------
        elites : list of Individual
            Selected individuals.
        """
        # Sort the current generation by evaluation value in descending order
        sort_result = sorted(population, reverse=True, key=lambda individual: individual.evaluation)

        # Keep the top `num` individuals
        elites = sort_result[:num]
        return elites

    def _crossover(self, individual1, individual2, chromosome_length):
        """Crossover function.

        Performs two-point crossover.

        Parameters
        ----------
        individual1 : Individual
            First parent.
        individual2 : Individual
            Second parent.
        chromosome_length : int
            Chromosome length.

        Returns
        -------
        offsprings : list of Individual
            Two offspring.
        """

        #Set two points to be replaced
        cross_one = random.randint(0, chromosome_length)
        cross_second = random.randint(cross_one, chromosome_length)

        #Extract the gene
        one = individual1.chromosome
        second = individual2.chromosome

        #Cross
        progeny_one = one[:cross_one] + second[cross_one:cross_second] + one[cross_second:]
        progeny_second = second[:cross_one] + one[cross_one:cross_second] + second[cross_second:]

        #descendants
        offsprings = [Individual(progeny_one, 0), Individual(progeny_second, 0)]
        return offsprings

    def _create_next_generation(self, population, elites, offsprings):
        """Perform the generation change.

        Parameters
        ----------
        population : list of Individual
            Current generation population.
        elites : list of Individual
            Elites of the current generation.
        offsprings : list of Individual
            Offspring of the current generation.

        Returns
        -------
        next_generation_population : list of Individual
            Next generation population.
        """
        # Sort the current generation by evaluation value in ascending order
        next_generation_population = sorted(population, reverse=False, key=lambda individual: individual.evaluation)

        # Drop as many of the lowest-ranked individuals as elites and offspring will be added
        next_generation_population = next_generation_population[len(elites)+len(offsprings):]

        #Add elite and offspring groups to the next generation
        next_generation_population.extend(elites)
        next_generation_population.extend(offsprings)
        return next_generation_population

    def _mutation(self, population, individual_mutation_probability, gene_mutation_probability):
        """Mutation function.

        Parameters
        ----------
        population : list of Individual
            Population.
        individual_mutation_probability : float in [0, 1]
            Probability that an individual mutates.
        gene_mutation_probability : float in [0, 1]
            Probability that each gene of a mutating individual is resampled.

        Returns
        -------
        new_population : list of Individual
            Mutated population.
        """
        new_population = []
        for individual in population:
            # Each individual mutates with a certain probability
            if individual_mutation_probability > random.random():
                new_chromosome = []
                for gene in individual.chromosome:
                    # Each gene of a mutating individual is resampled with a certain probability
                    if gene_mutation_probability > random.random():
                        new_chromosome.append(random.randint(0, 1))
                    else:
                        new_chromosome.append(gene)

                # The individual is updated in place
                individual.chromosome = new_chromosome

            new_population.append(individual)

        return new_population
    
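    def solve(self, X, y):
        """Main method of the genetic algorithm.

        Parameters
        ----------
        X : pandas.DataFrame
            Explanatory variables.
        y : pandas.DataFrame
            Objective variable.

        Returns
        -------
        history : dict
            Per-generation Min, Max, Avg and best chromosome.
        """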
        self.history = {"Min":[], "Max":[], "Avg":[], "BestChromosome":[]}
        
        #Initialize the population of the current generation
        current_generation_population = [self._create_individual(self.chromosome_length) for i in range(self.population_size)]
        
        #Evaluate individuals in the current generation population
        for individual in current_generation_population:
            individual.evaluation = self.evaluate_individual(individual, X, y)

        for count in range(self.iteration):
            #Start time of each loop
            start = time.time()

            #Select an elite individual
            elites = self._extract_elites(current_generation_population, self.pick_out_size)

            #Cross the elite genes and store them in the list
            offsprings = []
            for i in range(0, self.pick_out_size-1):
                offsprings.extend(self._crossover(elites[i], elites[i+1], self.chromosome_length))

            #Create next-generation populations from current generations, elite populations, and offspring populations
            next_generation_population = self._create_next_generation(current_generation_population, elites, offsprings)

            #Mutate all individuals in the next-generation population.
            next_generation_population = self._mutation(next_generation_population,
                                                  self.individual_mutation_probability,
                                                  self.gene_mutation_probability)
            
            # Evaluate the next generation (offspring and mutated individuals included)
            # so that the next round of elite selection uses up-to-date scores
            for individual in next_generation_population:
                individual.evaluation = self.evaluate_individual(individual, X, y)

            # Evolutionary computation of one generation is completed. Move on to reporting

            # Collect the evaluation values of each individual
            fits = [individual.evaluation for individual in next_generation_population]

            # Take out the individual with the best evaluation value
            best_individual = self._extract_elites(next_generation_population, 1)
            best_chromosome = best_individual[0].chromosome

            #Evaluate evolutionary results
            min_val = min(fits)
            max_val = max(fits)
            avg_val = sum(fits) / len(fits)

            #Outputs the evolution results of the current generation
            if self.verbose:
                print("-----No.{}Generational results-----".format(count+1))
                print("  Min:{}".format(min_val))
                print("  Max:{}".format(max_val))
                print("  Avg:{}".format(avg_val))
                
            #history creation
            self.history["Min"].append(min_val)
            self.history["Max"].append(max_val)
            self.history["Avg"].append(avg_val)
            self.history["BestChromosome"].append(best_chromosome)

            #Swap the current generation with the next generation
            current_generation_population = next_generation_population
            
            # Measure the elapsed time (printed only in verbose mode)
            elapsed_time = time.time() - start
            if self.verbose:
                print("  {}/{} elapsed_time:{:.2f}".format(count+1, self.iteration, elapsed_time) + "[sec]")

        # Final result output: best chromosome recorded for the last generation
        if self.verbose:
            print("")  # new line
            print("The best individual is{}".format(self.history["BestChromosome"][-1]))
            
        return self.history
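
The abstract-class pattern used above can be seen in isolation in the following sketch. `BaseSolver` and `OneMaxSolver` are hypothetical, stripped-down stand-ins for the article's `GaSolver` and its subclasses; they only show how the base class holds the shared logic while the subclass supplies the evaluation.

```python
from abc import ABCMeta, abstractmethod


class BaseSolver(metaclass=ABCMeta):
    """Hypothetical stand-in for GaSolver, reduced to the abstract hook."""

    @abstractmethod
    def evaluate_individual(self, chromosome):
        raise NotImplementedError()

    def best_of(self, chromosomes):
        # Shared logic lives in the base class and calls the subclass hook.
        return max(chromosomes, key=self.evaluate_individual)


class OneMaxSolver(BaseSolver):
    def evaluate_individual(self, chromosome):
        # OneMax fitness: fraction of genes equal to 1.
        return sum(chromosome) / len(chromosome)


try:
    BaseSolver()
except TypeError as err:
    # Abstract classes cannot be instantiated directly.
    print("cannot instantiate:", err)

print(OneMaxSolver().best_of([[0, 1, 0], [1, 1, 0], [0, 0, 0]]))  # [1, 1, 0]
```

Swapping the problem then only means writing another subclass with a different `evaluate_individual`.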

Implement the machine learning model you want to evaluate in `evaluate_individual`.

import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split


class GaSolverImpl(GaSolver):
    
    # override
    def evaluate_individual(self, individual, X, y):
        use_cols = [bool(gene) for gene in individual.chromosome]
        X_temp = X.iloc[:, use_cols]

        scores = []
        for _ in range(30):
            X_train, X_test, y_train, y_test = train_test_split(X_temp, y, test_size=0.4)
            model = RidgeCV()
            model.fit(X_train, y_train)
            scores.append(model.score(X_test, y_test))

        eval = float(np.array(scores).mean())
        return eval

Create an instance of the implementation class, pass it the data, and run it. The solve method returns the history as a dictionary.

solver = GaSolverImpl(
    chromosome_length = X_poly_sc.shape[1], 
    population_size = 50, 
    pick_out_size = 10,
    individual_mutation_probability = 0.3,
    gene_mutation_probability = 0.1,
    iteration = 50,
    verbose = True
)

history = solver.solve(X_poly_sc, y)

The execution result is as follows.

-----1st generation results-----
  Min:0.7248417700796147
  Max:0.8360838319205105
  Avg:0.7927103625892467
  1/50 elapsed_time:6.13[sec]
-----2nd generation results-----
  Min:0.7350424889460248
  Max:0.8264758137896353
  Avg:0.8114411035733131
  2/50 elapsed_time:10.81[sec]
-----3rd generation results-----
  Min:0.7902116792529935
  Max:0.8286229243915363
  Avg:0.8125974889978004
  3/50 elapsed_time:8.20[sec]
-----4th generation results-----
  Min:0.773199874021567
  Max:0.8312887517624212
  Avg:0.810950812639705
  4/50 elapsed_time:7.56[sec]
-----5th generation results-----
  Min:0.768479730905661
  Max:0.8386114466226944
  Avg:0.8076230726252596
  5/50 elapsed_time:8.13[sec]
-----6th generation results-----
  Min:0.7797249579245809
  Max:0.8319768049107215
  Avg:0.8138790949911054
  6/50 elapsed_time:9.00[sec]
-----7th generation results-----
  Min:0.7971344524880782
  Max:0.8333411281001641
  Avg:0.8168863897838727
  7/50 elapsed_time:7.56[sec]
-----8th generation results-----
  Min:0.7709812458007903
  Max:0.8316092177782253
  Avg:0.8082876757394714
  8/50 elapsed_time:7.96[sec]
-----9th generation results-----
  Min:0.7459891729563418
  Max:0.8322393628831635
  Avg:0.8159389943969992
  9/50 elapsed_time:8.77[sec]
-----10th generation results-----
  Min:0.7538656919599587
  Max:0.8254541549046537
  Avg:0.8034195187548075
  10/50 elapsed_time:8.99[sec]
-----11th generation results-----
  Min:0.8046900766607942
  Max:0.8379618406470278
  Avg:0.8217659811828382
  11/50 elapsed_time:8.60[sec]
-----12th generation results-----
  Min:0.8020625272756005
  Max:0.8356958927515973
  Avg:0.8132506462797608
  12/50 elapsed_time:8.31[sec]
-----13th generation results-----
  Min:0.7442093041785434
  Max:0.826166208838109
  Avg:0.7693376466706999
  13/50 elapsed_time:9.22[sec]
-----14th generation results-----
  Min:0.80133807286147
  Max:0.8264198880246336
  Avg:0.8085481113173225
  14/50 elapsed_time:8.08[sec]
-----15th generation results-----
  Min:0.7316094852550766
  Max:0.8139831643344952
  Avg:0.7929373870389733
  15/50 elapsed_time:8.92[sec]
-----16th generation results-----
  Min:0.7955982071682629
  Max:0.8210496822695305
  Avg:0.8134173712784526
  16/50 elapsed_time:9.72[sec]
-----17th generation results-----
  Min:0.758489267352653
  Max:0.826441026953439
  Avg:0.7773437348210647
  17/50 elapsed_time:8.58[sec]
-----18th generation results-----
  Min:0.7687388062022248
  Max:0.8211801466346264
  Avg:0.7826663042340634
  18/50 elapsed_time:6.94[sec]
-----19th generation results-----
  Min:0.7429453738843712
  Max:0.794799782442768
  Avg:0.7525262014670999
  19/50 elapsed_time:8.35[sec]
-----20th generation results-----
  Min:0.7059056866516289
  Max:0.8115968792777923
  Avg:0.7941420197838582
  20/50 elapsed_time:7.01[sec]
-----21st generation results-----
  Min:0.7035195424104084
  Max:0.8339769569079513
  Avg:0.785429874209423
  21/50 elapsed_time:8.84[sec]
-----22nd generation results-----
  Min:0.7605334574905934
  Max:0.8178769887665864
  Avg:0.7764313614722025
  22/50 elapsed_time:8.89[sec]
-----23rd generation results-----
  Min:0.7622888571603964
  Max:0.8125955330567856
  Avg:0.7761008854264979
  23/50 elapsed_time:8.47[sec]
-----24th generation results-----
  Min:0.7325862134323571
  Max:0.7781021993458462
  Avg:0.76629374412332
  24/50 elapsed_time:6.80[sec]
-----25th generation results-----
  Min:0.7155008056263605
  Max:0.7770200781667415
  Avg:0.7679494414264083
  25/50 elapsed_time:6.34[sec]
-----26th generation results-----
  Min:0.7435193687961383
  Max:0.8178098302473983
  Avg:0.8025839605868198
  26/50 elapsed_time:7.55[sec]
-----27th generation results-----
  Min:0.757023831644299
  Max:0.8134233524435134
  Avg:0.7987707913780304
  27/50 elapsed_time:8.24[sec]
-----28th generation results-----
  Min:0.7731968991993663
  Max:0.8307874217208041
  Avg:0.7886999734804412
  28/50 elapsed_time:6.93[sec]
-----29th generation results-----
  Min:0.7918044164374493
  Max:0.8258234982562584
  Avg:0.8092356291245499
  29/50 elapsed_time:6.45[sec]
-----30th generation results-----
  Min:0.7742914329017841
  Max:0.8170916314535998
  Avg:0.8057764064558626
  30/50 elapsed_time:6.46[sec]
-----31st generation results-----
  Min:0.7900272740547029
  Max:0.8252185280503214
  Avg:0.8121724282164997
  31/50 elapsed_time:6.87[sec]
-----32nd generation results-----
  Min:0.7668694386968217
  Max:0.8231354707898234
  Avg:0.8170271080711664
  32/50 elapsed_time:7.61[sec]
-----33rd generation results-----
  Min:0.7721459013264073
  Max:0.8365223852672053
  Avg:0.82567433930934
  33/50 elapsed_time:8.28[sec]
-----34th generation results-----
  Min:0.802896605790934
  Max:0.8367820565860135
  Avg:0.8256706142219095
  34/50 elapsed_time:7.94[sec]
-----35th generation results-----
  Min:0.8188038196577934
  Max:0.8388260026966802
  Avg:0.8358101024561487
  35/50 elapsed_time:7.64[sec]
-----36th generation results-----
  Min:0.7887209549961678
  Max:0.8386551764887261
  Avg:0.8301462683188676
  36/50 elapsed_time:8.13[sec]
-----37th generation results-----
  Min:0.7862123272076996
  Max:0.8405895787926129
  Avg:0.8165090312639174
  37/50 elapsed_time:7.54[sec]
-----38th generation results-----
  Min:0.79041640507099
  Max:0.8389789987982965
  Avg:0.8075935438809548
  38/50 elapsed_time:8.58[sec]
-----39th generation results-----
  Min:0.7632897869020304
  Max:0.8249959874282974
  Avg:0.7783194384843993
  39/50 elapsed_time:8.18[sec]
-----40th generation results-----
  Min:0.7391820233337305
  Max:0.8140492870179213
  Avg:0.7954486450055553
  40/50 elapsed_time:6.36[sec]
-----41st generation results-----
  Min:0.7085099265464342
  Max:0.7981244256568432
  Avg:0.7831723305042879
  41/50 elapsed_time:7.90[sec]
-----42nd generation results-----
  Min:0.7826056505944214
  Max:0.8327777219420097
  Avg:0.8064707164336307
  42/50 elapsed_time:7.53[sec]
-----43rd generation results-----
  Min:0.7799209160785368
  Max:0.8183673115100479
  Avg:0.7992172395182555
  43/50 elapsed_time:6.74[sec]
-----44th generation results-----
  Min:0.756001056689909
  Max:0.8338583079593664
  Avg:0.8051445406627477
  44/50 elapsed_time:6.31[sec]
-----45th generation results-----
  Min:0.7755735607344747
  Max:0.8283597660188781
  Avg:0.7882919431369523
  45/50 elapsed_time:6.52[sec]
-----46th generation results-----
  Min:0.7766070559704219
  Max:0.8165316562327392
  Avg:0.8106111873738964
  46/50 elapsed_time:7.22[sec]
-----47th generation results-----
  Min:0.7780606007516856
  Max:0.8084622225234689
  Avg:0.7942400594914705
  47/50 elapsed_time:9.72[sec]
-----48th generation results-----
  Min:0.7745173603676726
  Max:0.8363078519583506
  Avg:0.8206202750563127
  48/50 elapsed_time:10.67[sec]
-----49th generation results-----
  Min:0.7800301936781145
  Max:0.8368475790583294
  Avg:0.8222375502197947
  49/50 elapsed_time:7.54[sec]
-----50th generation results-----
  Min:0.8077617917763787
  Max:0.841354566380394
  Avg:0.8147771424682558
  50/50 elapsed_time:6.78[sec]

The best individual is[1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1]

The history looks like this:

df = pd.DataFrame(history)
[Screenshot: the history DataFrame]

To decide which columns to use in the end, sort the history by the Max column, try several chromosomes from the top, and adopt one while also taking hyperparameter search into account.

df.sort_values(["Max"], ascending=False)
[Screenshot: the history DataFrame sorted by Max in descending order]
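
The same ranking can be done without pandas. The sketch below assumes a history dictionary in the shape that `solve` returns; the values are made up for illustration and the helper name `top_candidates` is hypothetical.

```python
# Hypothetical history in the same shape that solve() returns.
history = {
    "Min": [0.72, 0.74, 0.79],
    "Max": [0.836, 0.826, 0.829],
    "Avg": [0.79, 0.81, 0.81],
    "BestChromosome": [[1, 0, 1], [1, 1, 0], [0, 1, 1]],
}


def top_candidates(history, k):
    """Return the k (generation, max_score, chromosome) rows with the highest Max."""
    generations = range(1, len(history["Max"]) + 1)
    rows = zip(generations, history["Max"], history["BestChromosome"])
    return sorted(rows, key=lambda row: row[1], reverse=True)[:k]


for generation, score, chromosome in top_candidates(history, 2):
    print(generation, score, chromosome)
```

Because each score is an average over random train/test splits, it is probably worth re-evaluating the top few candidates before settling on one.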

Conclusion

I tried feature selection using a genetic algorithm. There are many variants of genetic algorithms, and you may also want to try a library such as DEAP (https://github.com/DEAP/deap), which is used in the article I referred to.
