Symbolic Regression This is a continuation of the previous post introducing DEAP, the Python evolutionary computation library. This time we look at the symbolic regression example for genetic programming (GP). GP extends the genetic algorithm (GA) by representing each genotype as a tree (or graph) structure, and symbolic regression is a classic GP benchmark problem. Concretely, the task is to identify the following quartic function on the range $[-1, 1]$:
f(x)=x^4+x^3+x^2+x
The fitness function is the mean squared error between the estimated program $prog(x)$ and the true function $f(x)$, evaluated at 20 evenly spaced points in $[-1, 1)$:
\frac{1}{20} \sum_{i=1}^{20} \left( prog(x_i) - f(x_i) \right)^2
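To make this fitness measure concrete, here is a minimal pure-Python sketch (not DEAP code) of the target function and the squared-error calculation, using the same 20 sample points the example uses:

```python
def f(x):
    # The target quartic the GP run is trying to rediscover
    return x**4 + x**3 + x**2 + x

def squared_error(prog, points):
    # Sum of squared errors between a candidate program and the target
    return sum((prog(x) - f(x)) ** 2 for x in points)

# 20 evenly spaced points in [-1, 1), as in the DEAP example
points = [x / 10.0 for x in range(-10, 10)]

print(squared_error(f, points))            # 0.0 for the exact solution
print(squared_error(lambda x: x, points))  # > 0 for an imperfect candidate
```

A perfect program drives the error to zero; evolution searches for tree-shaped expressions that minimize this quantity.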
Let's actually look at the contents of the Example. First, import the module.
import operator
import math
import random
import numpy
from deap import algorithms
from deap import base
from deap import creator
from deap import tools
from deap import gp
Next, create the set of primitives that will form the nodes of the tree. Here the set contains basic arithmetic operations, trigonometric functions, and ephemeral random constants.
# Define new functions
def safeDiv(left, right):
    try:
        return left / right
    except ZeroDivisionError:
        return 0
pset = gp.PrimitiveSet("MAIN", 1)
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)
pset.addPrimitive(safeDiv, 2)
pset.addPrimitive(operator.neg, 1)
pset.addPrimitive(math.cos, 1)
pset.addPrimitive(math.sin, 1)
pset.addEphemeralConstant("rand101", lambda: random.randint(-1,1))
pset.renameArguments(ARG0='x')
To avoid errors from division by zero, a protected division is newly defined and added as a primitive; the rest use functions from Python's operator and math modules. The second argument of addPrimitive is the arity (number of arguments) of the primitive function. addEphemeralConstant is used when a leaf node should take a value generated by a function, such as a random number, rather than a fixed constant. Here we define a random constant that yields one of -1, 0, or 1 (it probably won't be selected very often for this problem, though). The second argument of PrimitiveSet is the number of program inputs. There is only one input this time; it is named "ARG0" by default, but renameArguments renames it to "x".
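The key property of an ephemeral constant is that its generator runs once, at node creation time, and the sampled value is then frozen for that node. The following pure-Python sketch (an illustration, not DEAP internals) mimics that behaviour:

```python
import random

random.seed(0)

def make_ephemeral(gen):
    # Mimics addEphemeralConstant's behaviour: the generator is called once
    # when the node is created, and that value is then frozen for the node.
    value = gen()
    return lambda: value

const_node = make_ephemeral(lambda: random.randint(-1, 1))

print(const_node())                  # always one of -1, 0, 1
print(const_node() == const_node())  # True: the value never changes
```

Two different nodes created from the same generator can hold different values, but each node's value is stable across evaluations.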
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMin)
Next, as in the previous GA example, use **creator** to define the minimization fitness and the individual type (here based on gp.PrimitiveTree).
toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=2)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)
def evalSymbReg(individual, points):
    # Convert the tree representation into a callable function
    func = toolbox.compile(expr=individual)
    # Mean squared error between the estimated and the true function
    sqerrors = ((func(x) - x**4 - x**3 - x**2 - x)**2 for x in points)
    return math.fsum(sqerrors) / len(points),
toolbox.register("evaluate", evalSymbReg, points=[x/10. for x in range(-10,10)])
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)
Next, **toolbox** registers the methods for generating individuals and populations, along with the evaluation function, crossover, mutation, and selection operators. gp.genHalfAndHalf is a tree-building function: min_ and max_ specify the minimum and maximum tree depth, and it applies genGrow (trees whose leaves may sit at different depths) and genFull (trees whose leaves all sit at the same depth) half of the time each. gp.compile turns an individual into an actually executable function. Finally, mutation is specified as gp.mutUniform, which replaces a randomly chosen node's subtree with a newly generated one.
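To see what "compiling" a tree means, here is a toy analogue of gp.compile (a sketch, not the DEAP implementation): a tree is a nested tuple `(function, child, ...)`, and a leaf is either the variable name `'x'` or a numeric constant.

```python
import operator

def compile_tree(tree):
    # Recursively turn a nested-tuple expression tree into a callable f(x)
    if tree == 'x':
        return lambda x: x
    if not isinstance(tree, tuple):
        return lambda x, c=tree: c  # constant leaf
    fn, *children = tree
    compiled = [compile_tree(c) for c in children]
    return lambda x: fn(*(child(x) for child in compiled))

# Represents add(mul(x, x), x), i.e. x**2 + x
expr = (operator.add, (operator.mul, 'x', 'x'), 'x')
func = compile_tree(expr)
print(func(2.0))  # 6.0
```

DEAP does essentially this for real PrimitiveTree individuals: the evaluation function receives a tree, compiles it once, and then calls the resulting function at each sample point.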
def main():
    random.seed(318)

    pop = toolbox.population(n=300)
    hof = tools.HallOfFame(1)

    stats_fit = tools.Statistics(lambda ind: ind.fitness.values)
    stats_size = tools.Statistics(len)
    mstats = tools.MultiStatistics(fitness=stats_fit, size=stats_size)
    mstats.register("avg", numpy.mean)
    mstats.register("std", numpy.std)
    mstats.register("min", numpy.min)
    mstats.register("max", numpy.max)

    pop, log = algorithms.eaSimple(pop, toolbox, 0.5, 0.1, 40, stats=mstats,
                                   halloffame=hof, verbose=True)
    # The log is printed by eaSimple itself when verbose=True
    return pop, log, hof

if __name__ == "__main__":
    main()
Next, define the main function and run it. First register the statistics you want collected each generation, then generate the initial population and run the evolutionary loop with eaSimple (crossover probability 0.5, mutation probability 0.1, 40 generations). When executed, the per-generation statistics are printed as shown below.
                        fitness                            size
           ------------------------------------  ---------------------------
gen nevals  avg      max      min       std       avg      max  min  std
0 300 2.36554 59.2093 0.165572 4.63581 3.69667 7 2 1.61389
1 146 1.07596 10.05 0.165572 0.820853 3.85333 13 1 1.79401
2 169 0.894383 6.3679 0.165572 0.712427 4.2 13 1 2.06398
3 167 0.843668 9.6327 0.165572 0.860971 4.73333 13 1 2.36549
4 158 0.790841 16.4823 0.165572 1.32922 5.02 13 1 2.37338
5 157 0.836829 67.637 0.165572 3.97761 5.59667 13 1 2.20771
6 179 0.475982 3.53043 0.150643 0.505782 6.01 14 1 2.02564
7 176 0.404081 2.54124 0.150643 0.431554 6.42 13 1 2.05352
8 164 0.39734 2.99872 0.150643 0.424689 6.60333 14 3 2.10063
9 161 0.402689 13.5996 0.150643 0.860105 6.61333 13 3 2.05519
10 148 0.392393 2.9829 0.103868 0.445793 6.74333 15 1 2.39669
11 155 0.39596 7.28126 0.0416322 0.578673 7.26333 15 1 2.53784
12 163 0.484725 9.45796 0.00925063 0.733674 8.16 17 1 2.74489
13 155 0.478033 11.0636 0.0158757 0.835315 8.72333 21 1 3.04633
14 184 0.46447 2.65913 0.0158757 0.562739 9.84333 23 1 3.17681
15 161 0.446362 5.74933 0.0158757 0.700987 10.5367 23 2 3.29676
16 187 0.514291 19.6728 0.013838 1.44413 11.8067 25 1 4.25237
17 178 0.357693 3.69339 0.00925063 0.456012 12.7767 29 1 5.3672
18 163 0.377407 14.2468 0.00925063 0.949661 13.4733 35 1 5.67885
19 155 0.280784 9.55288 0.00925063 0.691295 15.2333 36 2 5.7884
20 165 0.247941 2.89093 0.00925063 0.416445 15.63 31 3 5.66508
21 160 0.229175 2.9329 0.00182347 0.406363 16.5033 37 2 6.44593
22 165 0.183025 3.07225 0.00182347 0.333659 16.99 32 2 5.77205
23 156 0.22139 2.49124 0.00182347 0.407663 17.5 42 1 6.7382
24 167 0.168575 1.98247 5.12297e-33 0.272764 17.48 43 3 6.21902
25 166 0.177509 2.3179 5.12297e-33 0.330852 16.9433 36 1 6.6908
26 152 0.187417 2.75742 5.12297e-33 0.382165 17.2267 47 3 6.21734
27 169 0.216263 3.37474 5.12297e-33 0.419851 17.1967 47 3 6.00483
28 176 0.183346 2.75742 5.12297e-33 0.295578 16.7733 42 3 5.85052
29 159 0.167077 2.90958 5.12297e-33 0.333926 16.59 45 2 5.75169
30 171 0.192057 2.75742 5.12297e-33 0.341461 16.2267 53 3 5.48895
31 209 0.279078 3.04517 5.12297e-33 0.455884 15.84 32 3 5.60188
32 161 0.291415 19.9536 5.12297e-33 1.22461 16.0267 35 3 5.45276
33 157 0.16533 2.54124 5.12297e-33 0.316613 15.5033 33 2 5.08232
34 159 0.164494 2.99681 5.12297e-33 0.316094 15.4567 33 1 4.99347
35 157 0.158183 2.9829 5.12297e-33 0.309021 15.3133 34 2 4.54113
36 172 0.203184 3.00665 5.12297e-33 0.380584 15.12 29 1 4.70804
37 181 0.251027 4.66987 5.12297e-33 0.483365 15.18 36 1 5.6587
38 141 0.249193 12.8906 5.12297e-33 0.848809 15.1633 36 1 5.38114
39 158 0.188209 2.33071 5.12297e-33 0.346345 15.1967 32 1 4.82887
40 165 0.220244 2.3179 5.12297e-33 0.381864 15.44 33 2 5.70787
pop, log, hof = main()
expr = hof[0]  # the best individual found during the run
nodes, edges, labels = gp.graph(expr)

import matplotlib.pyplot as plt
import networkx as nx

g = nx.Graph()
g.add_nodes_from(nodes)
g.add_edges_from(edges)
# In recent networkx, graphviz_layout lives in nx.nx_agraph (needs pygraphviz)
pos = nx.nx_agraph.graphviz_layout(g, prog="dot")

nx.draw_networkx_nodes(g, pos)
nx.draw_networkx_edges(g, pos)
nx.draw_networkx_labels(g, pos, labels)
plt.show()
Finally, let's draw the optimized tree. networkx and matplotlib are used for the drawing. The structure of the tree obtained by this optimization is shown below.
Let's also plot the functions to check whether the evolved expression is really close to the target. In the plot, red is the true function and green is the estimate; the fit is not great. It might improve with deeper trees or additional primitives, but let's leave it here for now.
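A comparison plot like the one described can be sketched as follows. This is a hedged, standalone version: `best` is a hypothetical stand-in for the compiled champion (in the real run it would be `toolbox.compile(expr=hof[0])`), and the Agg backend plus `savefig` are used so the script runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

def f(x):
    # True target function (red curve in the article's figure)
    return x**4 + x**3 + x**2 + x

# Hypothetical stand-in for the evolved program; in the real run this
# would be func = toolbox.compile(expr=hof[0])
best = lambda x: x**2 + x

xs = [i / 100.0 for i in range(-100, 101)]
plt.plot(xs, [f(x) for x in xs], "r-", label="true f(x)")
plt.plot(xs, [best(x) for x in xs], "g--", label="GP estimate")
plt.legend()
plt.savefig("gp_symbreg_fit.png")
print(len(xs))  # 201 sample points
```

Swapping `best` for the actual compiled hall-of-fame individual reproduces the red-versus-green comparison discussed above.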
Symbolic Regression Example: http://deap.gel.ulaval.ca/doc/dev/examples/gp_symbreg.html