Symbolic Regression This is a continuation of the previous post introducing DEAP, the Python evolutionary computation library. This time we look at the symbolic regression example for genetic programming (GP). GP extends the genetic algorithm (GA) by representing each genotype as a tree (or graph) structure, and symbolic regression is a classic GP benchmark problem. Concretely, the task is to identify the following quartic function on the range $[-1, 1]$:
f(x)=x^4+x^3+x^2+x
The fitness function is the mean squared error between the estimated program $prog(x)$ and the true function $f(x)$, evaluated at 20 evenly spaced points in $[-1, 1)$:
\frac{1}{20} \sum_{i=1}^{20} \left( prog(x_i) - f(x_i) \right)^2
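To make this fitness measure concrete, here is a minimal pure-Python sketch (not DEAP code) of the target function and the squared-error calculation, using the same 20 sample points the example uses:

```python
def f(x):
    # The target quartic the GP run is trying to rediscover
    return x**4 + x**3 + x**2 + x

def squared_error(prog, points):
    # Sum of squared errors between a candidate program and the target
    return sum((prog(x) - f(x)) ** 2 for x in points)

# 20 evenly spaced points in [-1, 1), as in the DEAP example
points = [x / 10.0 for x in range(-10, 10)]

print(squared_error(f, points))            # 0.0 for the exact solution
print(squared_error(lambda x: x, points))  # > 0 for an imperfect candidate
```

A perfect program drives the error to zero; evolution searches for tree-shaped expressions that minimize this quantity.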
Let's actually look at the contents of the Example. First, import the module.
import operator
import math
import random
import numpy
from deap import algorithms
from deap import base
from deap import creator
from deap import tools
from deap import gp
Next, create the set of primitives that will form the nodes of the tree. Here the set contains basic arithmetic operations, trigonometric functions, and ephemeral random constants.
# Define new functions
def safeDiv(left, right):
    try:
        return left / right
    except ZeroDivisionError:
        return 0
pset = gp.PrimitiveSet("MAIN", 1)
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)
pset.addPrimitive(safeDiv, 2)
pset.addPrimitive(operator.neg, 1)
pset.addPrimitive(math.cos, 1)
pset.addPrimitive(math.sin, 1)
pset.addEphemeralConstant("rand101", lambda: random.randint(-1,1))
pset.renameArguments(ARG0='x')
To avoid errors from division by zero, a protected division is newly defined and added as a primitive; the rest use functions from Python's operator and math modules. The second argument of addPrimitive is the arity (number of arguments) of the primitive function. addEphemeralConstant is used when a leaf node should take a value generated by a function, such as a random number, rather than a fixed constant. Here we define a random constant that yields one of -1, 0, or 1 (it probably won't be selected very often for this problem, though). The second argument of PrimitiveSet is the number of program inputs. There is only one input this time; it is named "ARG0" by default, but renameArguments renames it to "x".
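The key property of an ephemeral constant is that its generator runs once, at node creation time, and the sampled value is then frozen for that node. The following pure-Python sketch (an illustration, not DEAP internals) mimics that behaviour:

```python
import random

random.seed(0)

def make_ephemeral(gen):
    # Mimics addEphemeralConstant's behaviour: the generator is called once
    # when the node is created, and that value is then frozen for the node.
    value = gen()
    return lambda: value

const_node = make_ephemeral(lambda: random.randint(-1, 1))

print(const_node())                  # always one of -1, 0, 1
print(const_node() == const_node())  # True: the value never changes
```

Two different nodes created from the same generator can hold different values, but each node's value is stable across evaluations.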
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMin)
Next, as in the previous GA example, use **creator** to define the minimization fitness and the individual type (here based on gp.PrimitiveTree).
toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=2)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)
def evalSymbReg(individual, points):
    # Convert the tree representation into a callable function
    func = toolbox.compile(expr=individual)
    # Mean squared error between the estimated and the true function
    sqerrors = ((func(x) - x**4 - x**3 - x**2 - x)**2 for x in points)
    return math.fsum(sqerrors) / len(points),
toolbox.register("evaluate", evalSymbReg, points=[x/10. for x in range(-10,10)])
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)
Next, **toolbox** registers the methods for generating individuals and populations, along with the evaluation function, crossover, mutation, and selection operators. gp.genHalfAndHalf is a tree-building function: min_ and max_ specify the minimum and maximum tree depth, and it applies genGrow (trees whose leaves may sit at different depths) and genFull (trees whose leaves all sit at the same depth) half of the time each. gp.compile turns an individual into an actually executable function. Finally, mutation is specified as gp.mutUniform, which replaces a randomly chosen node's subtree with a newly generated one.
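To see what "compiling" a tree means, here is a toy analogue of gp.compile (a sketch, not the DEAP implementation): a tree is a nested tuple `(function, child, ...)`, and a leaf is either the variable name `'x'` or a numeric constant.

```python
import operator

def compile_tree(tree):
    # Recursively turn a nested-tuple expression tree into a callable f(x)
    if tree == 'x':
        return lambda x: x
    if not isinstance(tree, tuple):
        return lambda x, c=tree: c  # constant leaf
    fn, *children = tree
    compiled = [compile_tree(c) for c in children]
    return lambda x: fn(*(child(x) for child in compiled))

# Represents add(mul(x, x), x), i.e. x**2 + x
expr = (operator.add, (operator.mul, 'x', 'x'), 'x')
func = compile_tree(expr)
print(func(2.0))  # 6.0
```

DEAP does essentially this for real PrimitiveTree individuals: the evaluation function receives a tree, compiles it once, and then calls the resulting function at each sample point.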
def main():
    random.seed(318)

    pop = toolbox.population(n=300)
    hof = tools.HallOfFame(1)

    stats_fit = tools.Statistics(lambda ind: ind.fitness.values)
    stats_size = tools.Statistics(len)
    mstats = tools.MultiStatistics(fitness=stats_fit, size=stats_size)
    mstats.register("avg", numpy.mean)
    mstats.register("std", numpy.std)
    mstats.register("min", numpy.min)
    mstats.register("max", numpy.max)

    pop, log = algorithms.eaSimple(pop, toolbox, 0.5, 0.1, 40, stats=mstats,
                                   halloffame=hof, verbose=True)
    # The log is printed by eaSimple itself when verbose=True
    return pop, log, hof

if __name__ == "__main__":
    main()
Next, define the main function and run it. First register the statistics you want collected each generation, then generate the initial population and run the evolutionary loop with eaSimple (crossover probability 0.5, mutation probability 0.1, 40 generations). When executed, the per-generation statistics are printed as shown below.
                        fitness                            size
           ------------------------------------  ---------------------------
gen nevals  avg      max      min       std       avg      max  min  std
0 300 2.36554 59.2093 0.165572 4.63581 3.69667 7 2 1.61389
1 146 1.07596 10.05 0.165572 0.820853 3.85333 13 1 1.79401
2 169 0.894383 6.3679 0.165572 0.712427 4.2 13 1 2.06398
3 167 0.843668 9.6327 0.165572 0.860971 4.73333 13 1 2.36549
4 158 0.790841 16.4823 0.165572 1.32922 5.02 13 1 2.37338
5 157 0.836829 67.637 0.165572 3.97761 5.59667 13 1 2.20771
6 179 0.475982 3.53043 0.150643 0.505782 6.01 14 1 2.02564
7 176 0.404081 2.54124 0.150643 0.431554 6.42 13 1 2.05352
8 164 0.39734 2.99872 0.150643 0.424689 6.60333 14 3 2.10063
9 161 0.402689 13.5996 0.150643 0.860105 6.61333 13 3 2.05519
10 148 0.392393 2.9829 0.103868 0.445793 6.74333 15 1 2.39669
11 155 0.39596 7.28126 0.0416322 0.578673 7.26333 15 1 2.53784
12 163 0.484725 9.45796 0.00925063 0.733674 8.16 17 1 2.74489
13 155 0.478033 11.0636 0.0158757 0.835315 8.72333 21 1 3.04633
14 184 0.46447 2.65913 0.0158757 0.562739 9.84333 23 1 3.17681
15 161 0.446362 5.74933 0.0158757 0.700987 10.5367 23 2 3.29676
16 187 0.514291 19.6728 0.013838 1.44413 11.8067 25 1 4.25237
17 178 0.357693 3.69339 0.00925063 0.456012 12.7767 29 1 5.3672
18 163 0.377407 14.2468 0.00925063 0.949661 13.4733 35 1 5.67885
19 155 0.280784 9.55288 0.00925063 0.691295 15.2333 36 2 5.7884
20 165 0.247941 2.89093 0.00925063 0.416445 15.63 31 3 5.66508
21 160 0.229175 2.9329 0.00182347 0.406363 16.5033 37 2 6.44593
22 165 0.183025 3.07225 0.00182347 0.333659 16.99 32 2 5.77205
23 156 0.22139 2.49124 0.00182347 0.407663 17.5 42 1 6.7382
24 167 0.168575 1.98247 5.12297e-33 0.272764 17.48 43 3 6.21902
25 166 0.177509 2.3179 5.12297e-33 0.330852 16.9433 36 1 6.6908
26 152 0.187417 2.75742 5.12297e-33 0.382165 17.2267 47 3 6.21734
27 169 0.216263 3.37474 5.12297e-33 0.419851 17.1967 47 3 6.00483
28 176 0.183346 2.75742 5.12297e-33 0.295578 16.7733 42 3 5.85052
29 159 0.167077 2.90958 5.12297e-33 0.333926 16.59 45 2 5.75169
30 171 0.192057 2.75742 5.12297e-33 0.341461 16.2267 53 3 5.48895
31 209 0.279078 3.04517 5.12297e-33 0.455884 15.84 32 3 5.60188
32 161 0.291415 19.9536 5.12297e-33 1.22461 16.0267 35 3 5.45276
33 157 0.16533 2.54124 5.12297e-33 0.316613 15.5033 33 2 5.08232
34 159 0.164494 2.99681 5.12297e-33 0.316094 15.4567 33 1 4.99347
35 157 0.158183 2.9829 5.12297e-33 0.309021 15.3133 34 2 4.54113
36 172 0.203184 3.00665 5.12297e-33 0.380584 15.12 29 1 4.70804
37 181 0.251027 4.66987 5.12297e-33 0.483365 15.18 36 1 5.6587
38 141 0.249193 12.8906 5.12297e-33 0.848809 15.1633 36 1 5.38114
39 158 0.188209 2.33071 5.12297e-33 0.346345 15.1967 32 1 4.82887
40 165 0.220244 2.3179 5.12297e-33 0.381864 15.44 33 2 5.70787
pop, log, hof = main()
expr = hof[0]  # the best individual found during the run
nodes, edges, labels = gp.graph(expr)

import matplotlib.pyplot as plt
import networkx as nx

g = nx.Graph()
g.add_nodes_from(nodes)
g.add_edges_from(edges)
# In recent networkx, graphviz_layout lives in nx.nx_agraph (needs pygraphviz)
pos = nx.nx_agraph.graphviz_layout(g, prog="dot")

nx.draw_networkx_nodes(g, pos)
nx.draw_networkx_edges(g, pos)
nx.draw_networkx_labels(g, pos, labels)
plt.show()
Finally, let's draw the optimized tree. networkx and matplotlib are used for the drawing. The structure of the tree obtained by this optimization is shown below.
Let's also plot the functions to check whether the evolved expression is really close to the target. In the plot, red is the true function and green is the estimate; the fit is not great. It might improve with deeper trees or additional primitives, but let's leave it here for now.
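A comparison plot like the one described can be sketched as follows. This is a hedged, standalone version: `best` is a hypothetical stand-in for the compiled champion (in the real run it would be `toolbox.compile(expr=hof[0])`), and the Agg backend plus `savefig` are used so the script runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

def f(x):
    # True target function (red curve in the article's figure)
    return x**4 + x**3 + x**2 + x

# Hypothetical stand-in for the evolved program; in the real run this
# would be func = toolbox.compile(expr=hof[0])
best = lambda x: x**2 + x

xs = [i / 100.0 for i in range(-100, 101)]
plt.plot(xs, [f(x) for x in xs], "r-", label="true f(x)")
plt.plot(xs, [best(x) for x in xs], "g--", label="GP estimate")
plt.legend()
plt.savefig("gp_symbreg_fit.png")
print(len(xs))  # 201 sample points
```

Swapping `best` for the actual compiled hall-of-fame individual reproduces the red-versus-green comparison discussed above.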
Symbolic Regression Example: http://deap.gel.ulaval.ca/doc/dev/examples/gp_symbreg.html