I tried out a Bayesian optimization library to see whether it could actually find a reasonable optimum.
The library used is "GPyOpt".
The following book was used as a reference for the theory: "Gaussian Processes and Machine Learning" (authors: Daichi Mochihashi, Shigeyuki Oba).
We define an objective function and use Bayesian optimization to find its minimum value within a specified domain.
The detailed settings are as follows.
--Bayesian Optimization Library: GPyOpt (.methods.BayesianOptimization)
--Objective function: $f(x) = (x-300)(x-200)(x-15)(x-5)(x+6)(x+10)(x+100)$, minimized over the domain $-100 \leq x \leq 300$ (see the code below)
--Bayesian optimization: a method for finding the minimum (or maximum) value of an objective function using Gaussian process regression. An acquisition function is applied to the prediction (probability distribution) obtained from Gaussian process regression, and the point where the acquisition function value is maximal is chosen as the next point to evaluate.
--Gaussian process regression: a type of Bayesian inference. The objective function value $f(x)$ at each input point $x$ is regarded as a random variable, and the vector $(f(x_1), f(x_2), \dots, f(x_n), \dots)$ at the acquired input points is modeled as following a multivariate normal distribution $N(\mu, \Sigma)$.
--Kernel: in Gaussian process regression, the function that specifies each component of the variance-covariance matrix $\Sigma$ of $(f(x_1), f(x_2), \dots, f(x_n), \dots)$. The $(i, j)$ component, i.e. the covariance of $f(x_i)$ and $f(x_j)$, is defined by a function $k(x_i, x_j)$ of $x_i$ and $x_j$; this function $k$ is called the "kernel". Several kernels are commonly used, and their basic property is: "if the input points $x_i, x_j$ are close, then $f(x_i), f(x_j)$ are also close" (a minimal sketch of this follows the list).
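To illustrate the kernel's role, here is a minimal sketch (not part of the original code) that builds a covariance matrix from an RBF kernel. The function rbf and its parameters variance and lengthscale are illustrative assumptions, not GPyOpt's internals.
import numpy as np
# Illustrative RBF kernel: k(xi, xj) = variance * exp(-(xi - xj)^2 / (2 * lengthscale^2))
def rbf(xi , xj , variance=1.0 , lengthscale=1.0):
    return variance * np.exp(-(xi - xj) ** 2 / (2.0 * lengthscale ** 2))
X = np.array([0.0 , 0.5 , 3.0])
# The (i, j) component of the covariance matrix Sigma is k(x_i, x_j)
Sigma = np.array([[rbf(xi , xj) for xj in X] for xi in X])
print(Sigma)
Nearby inputs (0.0 and 0.5) get a covariance close to the variance, while distant inputs (0.0 and 3.0) get a covariance close to zero, which is exactly the "close inputs give close function values" property above.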
#[document] https://gpyopt.readthedocs.io/en/latest/index.html
#[Source] https://github.com/SheffieldML/GPyOpt/blob/master/GPyOpt/methods/bayesian_optimization.py
# pip install GPyOpt
import GPyOpt
import matplotlib.pyplot as plt
import numpy as np
#Objective function definition
def f(x):
    y = (x-300)*(x-200)*(x-15)*(x-5)*(x+6)*(x+10)*(x+100)
    return y
#Domain definition
xlim_fr = -100
xlim_to = 300
#Graph
x = [i for i in range(xlim_fr , xlim_to + 1)]
y = [f(_x) for _x in x]
figsize = (10 , 5)
fig , ax = plt.subplots(1 , 1 , figsize=figsize)
ax.set_title('Outline of f')
ax.grid()
ax.plot(x , y)
ax.set_xlim(xlim_fr , xlim_to)
plt.show()
From the graph, the minimum within the domain appears to be near $x = 270$. Let's see whether Bayesian optimization finds that point.
#Initial data
init_X = [i for i in range(-50 , 300 , 50)]
init_X_np = np.array(init_X).reshape((len(init_X) , 1))
init_Y = [f(i) for i in init_X]
init_Y_np = np.array(init_Y).reshape((len(init_Y) , 1))
print(len(init_X))
print(len(init_Y))
print(init_X_np[:5])
print(init_Y_np[:5])
#Plot the position of the initial data
figsize = (10 , 5)
fig , ax = plt.subplots(1 , 1 , figsize=figsize)
ax.set_title('Outline of f and Initial Data')
ax.grid()
ax.plot(x , y , label="f" , color="y")
#Initial data
for init_x , init_y in zip(init_X , init_Y):
    ax.plot(init_x , init_y , marker="o" , color="r")
ax.set_xlim(xlim_fr , xlim_to)
plt.show()
The red dots are the initial data points.
The flow of the optimization is as follows (a sketch of step (2) follows the list).
(1) From the data acquired so far, compute the prediction (probability distribution) of the f value over the domain.
(2) Apply the acquisition function to each prediction.
(3) Take the point x where the acquisition function value is maximal as the next point, and evaluate f there.
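To make step (2) concrete, here is a minimal sketch of the Expected Improvement (EI) acquisition function used below (acquisition_type='EI'), written from its textbook definition for minimization rather than from GPyOpt's internal implementation; mu, sigma, and y_best are assumed to come from the Gaussian process posterior and the best observation so far.
import numpy as np
from scipy.stats import norm
# Textbook EI for minimization:
# EI(x) = (y_best - mu(x)) * Phi(z) + sigma(x) * phi(z), with z = (y_best - mu(x)) / sigma(x)
def expected_improvement(mu , sigma , y_best):
    sigma = np.maximum(sigma , 1e-12)  # avoid division by zero
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
# Step (3) then picks the x that maximizes this value over the domain.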
#Domain
bounds = [{'name': 'x', 'type': 'continuous', 'domain': (xlim_fr,xlim_to)}]
# X , Y : initial data
# initial_design_numdata : number of initial points to generate; not needed when X and Y above are given
# normalize_Y : True to standardize the objective function (Gaussian process) values
#               (False here, to make the prediction easier to compare with the true values)
myBopt = GPyOpt.methods.BayesianOptimization(f=f
, domain=bounds
, X=init_X_np
, Y=init_Y_np
, normalize_Y=False
, maximize=False
#, initial_design_numdata=50
, acquisition_type='EI')
myBopt.run_optimization(max_iter=10)
#Optimal solution obtained
x_opt = myBopt.x_opt
fx_opt = myBopt.fx_opt
print("x_opt" , ":" , x_opt)
print("fx_opt" , ":" , fx_opt)
#Optimization trajectory
print("X.shape" , ":" , myBopt.X.shape)
print("Y.shape" , ":" , myBopt.Y.shape)
print("-" * 50)
print("X[:5]" , ":")
print(myBopt.X[:5])
print("-" * 50)
print("Y[:5]" , ":")
print(myBopt.Y[:5])
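GPyOpt can also visualize this trajectory directly. As a supplementary call (see the linked source for the API), plot_convergence shows the distance between consecutive evaluated points and the best objective value found so far.
# Convergence plot: distance between consecutive x's, and best f value so far
myBopt.plot_convergence()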
#Gaussian process regression model
model = myBopt.model.model
#Prediction (first element: mean, second element: variance, since this is the raw GPy model)
np_x = np.array(x).reshape(len(x) , 1)
pred = model.predict(np_x)
model.plot()
myBopt.plot_acquisition()
model.plot()
plt.plot(x , y , label="f" , color="y")
plt.plot(x_opt , fx_opt , label="Optimal solution" , marker="o" , color="r")
plt.xlim(xlim_fr , xlim_to)
plt.legend()
plt.grid()
plt.title("f of True & Predict")
plt.show()
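As a supplement, the mean and variance returned by model.predict can also be used to draw a confidence band manually and compare it with the true f. This is a minimal sketch assuming model, np_x, x, and y are as defined above; since GPy's predict returns the variance, we take the square root to get the standard deviation.
# Manually plot the GP prediction with a 2-sigma band against the true f
mean , var = model.predict(np_x)
std = np.sqrt(var)
plt.figure(figsize=(10 , 5))
plt.plot(x , y , label="f (true)" , color="y")
plt.plot(x , mean.flatten() , label="GP mean" , color="b")
plt.fill_between(x , (mean - 2 * std).flatten() , (mean + 2 * std).flatten()
                 , color="b" , alpha=0.2 , label="±2 std")
plt.xlim(xlim_fr , xlim_to)
plt.legend()
plt.grid()
plt.title("GP prediction vs true f")
plt.show()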
The red dot in the third graph is the optimal solution that was found. Looking at the graph, the minimum value was located properly.
(Minor errors aside,) Bayesian optimization did find the minimum value in practice.
However, the initial data this time included a point at $x = 250$, close to the optimal solution (the minimum). That may have made the optimization easier.
If the initial data contains only points far from the optimal solution, more trials may be needed (a hypothetical re-run sketch follows).
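For example, one could check this by giving only initial points far from the minimum and allowing more iterations. The following re-run uses the same API as above; the specific initial points and the max_iter value are illustrative assumptions.
# Hypothetical re-run: initial data only at the far left of the domain,
# away from the minimum near x = 270, with a larger iteration budget.
far_X = np.array([[-100.0] , [-75.0] , [-50.0]])
far_Y = np.array([f(v[0]) for v in far_X]).reshape((-1 , 1))
myBopt2 = GPyOpt.methods.BayesianOptimization(f=f
              , domain=bounds
              , X=far_X
              , Y=far_Y
              , normalize_Y=False
              , maximize=False
              , acquisition_type='EI')
myBopt2.run_optimization(max_iter=30)
print("x_opt" , ":" , myBopt2.x_opt)
print("fx_opt" , ":" , myBopt2.fx_opt)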