2020.09.22: I have written a follow-up article to this one: Abort optimization with the early stopper of scikit-optimize
Since I sometimes optimize parameters at work and often get asked about optimization problems, I decided to summarize gp_minimize from scikit-optimize, which makes Bayesian optimization very easy.
Installation is easy with pip:

```
pip install scikit-optimize
```
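If you want to confirm that the package is available, a quick import check (my own addition, not part of the original article) is enough; the version string will of course depend on your environment.

```python
# Quick check (my own addition) that scikit-optimize imports correctly.
import skopt
print(skopt.__version__)
```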
Here is the full source code for this article. Only the key points are explained below; the actual code also contains plotting code.
```python
import numpy as np
from skopt import gp_minimize


def func(param=None):
    # Objective: negated sum of two offset cosines (gp_minimize minimizes).
    ret = np.cos(param[0] + 2.34) + np.cos(param[1] - 0.78)
    return -ret


if __name__ == '__main__':
    # Search range of each input variable.
    x1 = (-np.pi, np.pi)
    x2 = (-np.pi, np.pi)
    x = (x1, x2)

    # Run Bayesian optimization with 30 evaluations.
    result = gp_minimize(func, x,
                         n_calls=30,
                         noise=0.0,
                         model_queue_size=1,
                         verbose=True)
```
```python
import numpy as np
from skopt import gp_minimize
```
Only numpy and gp_minimize are imported. If you adapt this code, change the imports as necessary.
```python
def func(param=None):
    ret = np.cos(param[0] + 2.34) + np.cos(param[1] - 0.78)
    return -ret
```
This time, the problem is to receive two inputs and find the input values that maximize the sum of the cosines of each, simply because it is easy to understand. However, since that alone would be too trivial, I added an offset (2.34, -0.78) to the input values. If the optimizer finally finds (-2.34, 0.78), which cancels this offset, the optimization can be considered successful.
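As a quick sanity check (my own addition, not in the original article), plugging the offset-cancelling point into the objective shows that the best possible cosine sum is 2, so func returns -2 there, which matches the fun value reported in the results further down.

```python
# Sanity check (my own addition): at the offset-cancelling point the cosine
# sum is cos(0) + cos(0) = 2, so the negated objective func returns -2.
import numpy as np

best_input = [-2.34, 0.78]
print(np.cos(best_input[0] + 2.34) + np.cos(best_input[1] - 0.78))  # 2.0
print(func(best_input))                                             # -2.0
```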
As the name gp_minimize suggests, it can only minimize, so the return value is negated (a minus sign is added) to turn the maximization into a minimization.
```python
x1 = (-np.pi, np.pi)
x2 = (-np.pi, np.pi)
x = (x1, x2)
```
Since we know the objective is a cosine function this time, both input variables share the same search range, (-π, π). The minimum and maximum of each variable must be specified as a list or tuple, and these per-variable ranges are then passed together as a single list or tuple, so they are combined into one at the end.
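As a side note (my own sketch, not from the article, assuming skopt's standard skopt.space API), the search space can also be declared with explicit dimension objects, which is handy when the variables have different types or should carry names.

```python
# A minimal sketch (my own) using skopt.space objects instead of plain
# (min, max) tuples to describe the search space.
import numpy as np
from skopt.space import Real

dimensions = [
    Real(-np.pi, np.pi, name='x1'),  # first input variable
    Real(-np.pi, np.pi, name='x2'),  # second input variable
]
# This list could be passed to gp_minimize in place of the tuple of tuples.
```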
```python
result = gp_minimize(func, x,
                     n_calls=30,
                     noise=0.0,
                     model_queue_size=1,
                     verbose=True)
```
This alone starts the sampling. For now, only the parameters that are likely to be used often are specified.
The first two arguments, func and x, specify the objective function and the search space.
n_calls is the number of samplings (objective function evaluations).
If noise is not specified, Gaussian noise is assumed during the search, so change it as needed. This time it is set to 0.0 to evaluate without noise.
gp_minimize uses scikit-learn's GaussianProcessRegressor to predict the evaluation value over the space and to choose the next sampling point; model_queue_size is the number of GaussianProcessRegressor instances from those iterations to keep. If it is not specified, all of them are kept, which eats up more and more memory. This time I specified 1 so that only the latest one is retained.
verbose only controls the standard output during sampling. If False is specified, nothing is printed.
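To tie these options back to the code, here is the same call again with each argument annotated; the comments are my own and simply restate the explanations above.

```python
# The same gp_minimize call as above, annotated (comments are my own).
result = gp_minimize(
    func,                # objective function to be minimized
    x,                   # search space: (min, max) per input variable
    n_calls=30,          # number of samplings (objective evaluations)
    noise=0.0,           # evaluate without Gaussian noise
    model_queue_size=1,  # keep only the latest GaussianProcessRegressor
    verbose=True,        # print progress to standard output
)
```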
The result of the optimization is returned in result. Inspecting it with the dir function shows the following members.
```
In [1]: dir(result)
Out[1]:
['fun',
 'func_vals',
 'models',
 'random_state',
 'space',
 'specs',
 'x',
 'x_iters']
```
In general, the following interpretation seems to be sufficient.
Member | Value | Interpretation |
---|---|---|
fun | -1.9999999997437237 | Best evaluation value found during the optimization |
func_vals | array([ 0.72436992, -0.2934671 , ..., -1.99988708, -1.99654543]) | Return value (evaluation value) of the objective at each sampling |
models | list of GaussianProcessRegressor | List of the retained GaussianProcessRegressor instances explained above |
random_state | RandomState(MT19937) at 0x22023222268 | Random seed |
space | Space([Real(low=-3.141592653589793, high=3.141592653589793, prior='uniform', transform='normalize'), Real(low=-3.141592653589793, high=3.141592653589793, prior='uniform', transform='normalize')]) | Object representing the search space |
specs | dictionary (contents omitted, there are many) | Appears to hold the specifications of the optimization run |
x | [-2.3399919472084387, 0.7798573940377893] | Optimized input variable values |
x_iters | list (contents omitted, there are many) | The values that were sampled at each iteration |
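In practice, the members used most often can be read out directly; the following is a minimal sketch of my own, with values corresponding to the table above.

```python
# Minimal sketch (my own addition) of reading the optimization result.
print(result.x)     # best input found, roughly [-2.34, 0.78]
print(result.fun)   # best (minimized) evaluation value, close to -2.0
print(-result.fun)  # the maximized value of the original cosine sum
```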
Finally, I paste the plots drawn using the values contained in result.
It is a little hard to see, but the figure on the left is the heat map of the actual function; the vertical axis is x1 and the horizontal axis is x2, so the minimum lies toward the bottom of the vertical axis and toward the right of the horizontal axis (since the sign has been flipped, this is actually where the maximum of the original function sits).
The figure on the right shows the optimization result; the colors of that heat map are generated from the predicted values of the GaussianProcessRegressor.
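The plotting code itself is not shown in this article, but roughly speaking such a predicted-value heat map can be built by evaluating the final surrogate model over a grid. The sketch below is my own, assuming the standard skopt and scikit-learn APIs (result.space.transform and GaussianProcessRegressor.predict).

```python
# Rough sketch (my own, not the article's plotting code): predict the
# objective over a grid using the last retained surrogate model.
import numpy as np

grid = np.linspace(-np.pi, np.pi, 100)
xx1, xx2 = np.meshgrid(grid, grid)
points = np.c_[xx1.ravel(), xx2.ravel()].tolist()

gp = result.models[-1]
# The surrogate model works in the normalized space, so transform first.
pred = gp.predict(result.space.transform(points)).reshape(xx1.shape)
# pred can then be drawn with matplotlib (e.g. pcolormesh), with the
# sampled points from result.x_iters overlaid as markers.
```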
By sampling only 30 times, the predicted space became almost identical to the real one, and the optimum value was found.
The ○ markers are the sampling points at each iteration. It is hard to tell, but it looks like the edges of the space are sampled first, and once the space can be predicted to some extent, sampling concentrates around the minimum.
The ☆ marker is the optimum value that was finally found.
There are not many articles on how to use scikit-optimize, and I wrote this one because I could not find any Japanese articles that explain how to use callbacks. I will write a separate article on how to use callbacks and put a link to it here.
2020.09.22: I have written it: Abort optimization with the early stopper of scikit-optimize