2020.09.22: I have written a follow-up article to this one: Abort optimization with the early stopper of scikit-optimize
Since I sometimes optimize parameters at work and often get asked about optimization problems, I decided to summarize gp_minimize from scikit-optimize, which makes Bayesian optimization very easy.
Installation is easy with pip:

```
pip install scikit-optimize
```
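If you want to confirm that the package is available, a quick import check (my own addition, not part of the original article) is enough; the version string will of course depend on your environment.

```python
# Quick check (my own addition) that scikit-optimize imports correctly.
import skopt
print(skopt.__version__)
```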
Here is the full source code for this article. Only the key points are explained below; the actual code also contains plotting code.
```python
import numpy as np
from skopt import gp_minimize


def func(param=None):
    # Objective: negated sum of two offset cosines (gp_minimize minimizes).
    ret = np.cos(param[0] + 2.34) + np.cos(param[1] - 0.78)
    return -ret


if __name__ == '__main__':
    # Search range of each input variable.
    x1 = (-np.pi, np.pi)
    x2 = (-np.pi, np.pi)
    x = (x1, x2)

    # Run Bayesian optimization with 30 evaluations.
    result = gp_minimize(func, x,
                         n_calls=30,
                         noise=0.0,
                         model_queue_size=1,
                         verbose=True)
```
```python
import numpy as np
from skopt import gp_minimize
```
Only numpy and gp_minimize are imported. If you adapt this code, change the imports as necessary.
```python
def func(param=None):
    ret = np.cos(param[0] + 2.34) + np.cos(param[1] - 0.78)
    return -ret
```
This time, the problem is to receive two inputs and find the input values that maximize the sum of the cosines of each, simply because it is easy to understand. However, since that alone would be too trivial, I added an offset (2.34, -0.78) to the input values. If the optimizer finally finds (-2.34, 0.78), which cancels this offset, the optimization can be considered successful.
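As a quick sanity check (my own addition, not in the original article), plugging the offset-cancelling point into the objective shows that the best possible cosine sum is 2, so func returns -2 there, which matches the fun value reported in the results further down.

```python
# Sanity check (my own addition): at the offset-cancelling point the cosine
# sum is cos(0) + cos(0) = 2, so the negated objective func returns -2.
import numpy as np

best_input = [-2.34, 0.78]
print(np.cos(best_input[0] + 2.34) + np.cos(best_input[1] - 0.78))  # 2.0
print(func(best_input))                                             # -2.0
```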
As the name gp_minimize suggests, it can only minimize, so the return value is negated (a minus sign is added) to turn the maximization into a minimization.
```python
x1 = (-np.pi, np.pi)
x2 = (-np.pi, np.pi)
x = (x1, x2)
```
Since we know the objective is a cosine function this time, both input variables share the same search range, (-π, π). The minimum and maximum of each variable must be specified as a list or tuple, and these per-variable ranges are then passed together as a single list or tuple, so they are combined into one at the end.
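As a side note (my own sketch, not from the article, assuming skopt's standard skopt.space API), the search space can also be declared with explicit dimension objects, which is handy when the variables have different types or should carry names.

```python
# A minimal sketch (my own) using skopt.space objects instead of plain
# (min, max) tuples to describe the search space.
import numpy as np
from skopt.space import Real

dimensions = [
    Real(-np.pi, np.pi, name='x1'),  # first input variable
    Real(-np.pi, np.pi, name='x2'),  # second input variable
]
# This list could be passed to gp_minimize in place of the tuple of tuples.
```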
```python
result = gp_minimize(func, x,
                     n_calls=30,
                     noise=0.0,
                     model_queue_size=1,
                     verbose=True)
```
This alone starts the sampling. For now, only the parameters that are likely to be used often are specified.
The first two arguments, func and x, specify the objective function and the search space.
n_calls is the number of samplings (objective function evaluations).
If noise is not specified, Gaussian noise is assumed during the search, so change it as needed. This time it is set to 0.0 to evaluate without noise.
gp_minimize uses scikit-learn's GaussianProcessRegressor to predict the evaluation value over the space and to choose the next sampling point; model_queue_size is the number of GaussianProcessRegressor instances from those iterations to keep. If it is not specified, all of them are kept, which eats up more and more memory. This time I specified 1 so that only the latest one is retained.
verbose only controls the standard output during sampling. If False is specified, nothing is printed.
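To tie these options back to the code, here is the same call again with each argument annotated; the comments are my own and simply restate the explanations above.

```python
# The same gp_minimize call as above, annotated (comments are my own).
result = gp_minimize(
    func,                # objective function to be minimized
    x,                   # search space: (min, max) per input variable
    n_calls=30,          # number of samplings (objective evaluations)
    noise=0.0,           # evaluate without Gaussian noise
    model_queue_size=1,  # keep only the latest GaussianProcessRegressor
    verbose=True,        # print progress to standard output
)
```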
The result of the optimization is returned in result. Inspecting it with the dir function shows the following members.
```
In [1]: dir(result)
Out[1]:
['fun',
 'func_vals',
 'models',
 'random_state',
 'space',
 'specs',
 'x',
 'x_iters']
```
In general, the following interpretation seems to be sufficient.
Member | Value | Interpretation |
---|---|---|
fun | -1.9999999997437237 | Best evaluation value found during the optimization |
func_vals | array([ 0.72436992, -0.2934671 , ..., -1.99988708, -1.99654543]) | Return value (evaluation value) of the objective at each sampling |
models | list of GaussianProcessRegressor | List of the retained GaussianProcessRegressor instances explained above |
random_state | RandomState(MT19937) at 0x22023222268 | Random seed |
space | Space([Real(low=-3.141592653589793, high=3.141592653589793, prior='uniform', transform='normalize'), Real(low=-3.141592653589793, high=3.141592653589793, prior='uniform', transform='normalize')]) | Object representing the search space |
specs | dictionary (contents omitted, there are many) | Appears to hold the specifications of the optimization run |
x | [-2.3399919472084387, 0.7798573940377893] | Optimized input variable values |
x_iters | list (contents omitted, there are many) | The values that were sampled at each iteration |
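In practice, the members used most often can be read out directly; the following is a minimal sketch of my own, with values corresponding to the table above.

```python
# Minimal sketch (my own addition) of reading the optimization result.
print(result.x)     # best input found, roughly [-2.34, 0.78]
print(result.fun)   # best (minimized) evaluation value, close to -2.0
print(-result.fun)  # the maximized value of the original cosine sum
```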
Finally, I paste the plots drawn using the values contained in result.
It is a little hard to see, but the figure on the left is the heat map of the actual function; the vertical axis is x1 and the horizontal axis is x2, so the minimum lies toward the bottom of the vertical axis and toward the right of the horizontal axis (since the sign has been flipped, this is actually where the maximum of the original function sits).
The figure on the right shows the optimization result; the colors of that heat map are generated from the predicted values of the GaussianProcessRegressor.
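The plotting code itself is not shown in this article, but roughly speaking such a predicted-value heat map can be built by evaluating the final surrogate model over a grid. The sketch below is my own, assuming the standard skopt and scikit-learn APIs (result.space.transform and GaussianProcessRegressor.predict).

```python
# Rough sketch (my own, not the article's plotting code): predict the
# objective over a grid using the last retained surrogate model.
import numpy as np

grid = np.linspace(-np.pi, np.pi, 100)
xx1, xx2 = np.meshgrid(grid, grid)
points = np.c_[xx1.ravel(), xx2.ravel()].tolist()

gp = result.models[-1]
# The surrogate model works in the normalized space, so transform first.
pred = gp.predict(result.space.transform(points)).reshape(xx1.shape)
# pred can then be drawn with matplotlib (e.g. pcolormesh), with the
# sampled points from result.x_iters overlaid as markers.
```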
By sampling only 30 times, the predicted space became almost identical to the real one, and the optimum value was found.
The ○ markers are the sampling points at each iteration. It is hard to tell, but it looks like the edges of the space are sampled first, and once the space can be predicted to some extent, sampling concentrates around the minimum.
The ☆ marker is the optimum value that was finally found.
There are not many articles on how to use scikit-optimize, and I wrote this one because I could not find any Japanese articles that explain how to use callbacks. I will write a separate article on how to use callbacks and put a link to it here.
2020.09.22: I have written it: Abort optimization with the early stopper of scikit-optimize