I tried to fight the Local Minimum of Goldstein-Price Function

Previously, I learned that the SciPy library provides the Rosenbrock function for testing optimizers, but when I looked on Wikipedia, I found that there are various benchmark functions besides Rosenbrock.

https://en.wikipedia.org/wiki/Test_functions_for_optimization

For the simplest one, the Sphere function, the optimal solution (minimum point) should be easy to find intuitively, but for other functions finding the optimum can be difficult. This time, I would like to take up the "Goldstein-Price function" from that list.

Fig. Sphere Function Sphere_func0.png (reference figure; this one is easy to handle...)

Fig. Goldstein-Price Function Goldstein_price_func2.png (this is the one we deal with this time)

Properties of the Goldstein-Price Function

First, the formula of the function is as follows.

f(x, y) = [1 + (x + y + 1)^2 (19 - 14x + 3x^2 - 14y + 6xy + 3y^2)]
          × [30 + (2x - 3y)^2 (18 - 32x + 12x^2 + 48y - 36xy + 27y^2)]

As the 3D plot in the figure above shows, the z values span a far larger range than the x and y values, so the function is strongly nonlinear. The minimum point (optimal solution) of this function is

f(0, -1) = 3

As shown, z = 3 is obtained at (x, y) = (0, -1), but there are several local minima around it. I drew a contour map to look at the situation in more detail.

Fig. Goldstein-Price Function (contour map) GoldsteinPrice_contour1.PNG

There are groove-like valleys forming a diagonal cross, and the point where the grooves meet is the Global Minimum (x, y) = (0, -1). On the upper right, near (1, 0.2), there is a somewhat larger basin, so the search may well be caught in a Local Minimum there. In addition, small Local Minimum candidates can be seen along the grooves. This looks hard to handle.
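
For reference, a contour map like this can be reproduced with NumPy and Matplotlib. The following is a minimal sketch of my own (the plotting domain [-2, 2] × [-2, 2] and the contour levels are my assumptions, not necessarily how the original figure was made); it also defines a NumPy version of the function, goldsteinprice(), which the simulated annealing code later calls.

import numpy as np
import matplotlib.pyplot as plt

def goldsteinprice(x, y):
    # NumPy version of the Goldstein-Price function
    term1 = 19. - 14. * x + 3. * x * x - 14. * y + 6. * x * y + 3. * y * y
    term2 = 1. + (x + y + 1.) ** 2 * term1
    term3 = 18. - 32. * x + 12. * x * x + 48. * y - 36. * x * y + 27. * y * y
    term4 = 30. + (2. * x - 3. * y) ** 2 * term3
    return term2 * term4

print(goldsteinprice(0., -1.))     # global minimum: 3.0
print(goldsteinprice(-0.6, -0.4))  # the local minimum discussed below: 30.0

# contour plot over an assumed domain of [-2, 2] x [-2, 2]
xx, yy = np.meshgrid(np.linspace(-2., 2., 201), np.linspace(-2., 2., 201))
plt.contour(xx, yy, goldsteinprice(xx, yy), levels=np.logspace(0.5, 5.5, 25))
plt.plot([0.], [-1.], 'r*')        # mark the global minimum
plt.show()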

Solve with TensorFlow

Since TensorFlow provides a variety of optimizers, I decided to solve the problem with TensorFlow. The code looks like this:

First is the definition of the Goldstein-Price function. It is coded directly from the definition above, with the squares computed by tf.square().

import tensorflow as tf

def goldsteinprice_tf(x, y):
    # goal : (x, y) = (0., -1.0), where f = 3
    term1 = (19. - 14. * x + 3. * x * x - 14. * y + 6. * x * y + 3. * y * y)
    term2 = 1. + tf.square((x + y + 1.0)) * term1
    term3 = 18. - 32. * x + 12. * x * x + 48. * y - 36. * x * y + 27. * y * y
    term4 = 30. + tf.square((2. * x - 3. * y)) * term3
    z = term2 * term4
    return z
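
As a quick sanity check (my own addition, not part of the original listing), the function can be evaluated for fixed constants; f(0, 0) = 600 matches the initial loss in the log below, and f(0, -1) = 3 is the Global Minimum value.

    with tf.Session() as sess:
        print(sess.run(goldsteinprice_tf(tf.constant(0.), tf.constant(0.))))   # 600.0
        print(sess.run(goldsteinprice_tf(tf.constant(0.), tf.constant(-1.))))  # 3.0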

Set the initial value and give the Goldstein-Price function as the cost.

    # set initial parameter
    x0, y0 = (0., 0.)
    x = tf.Variable(x0)
    y = tf.Variable(y0)
    loss = goldsteinprice_tf(x, y)

The optimizer can be selected from six kinds of TensorFlow optimizers.

    lrate = 1.e-03    #Learning rate
    sw = 4            # Optimizer Selection
    
    # need to check sw = [2, 5]
    if sw == 1:
        optimizer = tf.train.GradientDescentOptimizer(lrate)
    elif sw == 2:
        optimizer = tf.train.AdagradOptimizer(lrate, initial_accumulator_value=0.1)
    elif sw == 3:
        optimizer = tf.train.MomentumOptimizer(lrate, momentum=0.0)
    elif sw == 4:
        optimizer = tf.train.AdamOptimizer(lrate)
    elif sw == 5:
        optimizer = tf.train.FtrlOptimizer(lrate)
    elif sw == 6:
        optimizer = tf.train.RMSPropOptimizer(lrate, decay=0.0)
    else:
        print('Error.')
    
    train_op = optimizer.minimize(loss)

All you have to do now is initialize the variables and run the session.

    init = tf.initialize_all_variables()

    with tf.Session() as sess:
        sess.run(init)
        print('Training...')
        print('initial (x, y) = (%8.4f, %8.4f) : loss = %8.4f' 
            % (sess.run(x), sess.run(y), sess.run(loss)))
        
        for i in range(10001):
            train_op.run()

            if i % 100 == 0:
                loss_ = float(sess.run(loss))
                # loss_log.append(loss_)
                x_ = sess.run(x)
                y_ = sess.run(y)
            
            if i % 1000 == 0:                # echo status on screen
                print('(x, y) = (%8.4f, %8.4f) : loss = %8.4f' 
                    % (x_, y_, loss_))

        # Check trained parameter
        print('final (x, y) = (%8.4f, %8.4f) : loss = %8.4f' 
            % (sess.run(x), sess.run(y), sess.run(loss)))

The calculation parameters that can be selected are:

- initial value (x0, y0)
- type of optimizer
- learning rate and number of loop iterations

As an example, the result of running with the settings in the listing above (initial value (0, 0), AdamOptimizer, 10,000 loop iterations) is as follows.

Training...
initial (x, y) = (  0.0000,   0.0000) : loss = 600.0000
(x, y) = ( -0.0010,  -0.0010) : loss = 598.5597
(x, y) = ( -0.5198,  -0.4769) : loss =  31.8792
(x, y) = ( -0.5756,  -0.4230) : loss =  30.2262
(x, y) = ( -0.5987,  -0.4012) : loss =  30.0007
(x, y) = ( -0.6000,  -0.4000) : loss =  30.0000
(x, y) = ( -0.6000,  -0.4000) : loss =  30.0000
(x, y) = ( -0.6000,  -0.4000) : loss =  30.0000
(x, y) = ( -0.6000,  -0.4000) : loss =  30.0000
(x, y) = ( -0.6000,  -0.4000) : loss =  30.0000
(x, y) = ( -0.6000,  -0.4000) : loss =  30.0000
(x, y) = ( -0.6000,  -0.4000) : loss =  30.0000
final (x, y) = ( -0.6000,  -0.4000) : loss =  30.0000

The search is neatly trapped at the Local Minimum (-0.6, -0.4). (At this point both x + y + 1 = 0 and 2x - 3y = 0, so the two squared terms vanish and f = 1 × 30 = 30.) Incidentally, when the initial value was set near the Global Minimum, I confirmed that the calculation converges properly to the minimum point (0, -1).

Global optimization method "Simulated Annealing"

Here, let's set TensorFlow aside for a moment and try the global optimization method "Simulated Annealing". The explanation on [Wikipedia](https://ja.wikipedia.org/wiki/%E7%84%BC%E3%81%8D%E3%81%AA%E3%81%BE%E3%81%97%E6%B3%95) is as follows.

At each iteration, the SA algorithm picks a solution in a random neighborhood of the current solution, and whether it moves there is influenced by the function value at that point and by a global parameter T (meaning temperature). The value of T is then gradually lowered, by analogy with the physical annealing process. Because T is large at first, the solution changes boldly, but it settles down as T approaches zero. Since the search can easily climb uphill in the early stages, there is no need to worry about what to do when it falls into a local minimum, which is the problem with hill climbing.

Helpfully, pseudocode is also posted there, so I converted it into Python and ran it.

The main part of the code is as follows.

def sim_anneal(startState, maxIter, goalE):
    state = startState
    x_, y_ = state[0], state[1]
    e = goldsteinprice(x_, y_)
    bestState = state
    bestE = e
    for iter in range(0, maxIter):
        # random neighbor of the current state
        delta = np.random.rand(2) - 0.5
        nextState = state + delta * 1.
        nextE = goldsteinprice(nextState[0], nextState[1])

        # keep track of the best solution found so far
        if bestE > nextE:
            bestState = nextState
            bestE = nextE
            if goalE >= bestE:
                return bestState
        # accept the move with a probability that depends on the temperature,
        # which decreases as the progress fraction iter/maxIter goes from 0 to 1
        r = np.random.rand()
        if probability(e, nextE, temperature(10.0, float(iter) / maxIter)) >= r:
            state = nextState
            e = nextE
        if iter % 100 == 0:
            print('iter: nextE, bestE, goalE = (%5d, %9.3f, %9.3f, %9.3f)' 
               % (iter, nextE, bestE, goalE))

    return bestState
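
The helpers probability() and temperature() (and the NumPy goldsteinprice() defined earlier) are not shown in the listing above. The following is a minimal sketch of plausible definitions, assuming the usual Metropolis acceptance rule and a simple linear cooling schedule; the original article may have used different ones.

import numpy as np

def temperature(startT, fraction):
    # temperature falls linearly from startT toward 0 as fraction goes 0 -> 1
    return startT * (1.0 - fraction)

def probability(e, nextE, T):
    # Metropolis criterion: always accept an improvement, otherwise
    # accept with probability exp(-(nextE - e) / T)
    if nextE < e:
        return 1.0
    if T <= 0.0:
        return 0.0
    return np.exp((e - nextE) / T)

With these in place, a call such as sim_anneal(np.array([0., 0.]), 10000, 3.001) (the goal value 3.001 is an arbitrary tolerance of mine) should wander toward the neighborhood of (0, -1).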

When the calculation is run, it reaches the neighborhood of the Global Minimum (0.0, -1.0).

Fig. Parameter(x,y) Path (Simulated Annealing) GoldsteinPrice_SA.png

In this calculation too, there are various parameters to choose in addition to the initial value. What I also noticed after running it several times is that the result depends on the random numbers: when the random seed was not fixed and the calculation was repeated, it sometimes ended up in a completely different place. Since the results are hard to predict, it is not an all-purpose method.
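
If reproducible runs are wanted, the NumPy seed can be fixed before calling sim_anneal (the seed value below is arbitrary):

    np.random.seed(20160301)    # arbitrary seed; makes repeated runs reproducible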

Start over with TensorFlow Optimizer

To see the effect of the initial value, I ran the calculation from 9 initial points: [[0., 0.], [0., 1.], [1., 0.], [1., 1.], [0., -1.], [-1., 0.], [-1., -1.], [-1., 1.], [1., -1.]].

First, the calculation result using Adagrad Optimizer is shown in the figure below.

Fig. Parameter(x,y) path from 9 points (Adagrad) gsp_p1_Adagrad.png

None of the runs reached the minimum point, except the one that started at the Global Minimum in the first place. (This might improve with tuning of the learning rate and other parameters.) The result with the Adam optimizer is shown below.

Fig. Parameter(x,y) path from 9 points (Adam) gsp_p1_Adam.png

Here, the minimum point is reached in two cases; excluding the run whose initial value was already the minimum point, the case (x0, y0) = (1.0, -1.0) also succeeded.

Given this situation, I finally tried using random numbers to set the initial value. In neural network training, random numbers are normally used to initialize the weights in order to "break the symmetry of the network and help learning progress." Here the meaning is slightly different: the parameters are initialized randomly so as to spread them over a wide area, in view of the possibility of falling into a Local Minimum.

    num_jobs = 10

    with tf.Session() as sess:
        for job in range(num_jobs):
            init_param = tf.Variable(tf.random_uniform([2], minval=-1., maxval=1.))
            x = tf.slice(init_param, [0], [1])
            y = tf.slice(init_param, [1], [1])

            loss = goldsteinprice_tf(x, y)
(Omitted)

As shown above, tf.random_uniform() generates random values between -1.0 and 1.0, which are used as the initial values. The calculation result is as follows.

Fig. Parameter (x, y) paths from random initial points (Adam) gsp_p2_Adam.png (each path is drawn in a different color)

After the 9-point results I was a little pessimistic, but the Global Minimum (0, -1) is actually reached fairly often: it was found with a probability of 30% or more.
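
To put a number on the success rate (a sketch of my own, not the original code), one can simply check whether each job's final parameters land within a small tolerance of (0, -1):

def near_global_minimum(xf, yf, tol=1.e-2):
    # True if the final (x, y) is within tol of the Global Minimum (0, -1);
    # the tolerance is an arbitrary choice of mine
    return abs(xf - 0.) < tol and abs(yf + 1.) < tol

# inside the job loop, after training, something like:
#     if near_global_minimum(sess.run(x)[0], sess.run(y)[0]):
#         n_success += 1
# and report n_success / float(num_jobs) at the end.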

This time I tried to "fight" the Goldstein-Price function with no practical purpose in mind, and found various difficulties in locating the optimum point. Using random numbers for the initial value can avoid the Local Minimum problem to some extent, but then what range should the random numbers cover? There seems to be no general answer to that.

However, in an actual classification problem where training data is fed in, the data itself contains errors, and I believe the stochastic gradient descent method may then keep the search from being caught in a Local Minimum. (Of course, when you can spend the time, trying various initial values is very useful.)

References (web site)

- Test functions for optimization (Wikipedia) ... beautiful 3D plots of the functions are posted. https://en.wikipedia.org/wiki/Test_functions_for_optimization
- Simulated Annealing (Japanese Wikipedia) https://ja.wikipedia.org/wiki/%E7%84%BC%E3%81%8D%E3%81%AA%E3%81%BE%E3%81%97%E6%B3%95
- TensorFlow documentation (ver. 0.7.0 as of February 22, 2016) https://www.tensorflow.org/
- Learn the basics of Theano once again - Qiita http://qiita.com/TomokIshii/items/1f483e9d4bfeb05ae231
