Learn the basics of Theano once again

"Theano" is a framework of deep learning, but honestly it is quite difficult. In learning, I have been working with reference to the original Tutorial and Japanese commentary (also in Qiita), but it is difficult to understand. Here, we will check the basics of Theano again while moving a small code. (My environment at the moment is python 2.7.8, theano 0.7.0.)

How to use variables: symbolic and shared variables

In Theano, you do not manipulate so-called variables directly; instead, you describe the relationships between them as symbols and hand them to the processing system, which then performs input and output as needed. A major feature of Theano is that automatic differentiation of expressions is included in this "processing".

First, let's use a normal symbolic variable.

import theano
import theano.tensor as T

# define two double-precision scalar symbols
a = T.dscalar('a')
b = T.dscalar('b')

# describe the relationship c = a + 2*b symbolically
c = a + 2 * b

# compile a function: inputs [a, b], output c
f_1 = theano.function([a, b], c)

If you type in this much and run it, you will notice the HDD rattling as Theano generates intermediate files (the function is being compiled). After that, the defined function can be executed.

>>> f_1(2, 3)
array(8.0)

That covers the use of Theano's symbolic variables. The table below, excerpted from the Theano documentation, lists the various variable constructors.

Theano Variables

| Variable type | Available constructors |
|:--|:--|
| byte | bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4 |
| 16-bit integers | wscalar, wvector, wmatrix, wrow, wcol, wtensor3, wtensor4 |
| 32-bit integers | iscalar, ivector, imatrix, irow, icol, itensor3, itensor4 |
| 64-bit integers | lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4 |
| float | fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4 |
| double | dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4 |
| complex | cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4 |

As shown above, the first letter of constructors such as *scalar and *vector indicates the data type (bit length). If it is omitted (e.g. T.scalar, T.vector), the dtype defaults to floatX, a type that can be set in Theano's configuration.
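
As a quick check (a minimal sketch; the printed dtypes depend on your floatX configuration), you can inspect the dtype attached to each constructor:

import theano
import theano.tensor as T

v = T.vector('v')              # no prefix -> dtype is theano.config.floatX
print v.dtype                  # e.g. 'float64' (or 'float32', depending on configuration)
print T.fvector('vf').dtype    # 'float32'
print T.dvector('vd').dtype    # 'float64'
print T.ivector('vi').dtype    # 'int32'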

Another thing to remember is the **shared variable**. Symbolic variables are rather "closed" inside Theano, while shared variables are variables that are referenced by multiple functions and are used for values that are updated repeatedly, such as learning parameters. Let's use a shared variable.

import numpy as np

w = theano.shared(np.zeros(10), name='w')
print w.get_value()

As shown above, it is defined with theano.shared(initial value, name='symbol name in Theano'). Also, unlike ordinary symbolic variables, the value of a shared variable can be retrieved with get_value(). (Conversely, to retrieve the value of a symbolic variable that is not a shared variable, you need to prepare a function for that purpose.)

>>> print a    # Theano symbolic variable
a              # only the symbol name is shown; the value cannot be seen

>>> print b    # Theano symbolic variable
b              # the value cannot be seen either...

>>> print w.get_value()    # Theano shared variable
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]

When declaring a shared variable in the constructor, there is an option called borrow.

# np_array is an existing numpy.ndarray
s_default = theano.shared(np_array)                # the default is borrow=False
s_false   = theano.shared(np_array, borrow=False)
s_true    = theano.shared(np_array, borrow=True)

This option avoids making a copy of the data when creating the shared variable, so the shared variable may alias the original numpy object (changes to one may become visible in the other). (I do not fully understand it yet; see [Borrowing when creating shared variables](http://deeplearning.net/software/theano/tutorial/aliasing.html#borrowing-when-creating-shared-variables) in the Theano documentation.)
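
Here is a minimal sketch of what borrow can mean (on the CPU; whether memory is actually shared is not guaranteed and depends on the device, so treat this only as an illustration):

import numpy as np
import theano

np_array = np.zeros(3)
s_false = theano.shared(np_array, borrow=False)   # a copy is made
s_true  = theano.shared(np_array, borrow=True)    # the array may be aliased

np_array[0] = 5.0
print s_false.get_value()   # [ 0.  0.  0.] -> unaffected
print s_true.get_value()    # may print [ 5.  0.  0.] because memory is shared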

Theano functions

This is the most important part of Theano. First, here are a few quoted examples of how it is written. (The four examples are quoted from different places and are unrelated to each other.)

>>> f = theano.function([x], 2*x)
>>> predict = theano.function(inputs=[x], outputs=prediction,
          allow_input_downcast=True)
>>> linear_mix = theano.function([A, x, theano.Param(b, default=b_default)], [y, z])
>>> train = theano.function(
          inputs=[x,y],
          outputs=[prediction, xent],
          updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)),
          allow_input_downcast=True)

As you can see, a call to theano.function() can grow quite long depending on which options are added.

The shortest format is as follows.

f = theano.function("input", "output")

Lists or tuples are used when there are multiple inputs or outputs. (Even a single scalar variable has to be wrapped in a list when specifying the inputs.) The following is a partial excerpt of the explanation from the documentation.

function.function(inputs, outputs, mode=None, updates=None, givens=None, no_default_updates=False, accept_inplace=False, name=None, rebuild_strict=True, allow_input_downcast=None, profile=None, on_unused_input='raise')

Parameters:
- **inputs**: input values (required)
- **outputs**: output values (required)
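
As a minimal sketch of the list style (the variable names here are just for illustration), a function with two inputs and two outputs can be written like this:

p = T.dscalar('p')
q = T.dscalar('q')

# two inputs and two outputs are both passed as lists
f_multi = theano.function(inputs=[p, q], outputs=[p + q, p * q])
print f_multi(2.0, 3.0)    # [array(5.0), array(6.0)]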

**updates** is often used to update parameters, as is done in optimization calculations. Below is an example of using updates.

a = T.dscalar('a')                    # b = T.dscalar('b') was defined earlier
w2 = theano.shared(0.0, name='w2')
c = a + 2 * b + w2

f_3 = theano.function([a, b], c, updates=({w2: w2 + 1}))
# or: f_3 = theano.function([a, b], c, updates=[(w2, w2 + 1)])

for i in range(5):
    print f_3(2, 3)

>>>
8.0
9.0
10.0
11.0
12.0

You can see that the contents of updates are reflected every time the function is called.

**givens** is used to substitute a concrete value for a variable. In particular, it is used to substitute shared variables into a Theano function. (If you pass a shared variable via **inputs**, an error will occur.)

w1 = T.dscalar('w1')
c = a + w1 * b
f_2a = theano.function([a, b], c, givens=[(w1, -2.0)])

print f_2a(2, 3)
>>> -4.0   # OK: in c = a + w1 * b, the value -2.0 is substituted for w1

# If you specify a shared variable in inputs, an error occurs.
w2 = theano.shared(-2.0, name='w2')         # w2 is a shared variable
f_2b = theano.function([a, b, w2], c)       # the first argument is inputs
---------------------------------------------------------------------------
. . .
TypeError: Cannot use a shared variable (w2) as explicit input. Consider substituting a non-shared variable via the `givens` parameter

**allow_input_downcast** relaxes Theano's strict type management in situations where it would otherwise raise an error, allowing inputs to be silently downcast (for example, float64 data passed to a float32 variable).
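
A minimal sketch of what this means, assuming a float32 symbol (the names below are illustrative; numpy is assumed to be imported as np):

xf = T.fvector('xf')                     # float32 input
g_strict = theano.function([xf], xf * 2)
g_loose  = theano.function([xf], xf * 2, allow_input_downcast=True)

data64 = np.arange(3, dtype=np.float64)  # float64 data
# g_strict(data64) raises a TypeError because storing float64 in float32 loses precision,
# while g_loose(data64) silently downcasts the input.
print g_loose(data64)    # [ 0.  2.  4.]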

Finding the minimum value (without using T.grad())

Let us try to search for the minimum of a function using only what we have seen so far. As an example, with the function $ y = (x-1)^4 $, I ran code that changes x step by step, monitors the change in y, and iterates until that change falls below a given threshold. (Since no derivative is used, this is not a gradient method.)

x_init = -1.0
x = theano.shared(x_init, name='x')   
y = T.dscalar('y')

y = (x - 1.) ** 4
# define the function f_4(); the increment of x is specified via updates
f_4 = theano.function([], y, updates=({x: x + 0.01}))

# iteration loop
iter = 1000
y_prev = (x_init - 1.1) ** 4
eps = 1.e-6
for i in range(iter):
    y_update = f_4()      # call f_4() inside the loop
    y_delta = np.abs(y_update - y_prev)
    
    if y_delta < eps:
        x_min = x.get_value()
        break
    y_prev = y_update

print 'x_min = ', x_min

>>> x_min =  0.98999995552

As expected, we obtained x_min, which (approximately) gives the minimum of the function $ y = (x-1)^4 $. Although x is not explicitly passed as an input to the Theano function f_4, you can see that the **updates** option is doing its job.

Try using automatic differentiation, T.grad()

Finally, we come to T.grad(), which is a signature feature of Theano.

x = T.dscalar('x')
y = (x - 4) * (x ** 2 * 2 + 6)    # formula to be differentiated

# differentiate y with respect to x
gy = T.grad(cost=y, wrt=x)

# define a function to compute the derivative; input: x, output: gy
f = theano.function(inputs=[x], outputs=gy)
print theano.pp(f.maker.fgraph.outputs[0])

print f(0)
print f(1)
print f(2)
>>>
((TensorConstant{1.0} * (((x ** TensorConstant{2}) * TensorConstant{2}) + TensorConstant{6})) + ((((TensorConstant{1.0} * (x - TensorConstant{4})) * TensorConstant{2}) * TensorConstant{2}) * (x ** TensorConstant{1})))
6.0
-4.0
-2.0

As shown above, theano.pp(...) prints the result of symbolically differentiating the given formula. The derivative of y at each x (= 0, 1, 2) is also computed correctly. For reference, let us check the parameters that can be specified, from the documentation.

theano.gradient.grad(cost, wrt, consider_constant=None, disconnected_inputs='raise', add_names=True, known_grads=None, return_disconnected='zero', null_gradients='raise')

Parameters:
- **cost**: the expression to be differentiated (required)
- **wrt**: the variable(s) to differentiate with respect to (required)

cost and wrt are required parameters.
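
A small sketch (with made-up names) showing that wrt can also be a vector symbol, which is exactly what we will use for the gradient descent example below:

v = T.dvector('v')
s = T.sum(v ** 2)             # scalar cost
gs = T.grad(cost=s, wrt=v)    # gradient has the same shape as v
f_gs = theano.function([v], gs)
print f_gs([1.0, 2.0, 3.0])   # [ 2.  4.  6.]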

Implement gradient descent

Now that the necessary pieces are in place, let us implement gradient descent. The target is the Rosenbrock function, which is (apparently) often used to benchmark optimization algorithms such as gradient methods.

Fig. Rosenbrock Function rosen_brock2_s.png

Although it is hard to see in the figure above, the function is strongly non-linear: there is a curved valley, and z rises sharply as soon as you move away from it. (There is a nicer figure on Wikipedia if you are interested.)

The formula is as follows.

$$ f(x, y) = (a - x)^2 + b\,(y - x^2)^2, \qquad \text{usually } a = 1,\ b = 100 $$

This function has a global minimum of $ f = 0 $ at $ (x, y) = (a, a^2) $. The following code was created and executed.

import numpy as np
import theano
import theano.tensor as T

# Prep. variables and function
x_init = np.array([2.0, 2.0])
x = theano.shared(x_init, name='x')

a, b = (1., 100.)
# z_rb = (a - x) ** 2 + b * (y - x ** 2) ** 2
z_rb = (a - x[0]) ** 2 + b * (x[1] - x[0] ** 2) ** 2

dx = T.grad(cost=z_rb, wrt=x)

# Compile
train = theano.function(
      inputs=[],
      outputs=[z_rb],
      updates=[(x, x - 0.001 * dx)]
      )
# Train
steps = 10000

print '(x,y)_init = (%9.3f, %9.3f)' % (x_init[0], x_init[1])
for i in range(steps):
    z_tmp = train()

x_fin = x.get_value()
print '(x,y)_final= (%9.3f, %9.3f)' % (x_fin[0], x_fin[1])

>>>
(x,y)_init = (    2.000,     2.000)
(x,y)_final= (    1.008,     1.016)

The parameters (x, y) are packed into a vector x[] of length 2. Starting from the initial value x = [2.0, 2.0], we obtained a value close to the theoretical solution (1, 1). The figure below plots the trajectory of x[] (i.e. (x, y)) during the iterations.

Fig. Rosenbrock Function, contour rosenbrock_contour2.png

You can see that the point is first pushed to the left from (2., 2.) and then falls into the curved valley. (When I first tried the initial value (3., 3.), the iteration diverged spectacularly... Incidentally, a NumPy + scipy.optimize version of the code, which does not use Theano, converged even from the initial value (3., 3.). This seems to be a matter of the optimization algorithm rather than of whether Theano is used.)

Once you understand this much, you should be able to follow the reference code that appears in the Theano tutorial. (In fact, I wrote and tried a simple logistic regression program.) Recently "Chainer" has become popular, and the number of people who want to "learn Theano" may be decreasing. However, leading overseas deep learning articles dealing with Theano continue to appear, and PyMC3, an MCMC library, is also built on Theano, so I still think it is worthwhile to deepen one's understanding of Theano.

