"Theano" is a framework of deep learning, but honestly it is quite difficult. In learning, I have been working with reference to the original Tutorial and Japanese commentary (also in Qiita), but it is difficult to understand. Here, we will check the basics of Theano again while moving a small code. (My environment at the moment is python 2.7.8, theano 0.7.0.)
In Theano you do not manipulate values directly; instead, you describe relationships between symbols, hand them over to the processing system, and perform input and output after that system has processed them as needed. The fact that automatic differentiation of expressions is included in this "processing" is a major feature of Theano.
First, let's use a normal symbolic variable.
import theano
import theano.tensor as T
a = T.dscalar('a')
b = T.dscalar('b')
c = a + 2 * b
f_1 = theano.function([a,b], c)
If you enter the code up to this point and execute it, you will notice the HDD rattling as intermediate files are generated. After that, the defined function can be executed.
>>> f_1(2, 3)
array(8.0)
So far, this is how Theano symbolic variables are used. Below is a list of the various variable types, excerpted from the Theano documentation.
Theano Variables
Variable type | Variables available |
---|---|
byte | bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4 |
16-bit integers | wscalar, wvector, wmatrix, wrow, wcol, wtensor3, wtensor4 |
32-bit integers | iscalar, ivector, imatrix, irow, icol, itensor3, itensor4 |
64-bit integers | lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4 |
float | fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4 |
double | dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4 |
complex | cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4 |
As shown above, the first character of constructors such as *scalar and *vector indicates the data type (bit length) of the variable. If it is omitted (e.g. T.scalar, T.vector), the dtype defaults to floatX, a type that can be set in the Theano configuration.
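As a quick check (a minimal sketch of my own, continuing from the imports above; the printed dtypes depend on your configuration), you can inspect the dtype of a variable and the floatX setting as follows.
v = T.vector('v')              # type prefix omitted -> dtype is floatX
print v.dtype                  # e.g. 'float64' (whatever floatX is set to)
print theano.config.floatX     # the configurable default float type
i = T.ivector('i')             # 'i' prefix -> 32-bit integer vector
print i.dtype                  # 'int32'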
Another thing to remember is the **shared variable**. Symbolic variables are rather "closed" within Theano, while shared variables can be referenced by multiple functions and are used for things that are updated repeatedly, such as learning parameters. Let's use a shared variable.
import numpy as np
w = theano.shared(np.zeros(10), name='w')
print w.get_value()
As shown above, a shared variable is defined with theano.shared(initial value, symbol name). Unlike ordinary symbolic variables, the value of a shared variable can be retrieved with get_value(). (Conversely, to retrieve the value of a non-shared symbolic variable, you have to prepare a function for that purpose.)
>>> print a                # Theano symbolic variable
a                          # only the symbol name appears; the value cannot be seen
>>> print b                # Theano symbolic variable
b                          # same here...
>>> print w.get_value()    # Theano shared variable
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
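A shared variable can also be overwritten directly with set_value() (a minimal sketch, continuing with the w defined above):
w.set_value(np.ones(10))   # overwrite the stored value
print w.get_value()        # [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]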
When creating a shared variable, the constructor also takes a 'borrow' option.
s_default = theano.shared(np_array) #The default is borrow=False
s_false = theano.shared(np_array, borrow=False)
s_true = theano.shared(np_array, borrow=True)
This option skips making a copy when the shared variable is created, so the shared variable may alias the original numpy array (modifying one can affect the other). (I don't fully understand it yet, so see [the aliasing section of the tutorial](http://deeplearning.net/software/theano/tutorial/aliasing.html#borrowing-when-creating-shared-variables).)
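The following is a small experiment of my own (assuming the default CPU backend; on a GPU the behavior can differ): with borrow=True the shared variable may share memory with the original numpy array, so modifying the array afterwards can change the shared variable as well, while borrow=False always works on a copy.
np_array = np.ones(3)
s_false = theano.shared(np_array, borrow=False)   # a copy is made
s_true = theano.shared(np_array, borrow=True)     # no copy; may alias np_array
np_array[0] = 99.0
print s_false.get_value()   # [ 1.  1.  1.]  -> unaffected
print s_true.get_value()    # [ 99.  1.  1.] -> can reflect the change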
Now for the most important part of Theano: theano.function(). First, let me quote a few styles of writing it. (The four examples are unrelated to one another.)
>>> f = theano.function([x], 2*x)
>>> predict = theano.function(inputs=[x], outputs=prediction,
...                           allow_input_downcast=True)
>>> linear_mix = theano.function([A, x, theano.Param(b, default=b_default)], [y, z])
>>> train = theano.function(
...             inputs=[x, y],
...             outputs=[prediction, xent],
...             updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)),
...             allow_input_downcast=True)
As shown above, a call to theano.function() can get quite long depending on which options are added.
The shortest format is as follows.
f = theano.function("input", "output")
Lists (or tuples) are used for multiple inputs and outputs. (Even a single scalar variable apparently has to be wrapped in a list when specifying the inputs.) The following is a partial reprint of the documentation.
function.function(inputs, outputs, mode=None, updates=None, givens=None, no_default_updates=False, accept_inplace=False, name=None, rebuild_strict=True, allow_input_downcast=None, profile=None, on_unused_input='raise')
Parameters: **inputs**: input values (required), **outputs**: output values (required)
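As a minimal sketch of passing lists for both inputs and outputs (the variable names here are my own, not from the documentation):
x = T.dscalar('x')
y = T.dscalar('y')
f_sum_prod = theano.function([x, y], [x + y, x * y])   # two inputs, two outputs
print f_sum_prod(2, 3)   # [array(5.0), array(6.0)]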
**updates** is often used to update parameters, as is done in optimization calculations. Below is an example using updates.
a = T.dscalar('a')
w2 = theano.shared(0.0, name='w2')
c = a + 2 * b + w2
f_3 = theano.function([a, b], c, updates=({w2: w2 + 1}))
# or: f_3 = theano.function([a, b], c, updates=[(w2, w2 + 1)])
for i in range(5):
    print f_3(2, 3)
>>>
8.0
9.0
10.0
11.0
12.0
You can see that the contents of updates are reflected every time the function is called.
**givens** is used to substitute a concrete value (or another variable) for a variable in the expression. In particular, it is used to feed shared variables into a Theano function. (If you pass a shared variable via **inputs**, an error occurs.)
w1 = T.dscalar('w1')
c = a + w1 * b
f_2a = theano.function([a, b], c, givens=[(w1, -2.0)])
print f_2a(2, 3)
>>> -4.0   # OK: w1 in c = a + w1 * b has been substituted with -2.0
# Specifying a shared variable in inputs raises an error.
w2 = theano.shared(-2.0, name='w2')    # w2 is a shared variable
f_2b = theano.function([a, b, w2], c)  # the first argument is inputs
---------------------------------------------------------------------------
. . .
TypeError: Cannot use a shared variable (w2) as explicit input. Consider substituting a non-shared variable via the `givens` parameter
**allow_input_downcast** relaxes Theano's strict type management, so that in situations where strict type checking would raise an error (for example, passing 64-bit data to a 32-bit input), the input is silently downcast instead.
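For example (a sketch of my own; the exact error message depends on the Theano version), a float32 input refuses float64 data by default, but with allow_input_downcast=True the data is silently downcast:
xf = T.fvector('xf')   # float32 vector
f_strict = theano.function([xf], xf * 2)
f_loose = theano.function([xf], xf * 2, allow_input_downcast=True)
data64 = np.array([1.0, 2.0])   # float64 by default
# f_strict(data64)              # -> TypeError (cannot be safely downcast)
print f_loose(data64)           # [ 2.  4.]  (downcast to float32)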
With the functionality seen so far, let's try searching for the minimum of a function. As an example, using the function $ y = (x-1) ^ 4 $, I ran code that keeps changing x while monitoring the change in y, iterating until the change drops below a given threshold. (Since no derivative is used, this is not a gradient method.)
x_init = -1.0
x = theano.shared(x_init, name='x')
y = (x - 1.) ** 4
# Define the function f_4(); the increment of x is specified via updates.
f_4 = theano.function([], y, updates=({x: x + 0.01}))
# iteration loop
iter = 1000
y_prev = (x_init - 1.1) ** 4   # initial "previous" value (offset so the first delta is non-zero)
eps = 1.e-6
for i in range(iter):
    y_update = f_4()           # call f_4() inside the loop
    y_delta = np.abs(y_update - y_prev)
    if y_delta < eps:
        x_min = x.get_value()
        break
    y_prev = y_update
print 'x_min = ', x_min
>>> x_min = 0.98999995552
As expected, we obtained x_min, which gives the minimum of the function $ y = (x-1) ^ 4 $. Although x is never passed explicitly to the Theano function f_4, you can see that the **updates** option does its job.
Finally, T.grad(), the automatic differentiation that is the hallmark of Theano.
x = T.dscalar('x')
y = (x - 4) * (x ** 2 * 2 + 6)   # expression to be differentiated
# differentiate y with respect to x
gy = T.grad(cost=y, wrt=x)
# define a function that computes the derivative: input x, output gy
f = theano.function(inputs=[x], outputs=gy)
print theano.pp(f.maker.fgraph.outputs[0])
# evaluate the derivative at x = 0, 1, 2
print f(0)
print f(1)
print f(2)
>>>
((TensorConstant{1.0} * (((x ** TensorConstant{2}) * TensorConstant{2}) + TensorConstant{6})) + ((((TensorConstant{1.0} * (x - TensorConstant{4})) * TensorConstant{2}) * TensorConstant{2}) * (x ** TensorConstant{1})))
6.0
-4.0
-2.0
As shown above, theano.pp(...) prints the symbolic expression obtained by differentiating the given formula. The derivative of y at each x (= 0, 1, 2) is also computed. For reference, here are the parameters that can be specified, taken from the documentation.
theano.gradient.grad(cost, wrt, consider_constant=None, disconnected_inputs='raise', add_names=True, known_grads=None, return_disconnected='zero', null_gradients='raise')
Parameters: **cost**: the expression to be differentiated (required), **wrt**: the variable(s) with respect to which to differentiate (required)
cost and wrt are required parameters.
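T.grad() also accepts a list for wrt, in which case it returns a list of gradients, one per variable (a small sketch of my own):
x = T.dscalar('x')
y = T.dscalar('y')
z = x ** 2 + 3 * x * y
gx, gy = T.grad(cost=z, wrt=[x, y])   # dz/dx and dz/dy
g = theano.function([x, y], [gx, gy])
print g(1.0, 2.0)   # [array(8.0), array(3.0)]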
Now that the necessary parts are better understood, let's implement gradient descent. The target is the Rosenbrock function, which is (apparently) often used to benchmark algorithms such as gradient methods.
Fig. Rosenbrock Function
Although it is hard to see in the figure above, the function is strongly non-linear: there is a curved, arc-shaped valley, and z rises steeply as soon as you leave it. (There is a nice figure on Wikipedia, so please refer to it if you are interested.)
The formula is as follows.
$$ f(x, y) = (a - x)^2 + b\,(y - x^2)^2, \qquad \text{usually}\ a = 1,\ b = 100 $$
This function has a global minimum of $ f = 0.0 $ at $ (x, y) = (a, a ^ 2) $. I created and ran the following code.
import numpy as np
import theano
import theano.tensor as T
# Prep. variables and function
x_init = np.array([2.0, 2.0])
x = theano.shared(x_init, name='x')
a, b = (1., 100.)
# z_rb = (a - x) ** 2 + b * (y - x **2) **2
z_rb = (a - x[0]) ** 2 + b * (x[1] - x[0] ** 2) ** 2
dx = T.grad(cost=z_rb, wrt=x)
# Compile
train = theano.function(
    inputs=[],
    outputs=[z_rb],
    updates=[(x, x - 0.001 * dx)]
    )
# Train
steps = 10000
print '(x,y)_init = (%9.3f, %9.3f)' % (x_init[0], x_init[1])
for i in range(steps):
    z_tmp = train()
x_fin = x.get_value()
print '(x,y)_final= (%9.3f, %9.3f)' % (x_fin[0], x_fin[1])
>>>
(x,y)_init = ( 2.000, 2.000)
(x,y)_final= ( 1.008, 1.016)
The parameters (x, y) are packed into a vector x[] of length 2. Starting from the initial value x = [2.0, 2.0], we obtained values close to the theoretical solution (1, 1). The figure below plots the trajectory of x[], i.e. (x, y).
Fig. Rosenbrock Function, contour
You can see that the trajectory first swings to the left from (2., 2.) and then drops into the curved valley. (When I first tried the initial value (3., 3.), it diverged spectacularly... Incidentally, a NumPy + scipy.optimize version of the code, without Theano, was able to find a convergent solution even from (3., 3.). That seems to be a matter of the optimization algorithm rather than of whether Theano is used.)
Once you understand this much, you should be able to follow the reference code that appears in the Theano tutorial. (I actually wrote and tried a simple logistic regression program.) Recently "Chainer" has become popular, so the number of people who want to "learn Theano from now on" may be decreasing. However, overseas Deep Learning articles dealing with Theano keep appearing, and PyMC3, an MCMC implementation library, is also based on Theano, so I think it is still worthwhile to deepen one's understanding of Theano.