2020/1/27 Posted
- People who have used python and have a working execution environment
- People who have used pyTorch to some extent
- People who want to understand automatic differentiation with backward in machine learning with pyTorch
- People who want to know the cases where backward cannot be run in pyTorch
Today, most machine learning research is done in python, because python has many libraries (called modules) for fast data analysis and numerical computation. This article uses one of them, **pyTorch**, to explain how automatic differentiation is performed and what it can and cannot do.
Please note, however, that this article is essentially a personal memo. Treat it as a reference only, and understand that I may use imprecise expressions or terms for the sake of brevity.
Also, this article does not actually train a network. If you are interested in that, please refer to the link below.
Thorough explanation of CNNs with pyTorch
If you are using pyTorch for the first time, you need to install it from cmd (or another terminal), because it is not bundled with python. Follow the link below, select your environment under "QUICK START LOCALLY" at the bottom of the page, and run the command that appears (you should be able to copy, paste, and execute it as-is).
Just as numpy has a type called ndarray, pyTorch has a type called the "**Tensor type**". Like ndarray, it supports matrix calculations, and the two are quite similar, but the Tensor type has the advantage for machine learning that it can run on a GPU. Machine learning requires a large amount of computation, so the high calculation speed of a GPU matters. In addition, Tensors can be differentiated very easily, which is what updating machine learning parameters requires. How easily this can be done is the focus of this article.
Please refer to the following link for Tensor type operations and explanations.
What is the Tensor type of pyTorch
First, import pyTorch so that you can use it. From here on, write the code in a python file rather than in cmd. Load the module with the following code.
filename.rb
import torch
The following simple calculation program is shown.
filename.rb
x = torch.tensor(4.0, requires_grad = True)
c = torch.tensor(8.0)
b = 5.0
y = c*x + b
print(y)
------------Output below---------------
tensor(37., grad_fn=<AddBackward0>)
This evaluates the formula
$$ y = 8x + 5 $$
at $x = 4$, so $y$ is output as 37. The "**grad_fn=\<AddBackward0\>**" in the output records that $y$ was produced by an addition. Because each result holds this information, it can be differentiated later.
The derivative is computed as follows.
filename.rb
y.backward()
This computes the derivative of $y$ with respect to every variable involved in it.
Nothing is printed, so check the result as follows.
filename.rb
print(x)
print(x.grad)
------------Output below---------------
tensor(4., requires_grad=True)
tensor(8.)
In this way, printing $x$ itself shows no derivative information, but "**x.grad**" gives the derivative of $y$ with respect to $x$, which is 8.0.
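As a sanity check that does not depend on pyTorch, the value 8 can be reproduced numerically. The sketch below uses a central finite difference. The helper name `numeric_derivative` and the step size `h` are assumptions made for illustration and are not part of the original article.

```python
def f(x):
    # Same formula as above: y = 8x + 5
    return 8.0 * x + 5.0


def numeric_derivative(func, x, h=1e-5):
    # Central difference approximation of d(func)/dx at x
    return (func(x + h) - func(x - h)) / (2.0 * h)


approx = numeric_derivative(f, 4.0)
print(round(approx, 4))
```

The printed value should agree with the 8.0 stored in x.grad, up to floating-point rounding.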
I said earlier that all the variables are differentiated, but look at what the other variables actually hold:
filename.rb
print(c.grad)
print(b.grad)
------------Output below---------------
None
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-881d89d572bd> in <module>
1 print(c.grad)
----> 2 print(b.grad)
AttributeError: 'float' object has no attribute 'grad'
The first output is "**None**". When the variables were first prepared, "**requires_grad = True**" was not set for **c**. As a result, **c** is treated as a plain constant and no derivative is computed for it.
The second output is an error. It is caused by asking for derivative information, which only the Tensor type of pyTorch provides, from something that is not a Tensor (the variable **b** is just a float).
This shows how convenient the Tensor type is: for every Tensor with "requires_grad = True", the derivative information is computed by a single line.
Here is an example of doing more complicated calculations.
filename.rb
x = torch.ones(2,3, requires_grad = True)
c = torch.ones(2,3, requires_grad = True)
y = torch.exp(x)*(c*3) + torch.exp(x)
print(torch.exp(x))
print(c*3)
print(y)
------------Output below---------------
tensor([[2.7183, 2.7183, 2.7183],
[2.7183, 2.7183, 2.7183]], grad_fn=<ExpBackward>)
tensor([[3., 3., 3.],
[3., 3., 3.]], grad_fn=<MulBackward0>)
tensor([[10.8731, 10.8731, 10.8731],
[10.8731, 10.8731, 10.8731]], grad_fn=<AddBackward0>)
First, "**torch.exp()**" computes $e^{\text{element}}$ for each element of its argument. The outputs are as shown. This time "requires_grad = True" is set for both **x** and **c**.
Now, actually calling backward gives the following.
filename.rb
y.backward()
------------Output below---------------
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-11-ab75bb780f4c> in <module>
----> 1 y.backward()
......(abridgement)......
RuntimeError: grad can be implicitly created only for scalar outputs
An error is raised. As the message says, backward called without arguments only works on a scalar value (simply put, a single value rather than a matrix or vector).
The actual solution is as follows.
filename.rb
s = torch.sum(y)
print(s)
------------Output below---------------
tensor(65.2388, grad_fn=<SumBackward0>)
This "**torch.sum()**" returns the sum of all elements of its argument, so the result is now a scalar. Calling backward on it gives the following.
filename.rb
s.backward()
print(x.grad)
print(c.grad)
------------Output below---------------
tensor([[10.8731, 10.8731, 10.8731],
[10.8731, 10.8731, 10.8731]])
tensor([[8.1548, 8.1548, 8.1548],
[8.1548, 8.1548, 8.1548]])
In this way, differentiation is carried out correctly even with multiple variables and matrices.
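The gradients can also be checked by hand, and backward can be called on the matrix directly when a weight tensor is supplied. The sketch below is one way to do both, assuming the `gradient` argument of `backward()` is filled with ones, which weights every element of y equally, just as the sum does. The names `expected_x` and `expected_c` are illustrative.

```python
import math

import torch

x = torch.ones(2, 3, requires_grad=True)
c = torch.ones(2, 3, requires_grad=True)
y = torch.exp(x) * (c * 3) + torch.exp(x)

# Passing a tensor of ones as `gradient` weights every element of y by 1,
# which gives the same gradients as torch.sum(y).backward().
y.backward(gradient=torch.ones_like(y))

# Analytic derivatives for comparison:
#   dy/dx = e^x * (3c + 1) = 4e  at x = c = 1
#   dy/dc = 3 * e^x        = 3e  at x = 1
expected_x = torch.full((2, 3), 4 * math.e)
expected_c = torch.full((2, 3), 3 * math.e)
print(torch.allclose(x.grad, expected_x))
print(torch.allclose(c.grad, expected_c))
```

Here $4e \approx 10.8731$ and $3e \approx 8.1548$, the same values as in the output above.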
From here, I will list examples where backward does not actually work. I will add new examples as I find them or receive reports.
As explained in Example 5-2 of automatic differentiation above, this occurs when the variable to be differentiated is not a **Tensor type** or does not have "**requires_grad = True**". The solution is simply to make the variable meet both requirements.
As explained in Example 5-3 of automatic differentiation above, this occurs when the value you call backward on is **not a scalar value**. The solution is to reduce it to a scalar somehow. For example, summing the elements as in the example above gives a scalar, while the resulting gradients keep the shape of the original matrix.
An example is shown below.
filename.rb
x = torch.tensor(1.0, requires_grad = True)
x = torch.exp(x)
c = torch.tensor(1.0, requires_grad = True)
c = c*3
b = 5.0
y = c*x + b
print(y)
------------Output below---------------
tensor(13.1548, grad_fn=<AddBackward0>)
Written as a formula, this very simple example is
$$ y = (c \cdot 3)e^{x} + 5 $$
with $c = 1$ and $x = 1$, and "requires_grad = True" makes both c and x differentiable. The actual derivative values are as follows.
filename.rb
y.backward()
print(x.grad)
print(c.grad)
------------Output below---------------
None
None
Surprisingly, neither x nor c has a derivative value. This is because overwriting the variables hides their original definitions: the names x and c now refer to the results of "torch.exp()" and "* 3", which are intermediate results in the calculation process (called a calculation graph), and the first definitions of x and c can no longer be reached through those names. Printing the gradients only gives None here, but if you pass such variables to an optimizer provided by torch (SGD etc.), you will get an error. An actual example is shown below.
filename.rb
import torch.optim as optim
op = optim.SGD([x,c], lr=1.0)
------------Output below---------------
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-18-775027da6d38> in <module>
----> 1 op = optim.SGD([x,c], lr=1.0)
............(abridgement)............
ValueError: can't optimize a non-leaf Tensor
An error is raised when creating SGD, which is an optimizer. If you are interested in a detailed explanation of this optimizer, please refer to the link below.
PyTorch optimizer SGD thorough explanation
To explain briefly, the SGD class prepares to update each of the parameters passed in "**[x, c]**" using their gradient information. At this point, however, it raises an error because x and c are no longer the original (leaf) variables of the calculation graph.
The solution is either to assign the result to another variable instead of overwriting, or to write the expression directly. Assigning to another variable is self-explanatory, so an example of writing the expression directly is shown below.
filename.rb
x = torch.tensor(1.0, requires_grad = True)
c = torch.tensor(1.0, requires_grad = True)
b = 5.0
y = c*3*torch.exp(x)
y = y + b
y.backward()
print(x.grad)
print(c.grad)
------------Output below---------------
tensor(8.1548)
tensor(8.1548)
Here, the computation of **y** is intentionally split across the 4th and 5th lines. There is no penalty for overwriting **y**: because **y** is not a variable you want to differentiate with respect to, overwriting it does no harm as long as the calculation itself is correct.
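For completeness, the other fix (assigning to another variable) might look like the sketch below. The names `ex` and `c3` are made up for illustration, and `is_leaf` is used only to show which tensors the optimizer would accept.

```python
import torch
import torch.optim as optim

x = torch.tensor(1.0, requires_grad=True)
c = torch.tensor(1.0, requires_grad=True)
b = 5.0

# Keep the intermediate results under new names instead of overwriting x and c
ex = torch.exp(x)
c3 = c * 3
y = c3 * ex + b
y.backward()

print(x.is_leaf, ex.is_leaf)  # x is a leaf tensor, ex is an intermediate result
print(x.grad)
print(c.grad)

# x and c are still leaf tensors, so creating the optimizer succeeds
op = optim.SGD([x, c], lr=1.0)
```

Both gradients should come out as roughly 8.1548 ($3e$), the same as when the expression is written directly.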
First, consider the following calculation.
$$ y = c\sqrt{x_1^2+x_2^2+x_3^2} $$
As you can see at a glance, this is the **L2 norm** (simply, the length) of the vector $[x_1, x_2, x_3]$ multiplied by c.
This is written as a program below.
filename.rb
x = torch.tensor([2.0,5.0,3.0], requires_grad = True)
c = torch.tensor(2.0)
y = torch.sqrt(torch.sum(x**2))
y = y*c
y.backward()
print(x.grad)
------------Output below---------------
tensor([0.6489, 1.6222, 0.9733])
You can see that the derivative with respect to each element of the vector **x** is computed correctly. To explain the program a little, the third line "**torch.sqrt(torch.sum(x\*\*2))**" first squares each element of x, sums the squares, and then takes the square root.
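These values can be confirmed by hand. Differentiating $y = c\sqrt{x_1^2+x_2^2+x_3^2}$ gives $\partial y / \partial x_i = c\,x_i / \sqrt{x_1^2+x_2^2+x_3^2}$, and the short sketch below evaluates that expression in plain python (the variable names `norm` and `grad` are illustrative).

```python
import math

x = [2.0, 5.0, 3.0]
c = 2.0

norm = math.sqrt(sum(v * v for v in x))  # sqrt(4 + 25 + 9) = sqrt(38)
grad = [c * v / norm for v in x]         # dy/dx_i = c * x_i / ||x||
print([round(g, 4) for g in grad])
```

Rounded to four decimals, these agree with the tensor printed by x.grad above.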
Now consider the following example with this equation.
filename.rb
x = torch.tensor([0.0,0.0,0.0], requires_grad = True)
c = torch.tensor(2.0)
y = torch.sqrt(torch.sum(x**2))
y = y*c
y.backward()
print(x.grad)
------------Output below---------------
tensor([nan, nan, nan])
Here I changed every element of the variable **x** to 0.0 (so the length of the vector x is 0). As a result, all the derivative values became **nan**. The reason is that the derivative of the above equation is
$$ \frac{\partial y}{\partial x_1} = c\frac{x_1}{\sqrt{x_1^2+x_2^2+x_3^2}} $$
Since the length of $x$ is 0, this becomes a division by zero (in fact $0/0$), which is undefined. In actual machine learning the parameters are updated automatically, so if they become 0 even once while the calculation contains a $\sqrt{x}$, this phenomenon occurs. It can make the loss diverge or silently turn into nan.
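To see where the nan comes from, the chain rule can be taken apart step by step. The sketch below is only an illustration of that reasoning, with the made-up names `s` and `inner`: the square-root step alone produces inf at 0, and multiplying it by the 0 from the squaring step gives nan.

```python
import torch

# Step 1: the derivative of sqrt(s) is 1 / (2 * sqrt(s)), which is inf at s = 0
s = torch.tensor(0.0, requires_grad=True)
torch.sqrt(s).backward()
print(s.grad)

# Step 2: the derivative of x**2 is 2 * x, which is 0 at x = 0
inner = 2.0 * torch.tensor(0.0)

# Chain rule: inf * 0 is undefined, so the product is nan
print(s.grad * inner)
```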
The solution is as follows.
filename.rb
x = torch.tensor([0.0,0.0,0.0], requires_grad = True)
c = torch.tensor(2.0)
y = torch.norm(x)
y = y*c
y.backward()
print(x.grad)
------------Output below---------------
tensor([0., 0., 0.])
In this way, if you use "**torch.norm()**" on the third line, the derivative is 0 instead of **nan**. torch.norm() performs the same calculation, but it probably has an internal mechanism that prevents division by zero.
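When the square root has to be written out by hand, another commonly used workaround is to add a small constant inside it. This is a sketch under that assumption, not what torch.norm() does internally. The constant `eps` and its value are hypothetical choices, and it shifts the value of y very slightly.

```python
import torch

eps = 1e-12  # assumed small constant; choose a value suited to your problem

x = torch.tensor([0.0, 0.0, 0.0], requires_grad=True)
c = torch.tensor(2.0)

# Adding eps keeps the argument of sqrt strictly positive,
# so its derivative stays finite even when every element of x is 0
y = torch.sqrt(torch.sum(x ** 2) + eps) * c
y.backward()
print(x.grad)
```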
Python has in-place operations that let you write the following:
filename.rb
i += 1
x *=3
These are shorthand for "**i = i + 1**" and "**x = x * 3**". This notation seems to be faster, but it does not work well with automatic differentiation. An example is shown below.
filename.rb
x = torch.tensor(3.0, requires_grad = True)
c = torch.tensor(2.0)
c += 2.0
x += 2.0
y = x + c
------------Output below---------------
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-94-beb1a427373d> in <module>
2 c = torch.tensor(2.0)
3 c += 2.0
----> 4 x += 2.0
5 y = x + c
RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.
As shown, an in-place operation is not allowed on a variable with "requires_grad = True" (the variable c, of course, is not involved in differentiation, so it is unaffected).
The solution is simple: do not use in-place operations. That is, just write the assignment out in full.
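A minimal sketch of the written-out form follows. The name `x_shifted` is made up for illustration. The last part additionally assumes you really do want to change the stored value of x itself, in which case wrapping the in-place update in `torch.no_grad()` is one way to keep it out of the calculation graph.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
c = torch.tensor(2.0)

c = c + 2.0          # c does not require grad, so either form works for it
x_shifted = x + 2.0  # out-of-place: creates a new tensor, x stays a leaf
y = x_shifted + c
y.backward()
print(x.grad)

# Deliberately changing the stored value of x, outside the calculation graph
with torch.no_grad():
    x += 2.0
print(x, x.is_leaf)
```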
An example is shown below.
filename.rb
x = torch.tensor(3.0, requires_grad = True).cuda()
c = torch.tensor(2.0, requires_grad = True).cpu()
y = x*c
print(y)
------------Output below---------------
tensor(6., device='cuda:0', grad_fn=<MulBackward0>)
Here, the variable **x** is placed on the GPU by "**.cuda()**" and the variable **c** on the CPU by "**.cpu()**". Both variables are differentiable. The result is on the GPU, as shown by "**device='cuda:0'**".
Let's run backward.
filename.rb
y.backward()
------------Output below---------------
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-118-8117d53c0658> in <module>
----> 1 y.backward()
...........(abridgement).............
RuntimeError: Function MulBackward0 returned an invalid gradient at index 1 - expected type torch.FloatTensor but got torch.cuda.FloatTensor
This error occurs because the variables being differentiated are placed on different devices.
The solution is to put them on the same device. Of course, this is not necessary for variables that are not involved in differentiation.
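One way to keep everything on a single device is to choose the device once and pass it when creating each tensor. The sketch below assumes a fallback to the CPU when no GPU is available, so it runs on either kind of machine. The variable name `device` is illustrative.

```python
import torch

# Fall back to the CPU when no GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Creating the tensors directly on the device keeps them leaf tensors
x = torch.tensor(3.0, requires_grad=True, device=device)
c = torch.tensor(2.0, requires_grad=True, device=device)

y = x * c
y.backward()
print(x.grad, c.grad)
```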
The Tensor types provided by pyTorch include int, float, double, and so on. You can choose among them as follows.
filename.rb
a = torch.tensor(2)
b = torch.tensor(2.134)
c = torch.tensor(3.5)
c = c.type(torch.int32)
d = torch.tensor(3.1514, dtype = torch.float64)
print(a)
print(b)
print(c)
print(d)
------------Output below---------------
tensor(2)
tensor(2.1340)
tensor(3, dtype=torch.int32)
tensor(3.1514, dtype=torch.float64)
In this way, you can specify "**dtype =**" at declaration, or convert afterwards with "**xxxx.type**". The type of each variable can be checked as follows.
filename.rb
print(a.dtype)
print(b.dtype)
print(c.dtype)
print(d.dtype)
------------Output below---------------
torch.int64
torch.float32
torch.int32
torch.float64
As you can see, if you pass an integer like variable a without specifying a type, it automatically becomes **int64**, and if you pass a real number like variable b, it automatically becomes **float32**. Note also that variable c was cast to int32, so its fractional part was dropped.
Now, based on the above, the actual calculation process is shown below as an example.
filename.rb
x = torch.tensor(3.0, dtype = torch.int64, requires_grad = True)
c = torch.tensor(2.0, requires_grad = True)
y = x*c
------------Output below---------------
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-22-7183168e453f> in <module>
----> 1 x = torch.tensor(3.0, dtype = torch.int64, requires_grad = True)
2 c = torch.tensor(2.0, requires_grad = True)
3 y = x*c
RuntimeError: Only Tensors of floating point dtype can require gradients
Here I tried to make the variable **x** an integer type. With an integer type, "**requires_grad = True**" cannot be set in the first place, hence this error.
Let's rewrite it as float.
filename.rb
x = torch.tensor(3.0, dtype = torch.float64, requires_grad = True)
c = torch.tensor(2.0, requires_grad = True)
y = x*c
print(y)
------------Output below---------------
tensor(6., dtype=torch.float64, grad_fn=<MulBackward0>)
It worked fine. Let's do automatic differentiation.
filename.rb
y.backward()
------------Output below---------------
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-8-ab75bb780f4c> in <module>
----> 1 y.backward()
............(abridgement)............
RuntimeError: Function MulBackward0 returned an invalid gradient at index 1 - expected type torch.FloatTensor but got torch.DoubleTensor
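I get an error. The cause is a dtype mismatch. **torch.float64** is treated as the **Double type**, while **c** was created with the default **torch.float32** (Float type). During backward() in the pyTorch version used here, the gradient for **c** comes out as a DoubleTensor where a FloatTensor is expected, which is exactly what the message reports. It is the mix of types, not float64 by itself, that makes backward() fail.
The solution is to use the same dtype for all the variables involved, for example **torch.float32** instead of **torch.float64**.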
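As a quick check you can run on your own pyTorch version, the sketch below gives both tensors one and the same dtype and then calls backward(). float64 is chosen here only to illustrate that a single consistent dtype is what the sketch relies on; using float32 for both works the same way.

```python
import torch

# Both tensors share one dtype, so the gradients have a consistent type
x = torch.tensor(3.0, dtype=torch.float64, requires_grad=True)
c = torch.tensor(2.0, dtype=torch.float64, requires_grad=True)

y = x * c
y.backward()
print(x.grad, x.grad.dtype)
print(c.grad, c.grad.dtype)
```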
In actual machine learning, it is common to prepare and use vectors and matrices as parameters. An example is shown below.
filename.rb
x = torch.tensor([10.0,20.0,30.0], requires_grad = True)
c = torch.tensor([1.0,2.0,3.0], requires_grad = True)
x[0] = c[0]*x[0]
x[1] = c[1]*x[1]
x[2] = c[2]*x[2]
y = torch.sum(x)
print(y)
------------Output below---------------
tensor(140., grad_fn=<SumBackward0>)
This program calculates the dot product of the vectors **x** and **c**. Now, try backward().
filename.rb
y.backward()
------------Output below---------------
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-11-ab75bb780f4c> in <module>
----> 1 y.backward()
.........(abridgement)..........
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor []], which is output 0 of SelectBackward, is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
An error like this was output. The key part of the message is "**one of the variables needed for gradient computation has been modified by an inplace operation**". I gave an example of in-place operations earlier, but no such operation seems to appear anywhere in this program.
In fact, the indexed assignment "**x[0] = c[0] * x[0]**" counts as an in-place operation. At first glance this looks like the variable-overwriting problem described above, but note that the error attributes it to an in-place operation. The solution is to use the following program.
filename.rb
x = torch.tensor([10.0,20.0,30.0], requires_grad = True)
c = torch.tensor([1.0,2.0,3.0], requires_grad = True)
w = torch.zeros(3)
w[0] = c[0]*x[0]
w[1] = c[1]*x[1]
w[2] = c[2]*x[2]
y = torch.sum(w)
print(y)
------------Output below---------------
tensor(140., grad_fn=<SumBackward0>)
In this way, prepare a separate variable, one that is not itself a target of differentiation, to hold the results. Actually trying backward() gives the following.
filename.rb
y.backward()
print(x.grad)
------------Output below---------------
tensor([1., 2., 3.])
It's working fine.
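Indexed assignment can also be avoided entirely by operating on whole tensors. The sketch below is one possible rewrite of the same dot product using an element-wise product followed by `torch.sum`. It is offered as an alternative, not as the approach of the original article.

```python
import torch

x = torch.tensor([10.0, 20.0, 30.0], requires_grad=True)
c = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Element-wise product followed by a sum; no indexed assignment is needed
y = torch.sum(c * x)
y.backward()
print(y)
print(x.grad)  # derivative with respect to x is c
print(c.grad)  # derivative with respect to c is x
```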
This time, I summarized the parts of pyTorch's backward that happen automatically out of sight, along with examples where it fails. I will keep updating this article as I find more such examples. There were probably many parts that were hard to read, but thank you for reading.