2020/1/27 Posted

0. Who this article is for

- People who have used Python and have a working execution environment
- People who have used pyTorch to some extent
- People who want to understand automatic differentiation with backward in pyTorch
- People who want to know the cases in which backward in pyTorch fails

1. Introduction

Much of today's machine learning research is done in Python, because Python has many libraries (called modules) for fast data analysis and numerical computation. In this article, we use one of them, **pyTorch**, to explain how automatic differentiation is performed and what it can and cannot do.

Please note, however, that this article is something like my own memo. Treat it only as a reference, and understand that I may use imprecise expressions or wording for the sake of brevity.

Also, this article does not actually train a network. If you are interested in that, please refer to the link below.

Thorough explanation of CNNs with pyTorch

2. Install pyTorch

If you are using pyTorch for the first time, you need to install it from a command line such as cmd, because it does not come with Python. Go to the link below, select your environment under "QUICK START LOCALLY" at the bottom of the page, and run the command that is displayed in cmd or a similar shell (you should be able to copy, paste, and run it as is).

pytorch official website
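To confirm that the installation worked, a minimal check like the following can be run in Python. This is only a sketch. The second line reports whether a GPU is visible to pyTorch, and it prints False on a CPU-only installation, which is fine for everything in this article except section 5-6.

```python
import torch

# If the import succeeds, pyTorch is installed; print which version it is.
print(torch.__version__)

# True only when a CUDA-capable GPU is visible to pyTorch.
print(torch.cuda.is_available())
```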

3. Special types provided by pyTorch

Just as numpy has a type called ndarray, pyTorch has a type called the "**Tensor type**". Like ndarray, it supports matrix calculations, and the two are quite similar. The Tensor type is better suited to machine learning, however, because it can run on a GPU: machine learning requires a large amount of computation, and GPUs are fast at it. In addition, a Tensor can be differentiated very easily, which is what is needed to update the parameters of a model. How easily this can be done is the main point of this article.

Please refer to the following link for how to operate on and work with the Tensor type.

What is the Tensor type of pyTorch
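As a rough illustration of how close the two types are, the sketch below converts an ndarray to a Tensor, does a matrix product, and moves the result to a GPU when one is available. The variable names are my own choice for illustration, and it assumes numpy is installed alongside pyTorch.

```python
import numpy as np
import torch

a = np.array([[1.0, 2.0], [3.0, 4.0]])  # an ordinary ndarray
t = torch.from_numpy(a)                  # the same data viewed as a Tensor
prod = t @ t                             # matrix product, as with ndarray
print(prod)

# Use the GPU if pyTorch can see one, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
prod_on_device = prod.to(device)
print(prod_on_device.device)

# Going back to numpy requires the data to be on the CPU.
back = prod_on_device.cpu().numpy()
```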

4. Automatic differentiation backward

4-1. Import of pyTorch

First, import pyTorch so that you can use it. From here on, write the code in a Python file rather than in cmd. The module becomes available with the following line.

import torch

4-2. Example of automatic differentiation

Consider the following simple calculation program.

x = torch.tensor(4.0, requires_grad = True)
c = torch.tensor(8.0)
b = 5.0
y = c*x + b

print(y)

------------Output below---------------
tensor(37., grad_fn=<AddBackward0>)

This evaluates the formula

$$ y = 8x + 5 $$

at $x = 4$, and $y$ is output as 37. The "**grad_fn=<AddBackward0>**" in the output shows that $y$ was produced by an addition. Each tensor keeps this record of how it was computed, which is what makes differentiation possible.

The derivative is computed as follows.

y.backward()

This computes the derivative of $y$ with respect to every variable that $y$ depends on.

Nothing is printed, so check the result as follows.

print(x)
print(x.grad)

------------Output below---------------
tensor(4., requires_grad=True)
tensor(8.)

As you can see, printing $x$ itself shows no derivative information, but "**x.grad**" gives the derivative with respect to that variable, 8.0.

I said above that derivatives are computed for all variables, but look at what happens when we inspect the other variables.

print(c.grad)
print(b.grad)

------------Output below---------------
None

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-881d89d572bd> in <module>
      1 print(c.grad)
----> 2 print(b.grad)

AttributeError: 'float' object has no attribute 'grad'

The first output is "**None**". When the variables were defined, **c** was not given "**requires_grad = True**". As a result, **c** takes part in the calculation but is treated as a plain constant, and no gradient is stored for it.

The second output is an error. The "grad" attribute exists only on the Tensor type, the special type of pyTorch, and here we asked for it on something that is not a Tensor (the variable **b** is just a Python float).

This shows how convenient the Tensor type is: if you set "requires_grad = True", a single line computes all the derivative information.
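If you want to convince yourself that the value in x.grad is really the derivative, one way is to compare it with a numerical estimate. The following is a minimal sketch: the helper name f and the step size h are my own choices, not part of pyTorch.

```python
import torch

def f(x):
    # Same formula as in the text: y = 8x + 5
    return 8.0 * x + 5.0

# Derivative from automatic differentiation
x = torch.tensor(4.0, requires_grad=True)
y = f(x)
y.backward()
autograd_value = x.grad.item()

# Central finite difference as an independent check
h = 1e-3
numeric_value = (f(4.0 + h) - f(4.0 - h)) / (2 * h)

print(autograd_value)  # 8.0
print(numeric_value)   # close to 8.0, up to floating-point error
```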

4-3. A little more example of automatic differentiation

Here is an example with a more complicated calculation.

x = torch.ones(2,3, requires_grad = True)
c = torch.ones(2,3, requires_grad = True)
y = torch.exp(x)*(c*3) + torch.exp(x)

print(torch.exp(x))
print(c*3)
print(y)

------------Output below---------------
tensor([[2.7183, 2.7183, 2.7183],
        [2.7183, 2.7183, 2.7183]], grad_fn=<ExpBackward>)
tensor([[3., 3., 3.],
        [3., 3., 3.]], grad_fn=<MulBackward0>)
tensor([[10.8731, 10.8731, 10.8731],
        [10.8731, 10.8731, 10.8731]], grad_fn=<AddBackward0>)

Here, "**torch.exp()**" computes $e^{x_{ij}}$ for each element $x_{ij}$ of its argument. The outputs are as shown above. This time, "requires_grad = True" is set on both **x** and **c**.

Now, calling backward gives the following.

y.backward()

------------Output below---------------
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-11-ab75bb780f4c> in <module>
----> 1 y.backward()

              ......(abridgement)......

RuntimeError: grad can be implicitly created only for scalar outputs

An error is raised. As the message says, backward can be called without arguments only on a scalar value (put simply, a single value rather than a matrix or vector).

One solution is as follows.

s = torch.sum(y)

print(s)
------------Output below---------------
tensor(65.2388, grad_fn=<SumBackward0>)

"**torch.sum()**" returns the sum of all elements of its argument, so we now have a scalar value. Calling backward on it gives the following.

s.backward()
print(x.grad)
print(c.grad)

------------Output below---------------
tensor([[10.8731, 10.8731, 10.8731],
        [10.8731, 10.8731, 10.8731]])
tensor([[8.1548, 8.1548, 8.1548],
        [8.1548, 8.1548, 8.1548]])

In this way, differentiation works correctly even with several variables and with matrices.
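The values above can be checked by hand: for each element, $\partial y / \partial x = e^{x}(3c + 1)$ and $\partial y / \partial c = 3e^{x}$. The sketch below compares these with the result of backward. It also uses the `gradient` argument of backward() as an alternative to torch.sum(): passing a tensor of ones weights every element of y by 1, which should give the same gradients as summing first. The names expected_dx and expected_dc are only for illustration.

```python
import torch

x = torch.ones(2, 3, requires_grad=True)
c = torch.ones(2, 3, requires_grad=True)
y = torch.exp(x) * (c * 3) + torch.exp(x)

# Equivalent to torch.sum(y).backward(): each element of y gets weight 1.
y.backward(gradient=torch.ones_like(y))

# Hand-derived derivatives, computed without tracking gradients
with torch.no_grad():
    expected_dx = torch.exp(x) * (3 * c + 1)
    expected_dc = 3 * torch.exp(x)

print(torch.allclose(x.grad, expected_dx))
print(torch.allclose(c.grad, expected_dc))
```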

5. Examples where automatic differentiation with backward fails

From here, I list examples where backward does not work. I will add new examples whenever I find them or receive reports.

5-1. Example where the variable is not of type Tensor

As explained in section 4-2 above, this happens when the variable to be differentiated is not of **Tensor type** or does not have "**requires_grad = True**". The solution is simple: make sure the variable meets both requirements.
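As a sketch of that fix, the example from section 4-2 can be rewritten so that all three variables are Tensors with "requires_grad = True". Whether c and b should be differentiated at all depends on your problem; here they are enabled only to show that it works.

```python
import torch

x = torch.tensor(4.0, requires_grad=True)
c = torch.tensor(8.0, requires_grad=True)  # now tracked as well
b = torch.tensor(5.0, requires_grad=True)  # a Tensor instead of a Python float
y = c * x + b
y.backward()

print(x.grad)  # dy/dx = c = 8
print(c.grad)  # dy/dc = x = 4
print(b.grad)  # dy/db = 1
```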

5-2. Example where the final output is not a scalar value

As explained in section 4-3 above, this happens when the value you call backward on is **not a scalar value**. The solution is to reduce it to a scalar in some way. For example, summing the elements, as in the example above, produces a scalar, and the resulting gradients still have the same shape as the original matrices.

5-3. Example of overwriting the variable you want to differentiate

An example is shown below.


x = torch.tensor(1.0, requires_grad = True)
x = torch.exp(x)
c = torch.tensor(1.0, requires_grad = True)
c = c*3
b = 5.0
y = c*x + b

print(y)

------------Output below---------------
tensor(13.1548, grad_fn=<AddBackward0>)

Written as a formula, this very simple example is

$$ y = (c \cdot 3)\, e^{x} + 5 $$

with $c = 1$ and $x = 1$. Since "requires_grad = True" is set on both, it looks as if both c and x can be differentiated. The actual derivative values are as follows.

y.backward()
print(x.grad)
print(c.grad)

------------Output below---------------
None
None

Surprisingly, neither x nor c has a derivative value. The reason is that the names x and c were overwritten: they no longer refer to the tensors that were originally defined, but to the results of "torch.exp()" and "* 3". Such intermediate results in the calculation process (called the computation graph) are not leaf tensors, and by default backward stores gradients only on leaf tensors. The original tensors can no longer be reached through these names. In addition, if you pass such variables to an optimizer provided by torch (SGD etc.), you will get an error. An actual example is shown below.

import torch.optim as optim
op = optim.SGD([x,c], lr=1.0)

------------Output below---------------
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-775027da6d38> in <module>
----> 1 op = optim.SGD([x,c], lr=1.0)

         ............(abridgement)............

ValueError: can't optimize a non-leaf Tensor

An error is raised when constructing SGD, which is an optimizer. For a detailed explanation of this optimizer, please refer to the link below.

PyTorch optimizer SGD thorough explanation

To explain briefly: the SGD class prepares to update each parameter passed in the argument "**[x, c]**" using its gradient information. At this point, however, it raises an error because these variables are not leaf tensors, that is, they are results of earlier operations rather than tensors defined directly.

The solution is to assign the results to different variables instead of overwriting, or to write the expression directly. Assigning to different variables is self-explanatory, so an example of writing the expression directly is shown below.

x = torch.tensor(1.0, requires_grad = True)
c = torch.tensor(1.0, requires_grad = True)
b = 5.0
y = c*3*torch.exp(x)
y = y + b
y.backward()

print(x.grad)
print(c.grad)

------------Output below---------------
tensor(8.1548)
tensor(8.1548)

Here, the computation of **y** is intentionally split across the 4th and 5th lines. Overwriting **y** in this way causes no problem: **y** is not a variable you want to differentiate with respect to, so all that matters is that the calculation itself is carried out correctly.
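The other solution, assigning intermediate results to different variables, can be sketched as follows. The names ex and c3 are hypothetical names chosen for this illustration. The attribute "is_leaf" shows which tensors are leaves, and because x and c stay leaves, the SGD optimizer accepts them. The learning rate 0.1 is arbitrary.

```python
import torch
import torch.optim as optim

x = torch.tensor(1.0, requires_grad=True)
c = torch.tensor(1.0, requires_grad=True)

# Intermediate results get their own names, so x and c are never overwritten.
ex = torch.exp(x)
c3 = c * 3
print(x.is_leaf, ex.is_leaf)  # True False

op = optim.SGD([x, c], lr=0.1)  # accepted: x and c are leaf tensors
y = c3 * ex + 5.0
op.zero_grad()
y.backward()
grad_x = x.grad.item()  # 3 * c * e^x
grad_c = c.grad.item()  # 3 * e^x

op.step()  # x <- x - lr * grad_x, c <- c - lr * grad_c
print(x.item(), c.item())
```

Both gradients are about 8.1548 here, the same values as in the example that writes the expression directly.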

5-4. Example of using root (square root)

First, consider the following calculation.

$$ y = c\sqrt{x_1^2+x_2^2+x_3^2} $$

As you can see at a glance, this is the **L2 norm** (or simply the length) of the vector $[x_1, x_2, x_3]$ multiplied by c.

In code, this is written as follows.

x = torch.tensor([2.0,5.0,3.0], requires_grad = True)
c = torch.tensor(2.0)
y = torch.sqrt(torch.sum(x**2))
y = y*c
y.backward()
print(x.grad)

------------Output below---------------
tensor([0.6489, 1.6222, 0.9733])

You can see that the derivative with respect to each element of the vector **x** is computed correctly. To explain the program briefly, the third line, "**torch.sqrt(torch.sum(x\*\*2))**", squares each element of x, sums the results, and takes the square root.

Now consider the following example with the same equation.

x = torch.tensor([0.0,0.0,0.0], requires_grad = True)
c = torch.tensor(2.0)
y = torch.sqrt(torch.sum(x**2))
y = y*c
y.backward()
print(x.grad)

------------Output below---------------
tensor([nan, nan, nan])

Here, I changed every element of **x** to 0.0 (so the length of the vector x is 0). As a result, all the derivative values became **nan**. This follows from the derivative of the equation above,

$$ \frac{\partial y}{\partial x_1} = c\frac{x_1}{\sqrt{x_1^2+x_2^2+x_3^2}} $$

Because the length of $x$ is 0, the denominator is zero and the expression becomes $0/0$, which is undefined. In actual machine learning the parameters are updated automatically, so if the argument of a $\sqrt{\cdot}$ somewhere in the calculation becomes 0 even once, this phenomenon occurs. It can make the loss diverge or turn into nan without any visible warning.

The solution is as follows.

x = torch.tensor([0.0,0.0,0.0], requires_grad = True)
c = torch.tensor(2.0)
y = torch.norm(x)
y = y*c
y.backward()
print(x.grad)

------------Output below---------------
tensor([0., 0., 0.])

In this way, if you use "**torch.norm()**" on the third line, the derivative is 0 instead of **nan**. torch.norm() computes exactly the same value, but it presumably has an internal mechanism that avoids the division by zero.
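When the square root appears in a form that torch.norm() does not cover, another common workaround is to add a small constant inside the root so that its argument never reaches exactly 0. This is a sketch of a different technique from the one above, not what torch.norm() does internally. The value of eps is my own arbitrary choice, and it slightly changes the value of y.

```python
import torch

x = torch.tensor([0.0, 0.0, 0.0], requires_grad=True)
c = torch.tensor(2.0)
eps = 1e-12  # hypothetical small constant; pick one suited to your data

# sqrt(sum(x^2) + eps) has derivative x / sqrt(sum(x^2) + eps),
# which is 0 at x = 0 instead of 0/0.
y = c * torch.sqrt(torch.sum(x ** 2) + eps)
y.backward()
print(x.grad)
```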

5-5. Example using in-place

Python has in-place operators that let you write the following:

i += 1
x *= 3

These are shorthand for "**i = i + 1**" and "**x = x \* 3**". This notation may look faster, but it does not get along with automatic differentiation. An example is shown below.

x = torch.tensor(3.0, requires_grad = True)
c = torch.tensor(2.0)
c += 2.0
x += 2.0
y = x + c

------------Output below---------------
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-94-beb1a427373d> in <module>
      2 c = torch.tensor(2.0)
      3 c += 2.0
----> 4 x += 2.0
      5 y = x + c

RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.

As shown, an in-place operation cannot be applied to a variable with "requires_grad = True" (variable c, which is not involved in differentiation, is not affected).

The solution is simple: do not use in-place operations. In other words, just write the calculation out in the ordinary way.
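Written out in the ordinary way, the example becomes the sketch below. The name x2 is a hypothetical name for the shifted value. The last part shows one more option that I believe is the usual pattern when the value of x itself really has to change (as in a hand-written parameter update): doing the in-place update inside "torch.no_grad()", which hides it from automatic differentiation.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
c = torch.tensor(2.0)
c += 2.0          # fine: c does not require grad

x2 = x + 2.0      # out-of-place: x itself is left untouched
y = x2 + c
y.backward()
print(y)          # value 9
print(x.grad)     # dy/dx = 1

# In-place update of x that is not recorded by automatic differentiation
with torch.no_grad():
    x += 2.0
print(x)          # value 5, still with requires_grad=True
```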

5-6. Example of using cpu and gpu at the same time

An example is shown below.


x = torch.tensor(3.0, requires_grad = True).cuda()
c = torch.tensor(2.0, requires_grad = True).cpu()
y = x*c
print(y)

------------Output below---------------
tensor(6., device='cuda:0', grad_fn=<MulBackward0>)

Here, the variable **x** is placed on the GPU by "**.cuda()**", and the variable **c** is placed on the CPU by "**.cpu()**". Both variables have gradients enabled. The result is on the GPU, as shown by "**device='cuda:0'**".

Let's call backward.

y.backward()

------------Output below---------------
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-118-8117d53c0658> in <module>
----> 1 y.backward()
        ...........(abridgement).............

RuntimeError: Function MulBackward0 returned an invalid gradient at index 1 - expected type torch.FloatTensor but got torch.cuda.FloatTensor

This error occurs because the variables being differentiated live on different devices.

The solution is to put all of them on the same device. This restriction does not, of course, apply to variables that are not involved in differentiation.
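One way to keep everything on the same device is to choose the device once and pass it when creating each tensor, as in the sketch below. It falls back to the CPU when no GPU is available, so it also runs on a machine without one. Passing "device=" at creation time, instead of calling ".cuda()" afterwards, has the side benefit that x and c remain leaf tensors (see section 5-3).

```python
import torch

# Pick one device and use it for every tensor in the calculation.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor(3.0, device=device, requires_grad=True)
c = torch.tensor(2.0, device=device, requires_grad=True)
y = x * c
y.backward()

print(x.grad)  # dy/dx = c = 2
print(c.grad)  # dy/dc = x = 3
```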

5-7. Example where the type is not torch.FloatTensor

The Tensor types provided by pyTorch include integer, float, double, and other element types. You can choose among them as follows.

a = torch.tensor(2)
b = torch.tensor(2.134)
c = torch.tensor(3.5)
c = c.type(torch.int32)
d = torch.tensor(3.1514, dtype = torch.float64)
print(a)
print(b)
print(c)
print(d)

------------Output below---------------
tensor(2)
tensor(2.1340)
tensor(3, dtype=torch.int32)
tensor(3.1514, dtype=torch.float64)

In this way, you can specify the type with "**dtype =**" when creating the tensor, or convert it afterwards with "**xxxx.type**". The type of each variable can be checked as follows.

print(a.dtype)
print(b.dtype)
print(c.dtype)
print(d.dtype)
------------Output below---------------
torch.int64
torch.float32
torch.int32
torch.float64

As you can see, if you pass an integer without specifying a type, as with variable a, the tensor becomes **int64**, and if you pass a real number, as with variable b, it becomes **float32**. Note also that because variable c was cast to int32, its fractional part was discarded.

With that in mind, an actual calculation is shown below as an example.

x = torch.tensor(3.0, dtype = torch.int64, requires_grad = True)
c = torch.tensor(2.0, requires_grad = True)
y = x*c

------------Output below---------------
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-22-7183168e453f> in <module>
----> 1 x = torch.tensor(3.0, dtype = torch.int64, requires_grad = True)
      2 c = torch.tensor(2.0, requires_grad = True)
      3 y = x*c

RuntimeError: Only Tensors of floating point dtype can require gradients

Here, I tried to make the variable **x** an integer type. With an integer type, "**requires_grad = True**" cannot be set in the first place, which is why this error appears.

Let's rewrite it as a floating-point type, float64.

x = torch.tensor(3.0, dtype = torch.float64, requires_grad = True)
c = torch.tensor(2.0, requires_grad = True)
y = x*c
print(y)

------------Output below---------------
tensor(6., dtype=torch.float64, grad_fn=<MulBackward0>)

The calculation worked. Now let's run automatic differentiation.

y.backward()

------------Output below---------------
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-ab75bb780f4c> in <module>
----> 1 y.backward()
          ............(abridgement)............

RuntimeError: Function MulBackward0 returned an invalid gradient at index 1 - expected type torch.FloatTensor but got torch.DoubleTensor

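An error is raised. The error message points to the cause: **x** is **torch.float64** (treated as **Double type**), while **c**, created without a dtype, is **torch.float32**. The gradient computed for **c** comes back as a DoubleTensor where a FloatTensor is expected, so backward() fails in the pyTorch version used here. The problem is the mixture of the two types rather than float64 by itself.

The solution is to give all the tensors the same type, for example by using **torch.float32** instead of **torch.float64** for **x**.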
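As a sketch of keeping the types consistent: as far as I know, backward() itself has no problem with float64 as long as every tensor in the calculation uses it, so the following also works. The second half shows converting an existing float32 tensor before enabling gradients. The names c32 and c64 are hypothetical.

```python
import torch

# Both tensors use the same dtype, so the gradients match their inputs.
x = torch.tensor(3.0, dtype=torch.float64, requires_grad=True)
c = torch.tensor(2.0, dtype=torch.float64, requires_grad=True)
y = x * c
y.backward()
print(x.grad)  # dy/dx = c = 2, dtype float64
print(c.grad)  # dy/dc = x = 3, dtype float64

# Converting an existing tensor: convert first, then enable gradients,
# so that the converted tensor is itself the leaf.
c32 = torch.tensor(2.0)
c64 = c32.to(torch.float64).requires_grad_()
print(c64.dtype, c64.is_leaf)
```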

5-8. Example using tensor type array (vector, matrix)

In actual machine learning, it is common to prepare vectors and matrices as parameters. An example is shown below.

x = torch.tensor([10.0,20.0,30.0], requires_grad = True)
c = torch.tensor([1.0,2.0,3.0], requires_grad = True)
x[0] = c[0]*x[0]
x[1] = c[1]*x[1]
x[2] = c[2]*x[2]
y = torch.sum(x)
print(y)

------------Output below---------------
tensor(140., grad_fn=<SumBackward0>)

This program calculates the dot product of the vectors **x** and **c**. Now, let's try backward().

y.backward()

------------Output below---------------
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-11-ab75bb780f4c> in <module>
----> 1 y.backward()
        .........(abridgement)..........

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor []], which is output 0 of SelectBackward, is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

This is the error that was raised. The important part is that it says "**one of the variables needed for gradient computation has been modified by an inplace operation**". I gave an example of in-place operations earlier, but no operator of that kind appears anywhere in this program.

In fact, an assignment to an array element such as "**x[0] = c[0] \* x[0]**" counts as an in-place operation. At first glance this looks like the variable-overwriting problem described earlier, but note that the error attributes it to an in-place operation. The solution is to use the following program.

x = torch.tensor([10.0,20.0,30.0], requires_grad = True)
c = torch.tensor([1.0,2.0,3.0], requires_grad = True)
w = torch.zeros(3)

w[0] = c[0]*x[0]
w[1] = c[1]*x[1]
w[2] = c[2]*x[2]
y = torch.sum(w)
print(y)

------------Output below---------------
tensor(140., grad_fn=<SumBackward0>)

In this way, prepare a separate variable, one that is not itself differentiated, to hold the results. Calling backward() then gives the following.

y.backward()
print(x.grad)

------------Output below---------------
tensor([1., 2., 3.])

It's working fine.
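For this particular calculation, the element-by-element assignment can also be avoided altogether by operating on the whole vectors at once. The following sketch computes the same dot product without writing into any tensor, so no in-place operation is involved.

```python
import torch

x = torch.tensor([10.0, 20.0, 30.0], requires_grad=True)
c = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Element-wise product followed by a sum; torch.dot(c, x) gives the same value.
y = torch.sum(c * x)
y.backward()

print(y)       # value 140
print(x.grad)  # equals c
print(c.grad)  # equals x
```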

6. Closing remarks

In this article, I summarized what backward in pyTorch does automatically behind the scenes, along with examples where it fails. I will keep updating the article whenever I find more such examples. There were probably many parts that were hard to read, but thank you for reading to the end.

Summary of examples that cannot be pyTorch backward
