[Super Introduction to Machine Learning] Learn the PyTorch tutorials

These are notes I compiled as a memo while still getting a grasp of the overall picture. I will summarize the contents of the PyTorch tutorial, plus some extra things I looked into along the way.

This post covers Chapter 1.

Introduction

We will cover installation and the basic layers of a CNN.

When the computation involves large inputs such as a 32 × 32 matrix, it is hard to imagine what kind of calculation is actually being performed, so I ran the actual code on simple examples and included some formulas along the way. The aim is an intuitive grasp.

1-Install

https://pytorch.org/

Select your PC's environment at the URL above and run the command shown under [Run this Command].
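For reference, a typical generated command for a CPU-only setup looks something like the following (the exact command depends on your OS, package manager, and CUDA version, so use the one the site generates for you):

pip3 install torch torchvision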

1-0. Touching it for the first time

Let's create and print a tensor, much as we would a numpy matrix. https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py

test.py


import torch

x = torch.empty(5, 3)
print(x)
#Execution result
tensor([[1.9349e-19, 4.5445e+30, 4.7429e+30],
        [7.1354e+31, 7.1118e-04, 1.7444e+28],
        [7.3909e+22, 4.5828e+30, 3.2483e+33],
        [1.9690e-19, 6.8589e+22, 1.3340e+31],
        [1.1708e-19, 7.2128e+22, 9.2216e+29]])

It looks just like numpy. (Note that torch.empty returns uninitialized memory, which is why the values look like garbage.)

1-1. Gradient

By setting requires_grad, it seems we can obtain the gradient at a given point.

test.py


x = torch.tensor([1.0, 2.0], requires_grad=True)

As an example, let's find the gradient of the following function of two variables.

f(x,y)=2x + y^2\\

The formula for the gradient is as follows.

\frac{\partial f}{\partial x} = 2\\
\frac{\partial f}{\partial y} = 2y\\

From the above, the gradient in the $x$ direction is the constant 2, and the gradient in the $y$ direction is $2y$. The gradient at the point $(x, y) = (1, 2)$ is therefore $(2, 4)$:

\frac{\partial f}{\partial x} = 2\\
\frac{\partial f}{\partial y} = 2 \times 2 = 4\\

Setting requires_grad is what enables this calculation. Here is what it actually looks like in PyTorch.

test.py


from __future__ import print_function
import torch

# Prepare the point at which to evaluate the gradient
z = torch.tensor([1.0, 2.0], requires_grad=True)

# Prepare f(z)
f = z[0]*2 + z[1]**2

# Run the differentiation (backpropagation)
f.backward()

print(z.grad)
#Execution result
tensor([2., 4.])
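As a quick sanity check at another point (my own example, not from the tutorial): at $(x, y) = (3, 5)$ the gradient should be $(2, 2 \times 5) = (2, 10)$.

test.py


import torch

# Evaluate the same f at a different point
z = torch.tensor([3.0, 5.0], requires_grad=True)
f = z[0]*2 + z[1]**2
f.backward()

print(z.grad)
#Execution result
tensor([ 2., 10.])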

1-2. Neural network

The network below is defined in the next section of the tutorial: https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html

test.py


import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
net.zero_grad()
out.backward(torch.randn(1, 10))

Running it cold, your first reaction will probably be "?", so let's look at the contents one piece at a time.

Class inheritance

test.py


class Net(nn.Module):
    def __init__(self):
        # Define each layer here
        ...
    def forward(self, x):
        # Apply each layer's processing here
        ...

By inheriting from the nn.Module class, it seems you can freely define your own network configuration.

Each layer is defined when the class is initialized in __init__, and the main CNN processing happens in forward, which calls the layers defined there.

In actual use, forward is executed when an instance of the class you defined is called, as in net(input).
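To see this concretely, here is a minimal sketch of my own (not from the tutorial): forward runs not when the instance is created, but when the instance is called, via nn.Module's __call__.

test.py


import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super(Tiny, self).__init__()
        self.fc = nn.Linear(2, 1)

    def forward(self, x):
        print("forward was called")
        return self.fc(x)

net = Tiny()                  # creating the instance does NOT run forward
out = net(torch.randn(1, 2))  # calling the instance runs forward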

Forward processing

The heart of the processing is the content of forward. Let's break down the sample source and look at each layer one by one. (I will skip the explanation of the activation function.)

test.py


    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

↓↓↓↓↓↓↓↓↓↓↓↓↓

        max_pool2d :Pooling layer
        Conv2d     :Convolutional layer
        Linear     :Fully connected layer

Pooling layer (max_pool2d)

Let's try running max_pool2d on its own.

test.py


import torch
import torch.nn as nn
import numpy as np
import torch.nn.functional as F


class Test_Pooling(nn.Module):

    def __init__(self):
        super(Test_Pooling, self).__init__()
    def forward(self, x):
        print("Before")
        print("size : \t",x.size())
        print("data : \n",x.to('cpu').detach().numpy().copy())
        print("\n")

        x = F.max_pool2d(x, (2, 2))
        print("After")
        print("size : \t",x.size())
        print("data : \n",x.to('cpu').detach().numpy().copy())
        print("\n")
        return x

net = Test_Pooling()

#input
nparr = np.array([1,2,3,4]).astype(np.float32).reshape(2,2)
nparr = np.block([[nparr,nparr],[nparr,nparr]]).reshape(1,1,4,4)
input = torch.from_numpy(nparr).clone()

#output
out = net(input)

#Execution result
Before
size : 	 torch.Size([1, 1, 4, 4])
data : 
 [[[[1. 2. 1. 2.]
   [3. 4. 3. 4.]
   [1. 2. 1. 2.]
   [3. 4. 3. 4.]]]]


After
size : 	 torch.Size([1, 1, 2, 2])
data : 
 [[[[4. 4.]
   [4. 4.]]]]

The input data is as follows.

input = \begin{pmatrix}
1 & 2 & 1 & 2\\
3 & 4 & 3 & 4\\
1 & 2 & 1 & 2\\
3 & 4 & 3 & 4\\
\end{pmatrix}\\

max_pool2d extracts the maximum value within each window of the (2,2) size specified by the argument and returns the results as a matrix. Applied to the input above, it runs once on each of the four identical submatrices containing 1, 2, 3, and 4, so the output is a (2,2) matrix with four 4s lined up.

There are two main purposes for pooling:

1. Dimensionality reduction
2. Securing some invariance to translation and rotation

One is dimensionality reduction. As you can see, the 16 numbers have been reduced to 4. If you imagine processing hundreds of images, you can expect a large cut in processing at a stroke. (Since the discarded information is lost, how much pooling is acceptable depends on the task.)

The other is invariance. Input images can of course arrive rotated or shifted, and pooling seems to provide some degree of robustness in such cases.

For example, consider two pixel arrays from a grayscale image: one is $(0,0,0,0,1,2,3,4,5,1,2)$ and the other is $(0,1,2,3,4,5,1,2,3,4,5)$. They show the same content, just shifted sideways.

If max pooling is performed over the horizontally long range (1,12), both arrays output (5). "Both show the same value" means we can pick up clues to the feature regardless of how the image has moved.
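As a quick check of this idea, here is a minimal sketch (my own example, not from the tutorial) that max-pools the two shifted arrays above in one shot with F.max_pool1d:

test.py


import torch
import torch.nn.functional as F

a = torch.tensor([[[0,0,0,0,1,2,3,4,5,1,2]]], dtype=torch.float32)  # original
b = torch.tensor([[[0,1,2,3,4,5,1,2,3,4,5]]], dtype=torch.float32)  # shifted sideways

# Max pooling over the full width: both collapse to the same maximum.
print(F.max_pool1d(a, kernel_size=a.size(2)))  # tensor([[[5.]]])
print(F.max_pool1d(b, kernel_size=b.size(2)))  # tensor([[[5.]]])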

(...Having written up what I investigated, purpose 1 aside, I am left unsure how much of effect 2 can actually be expected in practice. (Especially when, say, the points of a point cloud arrive out of order.))

conv layer (convolutional layer) part 1

Let's organize an intuition for convolution and what calculation is actually being performed.

What is "convolution"? Don't be afraid to misunderstand, by convolving It is a recognition that it is done to emphasize and extract the characteristic part.

GUID-11A815E1-3652-461E-8C76-56B7DCBF28FD-web.png

For example, convolving with a 5x5 filter called a Laplacian filter produces an image with emphasized edges, as in the image above: the features of the edge parts are emphasized and extracted. (Reference: https://desktop.arcgis.com/ja/arcmap/10.3/manage-data/raster-and-images/convolution-function.htm)

As for what calculation is actually performed: from the formula alone (as in audio processing) it is hard to picture, but with images it is visually easy to understand. (↓) no_padding_no_strides.gif

When convolving over an image region, the "kernel" (the green matrix) is applied in order to the pixels of the target image (the blue matrix), as shown in the animation above, to compute the result.

img112.png

In the example image above, the kernel is applied centered on the pixel with value 1 at position (2,2) of the [input data]:

(1*2)+(2*0)+(3*1)+(0*0)+(1*1)+(2*2)+(3*1)+(0*0)+(1*2)=15

Each pair of values in the same position is multiplied, and the products are summed. In the example shown, sliding the 3x3 filter over every position where it fits without gaps yields the 4x4 [output data].

With that as a basis, let's play with the source and take a look at the calculation.

test.py



import torch
import torch.nn as nn
import numpy as np


class Test_Conv(nn.Module):

    kernel_filter = None
    def __init__(self):
        super(Test_Conv, self).__init__()
        # self.conv = nn.Conv2d(1, 1, 3)
        ksize = 4
        self.conv = nn.Conv2d(
            in_channels=1,
            out_channels=1,
            kernel_size=4,
            bias=False)
        self.kernel_filter = self.conv.weight.data.numpy().reshape(ksize,ksize)

    def forward(self, x):
        print("Before")
        print("size : \t",x.size())
        print("data : \n",x.to('cpu').detach().numpy().copy())
        print("\n")

        print("Calc Self Conv")
        x_np = x.to('cpu').detach().numpy().copy().reshape(4,4)
        calc_conv = 0 ;
        for col in range(self.kernel_filter.shape[0]):
            for row in range(self.kernel_filter.shape[1]):
                calc_conv += self.kernel_filter[row][col] * x_np[row][col]
        print("kernel filter :")
        print(self.kernel_filter )
        print("data : \n",calc_conv)
        print("\n")

        x = self.conv(x)
        print("After")
        print("size : \t",x.size())
        print("data : \n",x.to('cpu').detach().numpy().copy())
        print("\n")
        return x

net = Test_Conv()

#input
nparr = np.array([1,2,3,4]).astype(np.float32).reshape(2,2)
nparr = np.block([[nparr,nparr],[nparr,nparr]]).reshape(1,1,4,4)
input = torch.from_numpy(nparr).clone()

#output
out = net(input)
exit()


With the edge-extracting Laplacian filter earlier, a fixed kernel filter was being applied.

However, the kernel filter automatically generated by PyTorch's Conv2d seems to take different random values each time the script runs. (I have not investigated how, and with what intent, Conv2d computes this filter. Is there no way to set the filter intentionally?)
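Incidentally, it does seem possible to set the filter intentionally by overwriting conv.weight. A minimal sketch (my own experiment, not from the tutorial):

test.py


import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, bias=False)

# Replace the randomly initialized kernel with a fixed Laplacian-style filter
laplacian = torch.tensor([[[[0., 1., 0.],
                            [1., -4., 1.],
                            [0., 1., 0.]]]])
with torch.no_grad():
    conv.weight.copy_(laplacian)
print(conv.weight)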

By the way, the output result is as follows.

#Execution result
Before
size : 	 torch.Size([1, 1, 4, 4])
data : 
 [[[[1. 2. 1. 2.]
   [3. 4. 3. 4.]
   [1. 2. 1. 2.]
   [3. 4. 3. 4.]]]]


Calc Self Conv
kernel filter :
[[-0.03335193 -0.05553913  0.10690624 -0.0219309 ]
 [-0.02052614  0.23662615 -0.07596081 -0.04400161]
 [ 0.19031712 -0.06902602 -0.24611491 -0.06604707]
 [-0.05149609 -0.08155683  0.06496871 -0.15480098]]
data : 
 -0.8313058316707611


After
size : 	 torch.Size([1, 1, 1, 1])
data : 
 [[[[-0.8313058]]]]


The [data] under [Before] is the input data. This is the blue matrix in the animation above.

Then [After]: its [data] is the result computed by the convolutional layer, [-0.8313058].

Next, let's look at [Calc Self Conv]. Here I carried out the calculation above by hand. Since kernel_size = 4 was specified when declaring Conv2d, you can see that the kernel filter underlying the calculation comes out as a 4x4 matrix. This is the green matrix in the earlier animation.

Now let's compare the manual calculation with the result of conv2d. Looking at [data] under [Calc Self Conv] and under [After], you can see the numbers are almost identical (the tiny difference is presumably floating-point rounding from the different summation orders).

That is the kind of calculation the convolutional layer performs. Note that this example was kept deliberately simple, so let's dig a little deeper next.

conv layer (convolutional layer) part 2

Let's study a little more. We now have a rough idea of what calculation the convolution performs. Next, let's take a closer look at the parameters involved in the actual calculation.

conv2d mainly has the following parameters.

test.py


in_channels
out_channels
kernel_size
stride
padding
bias

To give an intuitive explanation, let's look at actual code for each one in turn.

in_channels

in_channels sets the number of channels per data sample.

data : 
 [[[[0. 1. 0. 1.]
   [0. 0. 0. 0.]
   [0. 1. 0. 1.]
   [0. 0. 0. 0.]]]]

Take the above data as an example. In image terms, this is a grayscale image: the number of channels (colors) is just one, so you set "1". For three-channel RGB data you set "3", and if each point carries a position plus RGB, as in a point cloud, you would enter "6".
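As a quick sketch of my own (not from the tutorial): a Conv2d declared with in_channels=3 expects input shaped (batch, 3, height, width).

test.py


import torch
import torch.nn as nn

conv_rgb = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3)
x = torch.randn(1, 3, 8, 8)  # one RGB image of size 8x8
print(conv_rgb(x).size())    # torch.Size([1, 1, 6, 6])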

out_channels

out_channels sets the number of channels of the output data. The number of kernel filters specified here is generated, and that many filter-application results are returned.

In other words, the more output channels there are, the more filters are created, and features with that many different characteristics can be extracted.

For example: a filter that extracts animal characteristics, a filter that extracts human characteristics, a filter that extracts the characteristics of a cup; here you declare how many such feature-extracting filters to create.

(From this explanation alone it sounds like more is always better. Intuitively, though, if you extract features too finely, I suspect you could make fine distinctions like Mr. A versus Mr. B while becoming unable to classify at the coarser level of "human". That is just my mental image, without hands-on experience.)
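You can see the "one filter per output channel" behavior directly in the weight shape. A minimal sketch (my own example):

test.py


import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3)
print(conv.weight.size())  # torch.Size([6, 1, 3, 3]): six 3x3 filters, one per output channel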

stride

stride is the step width when sliding the filter. no_padding_no_strides.gif In this image, the stride is "1" because the filter moves one pixel at a time.

Let's see this movement a little more in the code.

test.py



import torch
import torch.nn as nn
import numpy as np


class Test_Conv2(nn.Module):

    kernel_filter = None
    def __init__(self):
        super(Test_Conv2, self).__init__()
        self.conv = nn.Conv2d(
            in_channels=1,
            out_channels=2,
            kernel_size=2,
            stride=2,
            padding=0,
            bias=False)
        self.kernel_filter = self.conv.weight.data.numpy()

    def forward(self, x):
        print("Before")
        print("size : \t",x.size())
        print("data : \n",x.to('cpu').detach().numpy().copy())
        print("\n")

        print("Calc Self Conv")
        print("kernel filter :")
        print(self.kernel_filter )
        print("\n")

        x = self.conv(x)
        print("After")
        print("size : \t",x.size())
        print("data : \n",x.to('cpu').detach().numpy().copy())
        print("\n")
        return x

net = Test_Conv2()

#input
nparr = np.array([0,1,0,0]).astype(np.float32).reshape(2,2)
nparr = np.block([[nparr,nparr],[nparr,nparr]]).reshape(1,1,4,4)
input = torch.from_numpy(nparr).clone()

#output
out = net(input)
exit()
#Execution result
Before
size : 	 torch.Size([1, 1, 4, 4])
data : 
 [[[[0. 1. 0. 1.]
   [0. 0. 0. 0.]
   [0. 1. 0. 1.]
   [0. 0. 0. 0.]]]]


Calc Self Conv
kernel filter :
[[[[-0.07809174 -0.39049476]
   [-0.00448102 -0.09000683]]]


 [[[ 0.03750324  0.12070286]
   [-0.06378353  0.22772777]]]]


After
size : 	 torch.Size([1, 2, 2, 2])
data : 
 [[[[-0.39049476 -0.39049476]
   [-0.39049476 -0.39049476]]

  [[ 0.12070286  0.12070286]
   [ 0.12070286  0.12070286]]]]

I set the stride to "2". Since it shifts by two, as a calculation, With [data] in [Before] With [kernel filter] of [Calc Self Conv] It will be multiplied.

input = \begin{pmatrix}
0 & 1 \\
0 & 0
\end{pmatrix}\\\\

filter = \begin{pmatrix}
-0.07809174 & -0.39049476 \\
-0.00448102 & -0.09000683
\end{pmatrix}\\

Does it actually shift by two? Let's check against the execution result.

The input data is a matrix made of copies of the [input] submatrix above lined up. If you apply the (2,2) [filter] to such input data, it is multiplied 4 times, landing exactly on each copy of the [input] submatrix.

output = \begin{pmatrix}
-0.39049476 & -0.39049476 \\
-0.39049476 & -0.39049476
\end{pmatrix}\\

Since the convolution of the [input] submatrix with the [filter] matrix is performed 4 times, and the only nonzero entry of the submatrix is the single 1, the filter value at that position is output four times, which gives a result like [output].

From the above, you can see how the convolution is performed while the filter shifts two pixels at a time.

padding

I did not call it out explicitly, but so far the explanation of striding assumed the filter only moves within the range where it fits entirely inside the image.

no_padding_no_strides.gif In this image, the filter is applied around the central pixels of the blue input, positions (1,1), (1,2), (2,1), and (2,2).

If you try to center the filter on the top-left position (0,0), the upper part of the filter sticks out of the image and the calculation cannot be performed.

So the central part of the image gets filtered evenly while the edges go unfiltered; in other words, the features of the edge regions are never extracted.

The idea of "padding" is to add virtual columns and rows of [0] around the borders so that the edge regions get calculated as well.

1_1VJDP6qDY9-ExTuQVEOlVg.gif

bias

bias simply adds a value to every element of the output.

Perhaps it is used to weight the relative importance of each filter?

Or conversely, perhaps it takes a filter that can only pick up small feature values and scales its output up so it can be treated like the other features?
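To confirm that bias is just a constant added to every output element, here is a minimal sketch (my own example) using nn.Linear, where the effect is easiest to see:

test.py


import torch
import torch.nn as nn

fc = nn.Linear(2, 1, bias=True)
with torch.no_grad():
    fc.weight.fill_(1.0)  # set all weights to 1
    fc.bias.fill_(10.0)   # set the bias to 10

# 1*1 + 2*1 + 10 = 13
print(fc(torch.tensor([[1.0, 2.0]])))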

Output result

The dimensions of the convolution output change depending on the parameters. Let's see how the output dimensions follow from the parameters of conv2d.
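As a general rule (the standard formula, not stated in the tutorial): for input width $W$, kernel size $K$, padding $P$, and stride $S$, the output width is

W_{out} = \left\lfloor \frac{W + 2P - K}{S} \right\rfloor + 1\\

The three examples below all agree with this formula.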


Example 1: Input data = (channels = 1, (4 × 4)) / kernel size = 2 / stride = 2 / padding = 0


in_channels=1\\
out_channels=1\\
kernel_size=2\\
stride=2\\
padding=0\\
bias=0\\
\\
\\
input = \begin{pmatrix}
0 & 1 & 0 & 1\\
0 & 0 & 0 & 0\\
0 & 1 & 0 & 1\\
0 & 0 & 0 & 0
\end{pmatrix}\\\\

filter = \begin{pmatrix}
-0.07809174 & -0.39049476 \\
-0.00448102 & -0.09000683
\end{pmatrix}\\

In this case, the (2,2) filter moves across the (4,4) input two pixels at a time and fits exactly, so the output is 2x2. (By the formula above: (4 + 0 - 2)/2 + 1 = 2.)

Example 2: Input data = (channels = 1, (4 × 4)) / kernel size = 3 / stride = 2 / padding = 0

in_channels=1\\
out_channels=1\\
kernel_size=3\\
stride=2\\
padding=0\\
bias=0\\
\\
\\
input = \begin{pmatrix}
0 & 1 & 0 & 1\\
0 & 0 & 0 & 0\\
0 & 1 & 0 & 1\\
0 & 0 & 0 & 0
\end{pmatrix}\\\\

output = \begin{pmatrix}
-0.41127872
\end{pmatrix}\\

In this case, the (3,3) filter moves across the (4,4) input two at a time. Calculating from the upper left, the filter is first applied centered on position (1,1) of the input.

Shifting sideways by the stride of 2 puts the next center at (1,3), but that is at the far right edge, so the whole filter no longer fits.

The vertical direction also shifts by 2, so however the stride is repeated, the convolution is performed only once, and there is just one result. (By the formula: (4 + 0 - 3)/2 + 1 = 1, rounding down.)

Example 3: Input data = (channels = 1, (4 × 4)) / kernel size = 3 / stride = 2 / padding = 1

in_channels=1\\
out_channels=1\\
kernel_size=3\\
stride=2\\
padding=1\\
bias=0\\
\\
\\
input = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
\end{pmatrix}\\\\

output = \begin{pmatrix}
0.08725476 & 0.4106578  \\
0.08725476 & 0.4106578 \\
\end{pmatrix}\\

0s have been padded one row and one column at each border. Starting from the upper left and calculating at every position where the 3x3 filter fits, the calculation runs 4 times, centered on (1,1), (1,3), (3,1), and (3,3). The output is therefore 2x2. (By the formula: (4 + 2 - 3)/2 + 1 = 2, rounding down.)
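Checking Examples 2 and 3 in code (my own sketch): the only difference is padding, and the output size changes exactly as described.

test.py


import torch
import torch.nn as nn

x = torch.randn(1, 1, 4, 4)
conv_p0 = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=0, bias=False)
conv_p1 = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1, bias=False)
print(conv_p0(x).size())  # torch.Size([1, 1, 1, 1]) -- Example 2
print(conv_p1(x).size())  # torch.Size([1, 1, 2, 2]) -- Example 3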

Linear layer (fully connected layer)

What the fully connected layer does is very simple: it takes the various features extracted by the convolutional layers and so on, and pulls them together.

At the risk of being misunderstood: consider the case where you want to distinguish a "red square", a "blue square", and "everything else".

Extracting only the square-ness feature cannot distinguish red from blue, so you also have to consult the color feature.

So multiple features, the square feature and the color feature, have to be combined, and my understanding is that the fully connected layer is where that happens.

y = xA^T \\

The calculation itself is simple: apply the linear transformation $A$ to the input $x$ and compute $y$ as the output.

y = xA^T \\
⇔\\
\begin{pmatrix}
y_1 \\
y_2 \\
\end{pmatrix}
=
\begin{pmatrix}
A_{00} & A_{01} \\
A_{10} & A_{11} \\
\end{pmatrix}
\begin{pmatrix}
x_1 \\
x_2 \\
\end{pmatrix}\\
⇔\\
\begin{cases}
y_1 = A_{00} x_1 + A_{01} x_2 \\
y_2 = A_{10} x_1 + A_{11} x_2 \\
\end{cases}\\

Let me attempt a slightly more intuitive explanation. The formula above is the previous formula expanded for small two-dimensional data. The input $x$ holds values extracted from the image; in terms of the earlier example, take $x_1$ to be "the result of extracting the square feature from the image" and $x_2$ to be "the result of extracting the color feature from the image".

As noted above, neither value alone tells you whether the image is a red square or a blue square; you have to look at both results together.

With that in mind, looking at the final formula, you can see that $x_1$ and $x_2$ are integrated into each output.

For now, let's watch the operation in code.

test.py



import torch
import torch.nn as nn
import numpy as np


class Test_Linear(nn.Module):

    fc_filter = None
    def __init__(self):
        super(Test_Linear, self).__init__()
        # (Conv2d left over from the previous experiment; it is not used below)
        self.conv = nn.Conv2d(
            in_channels=1,
            out_channels=2,
            kernel_size=4,
            stride=2,
            padding=0)
        self.kernel_filter = self.conv.weight.data.numpy()
        self.fc = nn.Linear(in_features=1,
                            out_features=1,
                            bias=False)
        self.fc_filter = self.fc.weight.data.numpy()
        print(self.fc_filter)

    def forward(self, x):
        # Ignore the incoming x and substitute fixed data [1, 100] for clarity
        nparr = np.array( [[[[1.0]],[[100.0]]]]).astype(np.float32)
        input = torch.from_numpy(nparr).clone()
        x = input

        print("Before")
        print("size : \t",x.size())
        print("data : \n",x.to('cpu').detach().numpy().copy())
        print("\n")

        x = self.fc(x)

        print("After Linear")
        print("size : \t",x.size())
        print("data : \n",x.to('cpu').detach().numpy().copy())

        print("\n")
        return x
net = Test_Linear()

#input
nparr = np.array([0,1,0,0]).astype(np.float32).reshape(2,2)
nparr = np.block([[nparr,nparr],[nparr,nparr]]).reshape(1,1,4,4)
input = torch.from_numpy(nparr).clone()

#output
out = net(input)
exit()

#Execution result
[[0.04909718]]
Before
size : 	 torch.Size([1, 2, 1, 1])
data : 
 [[[[  1.]]

  [[100.]]]]


After Linear
size : 	 torch.Size([1, 2, 1, 1])
data : 
 [[[[0.04909718]]

  [[4.909718  ]]]]


It is extremely simple. With A = [0.04909718] and x = [1, 100]^T, the output is computed according to $y = xA^T$, and y = [0.04909718, 4.909718]^T simply comes out.

By the way, if you increase the output dimension by setting out_features = 2,

#Execution result
[[-0.5130856]
 [ 0.6920992]]
Before
size : 	 torch.Size([1, 2, 1, 1])
data : 
 [[[[  1.]]

  [[100.]]]]


After Linear
size : 	 torch.Size([1, 2, 1, 2])
data : 
 [[[[ -0.5130856   0.6920992]]

  [[-51.30856    69.20992  ]]]]


A gains one more dimension, A = [-0.5130856, 0.6920992], and the output $y$ likewise gains a dimension: each input value is now multiplied by both weights, extending the previous result.

Summary of contents so far

https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html Roughly analyzing the flow of the network configuration in the PyTorch sample source:

    1. Extract features with the convolutional layers,
    2. Organize the extracted features with the pooling layers and activation functions (not covered on this page),
    3. Finally integrate them in the fully connected layer.

That is the flow you can see.

With the content so far, we can obtain features as numbers. However, this alone cannot perform identification.

Using what we have so far, you can obtain the characteristics of Mr. A and Mr. B; but when someone of unknown identity arrives, you still need to judge whether that person's characteristics are A's or B's.

1-3. Loss function

Features can already be extracted using the networks described so far, so let's try identification with a simple example.

test.py



import torch
import torch.nn as nn
import numpy as np


class Test_Conv(nn.Module):

    kernel_filter = None
    def __init__(self):
        super(Test_Conv, self).__init__()
        # self.conv = nn.Conv2d(1, 1, 3)
        ksize = 4
        self.conv = nn.Conv2d(
            in_channels=1,
            out_channels=1,
            kernel_size=4,
            bias=False)
        self.kernel_filter = self.conv.weight.data.numpy().reshape(ksize,ksize)

    def forward(self, x):
        x = self.conv(x)
        return x


net = Test_Conv()

#input
nparr = np.array([0,1,0,0]).astype(np.float32).reshape(2,2)
nparr = np.block([[nparr,nparr],[nparr,nparr]]).reshape(1,1,4,4)
input = torch.from_numpy(nparr).clone()
print("*****Learning phase*****")
print("Input data for network learning")
print(input)
print("\n")
#output
out = net(input)

#Target input: enter the same data
out_target1 = net(input)
criterion = nn.MSELoss()
loss = criterion(out, out_target1)
print("*****Evaluation phase*****")
print("Enter the same data")
print("input:")
print(input)
print("Evaluation",loss)
print("\n")

#Target input: enter slightly different data
nparr2 = np.array([0,2,0,0]).astype(np.float32).reshape(2,2)
nparr2 = np.block([[nparr2,nparr2],[nparr2,nparr2]]).reshape(1,1,4,4)
input2 = torch.from_numpy(nparr2).clone()
out_target2 = net(input2)

criterion = nn.MSELoss()
loss = criterion(out, out_target2)
print("Enter slightly different data")
print("input:")
print(input2)
print("Evaluation",loss)
print("\n")

#Target input: enter completely different data
nparr3 = np.array([10,122,1000,200]).astype(np.float32).reshape(2,2)
nparr3 = np.block([[nparr3,nparr3],[nparr3,nparr3]]).reshape(1,1,4,4)
input3 = torch.from_numpy(nparr3).clone()
out_target3 = net(input3)

criterion = nn.MSELoss()
loss = criterion(out, out_target3)
print("Enter completely different data")
print("input:")
print(input3)
print("Evaluation",loss)
print("\n")



#Execution result
*****Learning phase*****
Input data for network learning
tensor([[[[0., 1., 0., 1.],
          [0., 0., 0., 0.],
          [0., 1., 0., 1.],
          [0., 0., 0., 0.]]]])


*****Evaluation phase*****
Enter the same data
input:
tensor([[[[0., 1., 0., 1.],
          [0., 0., 0., 0.],
          [0., 1., 0., 1.],
          [0., 0., 0., 0.]]]])
Evaluation tensor(0., grad_fn=<MseLossBackward>)


Enter slightly different data
input:
tensor([[[[0., 2., 0., 2.],
          [0., 0., 0., 0.],
          [0., 2., 0., 2.],
          [0., 0., 0., 0.]]]])
Evaluation tensor(0.4581, grad_fn=<MseLossBackward>)


Enter completely different data
input:
tensor([[[[  10.,  122.,   10.,  122.],
          [1000.,  200., 1000.,  200.],
          [  10.,  122.,   10.,  122.],
          [1000.,  200., 1000.,  200.]]]])
Evaluation tensor(58437.6680, grad_fn=<MseLossBackward>)

There is not much that needs explaining here:

out = net(input)
out_target3 = net(input3)

This feeds each input into the network we created to compute its features (out and out_target3 hold the matrices computed by conv2d), and

#Define evaluation method
criterion = nn.MSELoss()
#Evaluation execution
loss = criterion(out, out_target3)

this defines the evaluation method (here, mean squared error) and runs the evaluation. If the features are close, the value is near 0; the farther apart they are, the larger it becomes.
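For reference, MSELoss computes the mean squared error over all elements of the two feature tensors:

MSE(a, b) = \frac{1}{n} \sum_{i=1}^{n} (a_i - b_i)^2\\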
