This is a continuation of Python beginners touch Pytorch (2), and my third article in the series. Finally, I will explain neural networks, the topic that got me interested in Pytorch in the first place.
If you draw it as a diagram, you can see why it is called a network. The circles are the nodes, and the arrows connecting them are the links.
"Deep learning," which has become a hot topic in recent years, is a stack of "intermediate layers (two layers in between)" </ strong> in this figure. This model is also referred to as hierarchical </ strong> </ font>. There is also a model called recursive (RNN) </ strong> </ font>. Please see the image below. This is good at learning that keeps time series. However, the amount of calculation is large, which makes the calculation difficult.
The following is a comparison of the two networks. You should choose the network according to the problem you want to solve. By the way, the **hierarchical** type is often used for image recognition, and the **recursive** type is often used for natural language processing (character recognition, speech recognition).
Weights represent the importance of an input. A part with a high weight is highly important when the neural network discriminates an event.
Let's dig a little deeper into the weights with a concrete example. For example, when buying a bag, each person has criteria for deciding whether or not to buy: roughly, "durability", "capacity", "design", "name recognition", and so on. Since I attach great importance to design, expressing each weight numerically might give: "durability" = 5, "capacity" = 5, "design" = 8, "name recognition" = 5. Because design matters most to me, it is natural that the weight for design is the highest.
Let's show it in a diagram.
The diagram is simple and easy to understand. In this figure, you can see that "Input 1" is an important element in this layer. Also, as the number of nodes in the next layer increases, the number of weights increases accordingly. Looking at the figure, you can see that different values are passed to the two nodes in the next layer.
By the way, this kind of sequential propagation of the input is called **forward propagation**.
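To make this concrete, here is a minimal sketch of forward propagation into a single node, reusing the illustrative bag weights above (the input scores are made up for this example):

```python
# A sketch of forward propagation into one node: multiply each input by its
# weight and sum the results. The weights are the illustrative "bag" weights
# from above; the input scores are hypothetical.
weights = {"durability": 5, "capacity": 5, "design": 8, "name_recognition": 5}
inputs  = {"durability": 0.7, "capacity": 0.4, "design": 0.9, "name_recognition": 0.2}

output = sum(weights[k] * inputs[k] for k in weights)
print(output)  # 13.7 -- this value would then be passed on to the next layer
```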
In a neural network, each weight is adjusted through "training" to find appropriate values. For now, it is enough to understand what the weights are. Next, I will explain the seemingly mysterious "activation function".
Famous activation functions include the sigmoid function and ReLU. Please look up their exact formulas yourself. They may seem difficult when written only as mathematical formulas, but what matters in an activation function is not the difficulty of the formula. What matters is that it is **1. non-linear** and **2. easy to differentiate**. Non-linear simply means the graph is not a straight line. Let's take a look at the graph of the ReLU function, shown on Wikipedia: [Activation function (Wikipedia)](https://ja.wikipedia.org/wiki/%E6%B4%BB%E6%80%A7%E5%8C%96%E9%96%A2%E6%95%B0#ReLU%EF%BC%88%E3%83%A9%E3%83%B3%E3%83%97%E9%96%A2%E6%95%B0%EF%BC%89)
What do you think? It is certainly not linear. Next is the ease of differentiation. For the basics of differentiation, see the previous article or sites and books that explain it more rigorously. A function that is easy to differentiate makes training the network easier, and therefore makes good weights easier to find. (If you build the neural network with a framework, the program handles the differentiation automatically, so don't worry.)
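As a small sketch of both properties, PyTorch can evaluate ReLU and compute its derivative automatically (the sample points here are arbitrary):

```python
import torch

# ReLU is max(0, x): clearly not a straight line, and its derivative is
# simply 0 (for x < 0) or 1 (for x > 0), which keeps training cheap.
x = torch.linspace(-3, 3, 7, requires_grad=True)  # -3, -2, -1, 0, 1, 2, 3
y = torch.relu(x)
y.sum().backward()  # let autograd compute the derivative

print(y)       # tensor([0., 0., 0., 0., 1., 2., 3.], grad_fn=...)
print(x.grad)  # tensor([0., 0., 0., 0., 1., 1., 1.])
```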
Next, let's look at when the activation function is applied. It is applied **just before** a layer's output is propagated forward to the next layer.
So what does it mean that activation functions make a neural network more flexible? Let's check this with a figure as well. First, suppose we build the neural network purely linearly and try to discriminate without inserting any activation function.
Such a linear boundary does not separate every object neatly. Just as we humans confuse things that look alike, an artificial intelligence can misidentify similar objects. Now let's add an activation function and transform the data non-linearly. The graph works a little too well, but you can bend the decision boundary like this. There is no guarantee that every judgment will be correct, but the accuracy will certainly change compared to staying linear.
import torch
import torch.nn as nn
import torch.nn.functional as F
First, import the required modules. Next, we will build a network. By the way, the network to be built this time is like this.
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(2, 4)   # 1st layer: 2 inputs -> 4 outputs
        self.fc2 = nn.Linear(4, 4)   # 2nd layer: 4 inputs -> 4 outputs
        self.fc3 = nn.Linear(4, 1)   # 3rd layer: 4 inputs -> 1 output

    def forward(self, x):
        y = F.relu(self.fc1(x))
        y = F.relu(self.fc2(y))
        y = self.fc3(y)
        return y
In Pytorch, we build a neural network by defining the network as a class and then calling it like a function. This is the so-called dynamic graph (define-by-run) style. Compared to TensorFlow (a framework developed by Google), I feel it is easier to understand because it keeps a Python-like character.
Let me explain the code. The class Net is created by inheriting Pytorch's **nn.Module**. We use this nn.Module to define the graph. First, define \_\_init\_\_(self) for initialization and call nn.Module's \_\_init\_\_. After that, create each layer as self.(layer name). This time, as explained in the image, the configuration is:
1st layer (input = 2, output = 4), 2nd layer (input = 4, output = 4), 3rd layer (input = 4, output = 1)
In the program, write **self.(layer name) = nn.Linear(number of inputs, number of outputs)**. **nn.Linear** is the module for **fully connected** layers, used when every node in a layer propagates to all the nodes of the next layer.
The forward function describes how the neural network behaves when it receives an actual input. The first line feeds the argument x into the first layer and applies the ReLU activation function. The second line feeds the output y of the first layer into the second layer and applies ReLU again. Finally, y goes into the last layer and the result is returned.
Let's take a look at an overview of the network.
net = Net()
print(net)
Net(
(fc1): Linear(in_features=2, out_features=4, bias=True)
(fc2): Linear(in_features=4, out_features=4, bias=True)
(fc3): Linear(in_features=4, out_features=1, bias=True)
)
You can confirm that the network has been built as intended.
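As a quick sanity check, you can already pass a dummy input through the untrained network (the input values here are arbitrary):

```python
# Hypothetical input: a batch of two samples, each with 2 features,
# matching the 2 inputs expected by fc1.
x = torch.tensor([[0.0, 1.0],
                  [1.0, 1.0]])
y = net(x)        # calling the instance invokes Net.forward
print(y.shape)    # torch.Size([2, 1]) -- one output value per sample
```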
By the way, you can also check the network's weight parameters and their shapes.
for param_tensor in net.state_dict():
    print(param_tensor, "\t", net.state_dict()[param_tensor].size())
fc1.weight torch.Size([4, 2])
fc1.bias torch.Size([4])
fc2.weight torch.Size([4, 4])
fc2.bias torch.Size([4])
fc3.weight torch.Size([1, 4])
fc3.bias torch.Size([1])
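The loop above only prints the shapes; if you want to see the actual randomly initialized values, you can index into the state_dict the same way (a small sketch):

```python
# The values are random at initialization, so your output will differ.
print(net.state_dict()["fc1.weight"])  # 4x2 tensor of initial weights
print(net.state_dict()["fc1.bias"])    # 4-element tensor of initial biases
```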
There is a "bias" here, which is called a bias and is added to the calculation of each layer.
y = 2x + 3

The "3" in this linear function is the bias. In mathematical terms, it is the **intercept**.
Next time, I will use the neural network construction method I learned this time to build a more practical neural network. Specifically, we will solve the "OR circuit" and "AND circuit" of the logic circuit with a neural network. Thank you for reading until the end.