Machine learning and deep learning have rich libraries, so you can make predictions with little more than copy and paste. I myself have run programs written by many predecessors and have reached the point where I can follow their general outline. Deep learning (neural networks) in particular is applied to GANs and natural language processing, a field where new techniques appear at a dizzying pace, and I believe its adoption in society and industry is progressing rapidly. Since it sits at the center of that transformation, I am motivated to understand these areas deeply and find them genuinely interesting.
I am currently studying the basics with this book, which is well known as a deep learning textbook: https://www.oreilly.co.jp/books/9784873117584/ This time, by building a neural network almost from scratch (though I do use numpy), I would like to get a concrete feel for the calculations being performed inside it.
The summary is below.
A perceptron receives multiple signals as inputs and outputs a single signal. In machine learning, not emitting a signal is treated as 0, and emitting a signal is treated as 1.
The figure above is a simple diagram of this idea. x is the input signal, y is the output signal, and w is the weight. Each circle is called a neuron. A neuron receives the sum of the inputs multiplied by their weights, and it outputs 1 when that sum exceeds the threshold θ. The formula is as follows.
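Written as a formula (reconstructed here from the description above, assuming the two-input case in the figure):

```math
y =
\begin{cases}
0 & (w_1 x_1 + w_2 x_2 \le \theta) \\
1 & (w_1 x_1 + w_2 x_2 > \theta)
\end{cases}
```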
Now, let's actually make a program. I reproduced the simple pattern shown in the figure above. First, let's calculate when the threshold is 0.4.
```python:NN.ipynb
def AND(x1, x2):
    w1, w2, theta = 0.5, 0.5, 0.4  # weights and threshold
    tmp = x1*w1 + x2*w2            # weighted sum of the inputs
    if tmp <= theta:
        return 0
    elif tmp > theta:
        return 1
```
```python
print(AND(0,0))
print(AND(1,0))
print(AND(0,1))
print(AND(1,1))
```
```
0
1
1
1
```
With this threshold, the output is 1 whenever either x1 or x2 is 1. If we raise the threshold to 0.7 instead, the result is as follows.
```python
print(AND(0,0))
print(AND(1,0))
print(AND(0,1))
print(AND(1,1))
```
```
0
0
0
1
```
Now the output is no longer 1 when only one of x1 and x2 is 1. You can see that the output changes depending on how the threshold is set.
A multi-layer perceptron is a network with an intermediate (hidden) layer between the input layer and the output layer. Terminology differs from book to book, but in the figure below the input layer is layer 0, the intermediate layer is layer 1, and the output layer is layer 2.
There also seem to be different conventions for counting layers when calling something an N-layer neural network: some count the layers of weights, others count the layers of neurons. I do not have enough experience to say which is more common, so I will follow the O'Reilly textbook and count by layers of weights.
Next, we introduce a value called the bias b. By rewriting the earlier equation with the threshold θ replaced by -b, the output y can be decided as 0 or 1 based on whether the sum exceeds 0. The bias acts as a correction value, like the "padding" added to a figure in manufacturing, and it shifts the whole output up or down.
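Rearranged with the bias, the condition becomes (again a sketch based on the description above):

```math
y =
\begin{cases}
0 & (b + w_1 x_1 + w_2 x_2 \le 0) \\
1 & (b + w_1 x_1 + w_2 x_2 > 0)
\end{cases}
```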
The function that determines whether y becomes 0 or 1 is called the activation function. Because the values it produces stay roughly between 0 and 1, it also helps keep the calculation from diverging. There are several common activation functions.
The sigmoid function is one of the most frequently used activation functions. It is expressed as a fraction involving Napier's number e, the base of the natural logarithm, as shown below. The shape is hard to picture from the formula alone, but plotting it gives the curve below.
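The formula referred to above is the standard sigmoid:

```math
h(x) = \frac{1}{1 + e^{-x}}
```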
```python:NN.ipynb
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1/(1+np.exp(-x))

# Plot the sigmoid function
xxx = np.arange(-5.0, 5.0, 0.1)
yyy = sigmoid(xxx)
plt.plot(xxx, yyy)
plt.ylim(-0.1, 1.1)
plt.show()
```
With x = 0 as the boundary, the curve gradually approaches y = 1 for x > 0 and is asymptotic to y = 0 for x < 0. The fact that any input is mapped to an output between 0 and 1 is exactly what makes it so convenient as an activation function.
Next, the step function can be seen as an extreme version of the sigmoid that outputs only 0 or 1. It can be written as follows.
```python:NN.ipynb
def step_function(x):
    # Returns 1 where x > 0, otherwise 0
    return np.array(x > 0, dtype=int)

x = np.arange(-5.0, 5.0, 0.1)
y = step_function(x)
plt.plot(x, y)      # step function
plt.plot(xxx, yyy)  # sigmoid function, for comparison
plt.ylim(-0.1, 1.1)
plt.show()
```
Blue is the step function and orange is the sigmoid function. You can see that the step function's output is only 0 or 1. I still have little feel for when to use which function, so I will leave that as homework for the future. **My intuition is that the sigmoid function, because it takes values more finely, can distinguish even small differences in the input. On the other hand, when there are many layers and the computational load is heavy, using the step function where appropriate might keep the load down while still separating the cases.**
Finally, there is the ReLU (Rectified Linear Unit) function, which also seems to be used very often. If x is greater than 0, the value is output as-is; if it is 0 or less, 0 is output.
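As a formula, this is simply:

```math
h(x) = \max(0, x)
```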
```python:NN.ipynb
def relu(x):
    return np.maximum(0, x)

xx = np.arange(-5.0, 5.0, 0.1)
yy = relu(xx)
plt.plot(xx, yy)
plt.ylim(-0.1, 5)
plt.show()
```
This time we will build a feedforward neural network, in which the signal flows in one direction from the input to the output. When training a model, by contrast, the calculation runs from the output back toward the input; that is called backpropagation.
Now, I would like to actually describe a three-layer neural network.
Consider building a three-layer neural network like the one in the figure above. First, let's extract only the calculation highlighted in bold in that figure.
```python:NN.ipynb
def init_network():
    # Weights and bias of the first layer only
    network = {}
    network['W1'] = np.array([[0.1,0.3,0.5],[0.2,0.4,0.6]])
    network['b1'] = np.array([0.1,0.2,0.3])
    return network

def forword(network, x):
    # Compute the first-layer output from the input x
    W1 = network['W1']
    b1 = network['b1']
    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    return z1

network = init_network()
x = np.array([2,1])
z1 = forword(network, x)
print(z1)
```
```
[0.62245933 0.76852478 0.86989153]
```
Here init_network() defines the weights and biases, and forword() holds the formulas that actually do the calculation. The functions are then called with the input x to produce the answer. This is easier to follow than writing everything out in one long block without defining functions.
**Also note np.dot, which computes the matrix product here. Be careful with the order of multiplication, because the shape of the resulting matrix depends on it, and the inner dimensions must match.**
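As a small illustration of this shape rule (not from the original program, just a quick check using the same x and W1 as above):

```python
import numpy as np

x = np.array([2, 1])                               # shape (2,)
W1 = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])  # shape (2, 3)

print(np.dot(x, W1).shape)  # (3,) -- the inner dimensions (2 and 2) match
# np.dot(W1, x) would raise an error: (2, 3) cannot be multiplied by a length-2 vector
```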
```python:NN.ipynb
def init_network():
    # Weights and biases for all three layers
    network = {}
    network['W1'] = np.array([[0.1,0.3,0.5],[0.2,0.4,0.6]])
    network['b1'] = np.array([0.1,0.2,0.3])
    network['W2'] = np.array([[0.1,0.4],[0.2,0.5],[0.3,0.6]])
    network['b2'] = np.array([0.1,0.2])
    network['W3'] = np.array([[0.1,0.3],[0.2,0.4]])
    network['b3'] = np.array([0.1,0.2])
    return network

def forword(network, x):
    # Propagate the input forward through the three layers
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']
    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3
    y = softmax(a3)  # softmax is defined below
    return y
```
Writing the two functions out through the final layer gives the code above. Note the softmax call at the very end, which I have not explained yet; it is summarized below.
From here, extending the network is just a matter of adding layers to these two functions. Now consider the value y output at the end. For classification problems, such as guessing which of the digits 0 to 9 an image shows, the network outputs a probability for each class, and the class with the highest probability is taken as the prediction. A convenient function for expressing such probabilities is the softmax function.
The sum over all classes is used as the denominator and each individual value as the numerator, so each output can be read as a probability. By finishing with the softmax function, the classification problem is reduced to probabilities, and the largest value is the predicted class.
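Written as a formula (sketched from the description above), for n output classes:

```math
y_k = \frac{\exp(a_k)}{\sum_{i=1}^{n} \exp(a_i)}
```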
In terms of implementation, because exp grows so quickly, the values can easily overflow. A common trick is therefore to multiply both the numerator and the denominator by the same constant, which turns into an added constant inside the exponent; subtracting the maximum of the inputs in this way keeps the values from blowing up without changing the result.
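This works because multiplying the numerator and denominator by the same constant C leaves the result unchanged, and that constant can be moved inside exp as an added term C' = log C (in the code below, C' is chosen as -max(a)):

```math
y_k = \frac{C\exp(a_k)}{C\sum_{i=1}^{n}\exp(a_i)}
    = \frac{\exp(a_k + \log C)}{\sum_{i=1}^{n}\exp(a_i + \log C)}
    = \frac{\exp(a_k + C')}{\sum_{i=1}^{n}\exp(a_i + C')}
```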
```python:NN.ipynb
def softmax(a):
    c = np.max(a)        # subtract the max for numerical stability
    exp_a = np.exp(a - c)
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y
```
```python:NN.ipynb
network = init_network()
x = np.array([2,1])
y = forword(network, x)
print(y)
```
```
[0.40442364 0.59557636]
```
As a test, I fed an arbitrary value into x and got the answer above: y1 has a probability of about 40% and y2 about 60%. From here, I understand that more complicated classifications become possible as the input matrices get larger and the layers get deeper (i.e. more numerous).
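To read off a single predicted class from these probabilities, one would typically take the index of the largest value. A minimal sketch (np.argmax is a standard numpy function; y here is just the output printed above):

```python
import numpy as np

y = np.array([0.40442364, 0.59557636])
print(np.argmax(y))  # 1 -> the second class (y2) is the predicted one
```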
This time I built a very basic neural network by hand, and just moving my hands like this deepened my understanding. **I finally feel I understand the basics behind the GAN algorithms I had previously only copied and run.** Adding ideas such as model training and convolution on top of this leads to convolutional neural networks and, further on, to GANs. It may only be a first step toward the latest technology, but I hope that by steadily deepening my understanding like this, my technical skills will surely improve.
The full program is here. It is split into a file for experimenting with the individual functions and a file for the 3-layer neural network. https://github.com/Fumio-eisan/neuralnetwork_20200318