Machine learning and deep learning have rich libraries, so you can make predictions with little more than copy and paste. I myself have run programs written by many predecessors and have reached the point where I can follow their general outline. Deep learning (neural networks) in particular is applied to GANs and natural language processing, a field where new techniques appear at a dizzying pace, and I believe its adoption in society and industry is progressing rapidly. Since it sits at the center of that transformation, I am motivated to understand these areas deeply and find them genuinely interesting.
I am currently studying the basics with this book, which is well known as a deep learning textbook: https://www.oreilly.co.jp/books/9784873117584/ This time, by building a neural network almost from scratch (though I do use numpy), I would like to get a concrete feel for the calculations being performed inside it.
The summary is below.
A perceptron receives multiple signals as inputs and outputs a single signal. In machine learning, not emitting a signal is treated as 0, and emitting a signal is treated as 1.
The figure above is a simple diagram of this idea. x is the input signal, y is the output signal, and w is the weight. Each circle is called a neuron. A neuron receives the sum of the inputs multiplied by their weights, and it outputs 1 when that sum exceeds the threshold θ. The formula is as follows.
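Written as a formula (reconstructed here from the description above, assuming the two-input case in the figure):

```math
y =
\begin{cases}
0 & (w_1 x_1 + w_2 x_2 \le \theta) \\
1 & (w_1 x_1 + w_2 x_2 > \theta)
\end{cases}
```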
Now, let's actually make a program. I reproduced the simple pattern shown in the figure above. First, let's calculate when the threshold is 0.4.
```python:NN.ipynb
def AND(x1, x2):
    w1, w2, theta = 0.5, 0.5, 0.4  # weights and threshold
    tmp = x1*w1 + x2*w2            # weighted sum of the inputs
    if tmp <= theta:
        return 0
    elif tmp > theta:
        return 1
```
```python
print(AND(0,0))
print(AND(1,0))
print(AND(0,1))
print(AND(1,1))
```
```
0
1
1
1
```
With this threshold, the output is 1 whenever either x1 or x2 is 1. If we raise the threshold to 0.7 instead, the result is as follows.
```python
print(AND(0,0))
print(AND(1,0))
print(AND(0,1))
print(AND(1,1))
```
```
0
0
0
1
```
Now the output is no longer 1 when only one of x1 and x2 is 1. You can see that the output changes depending on how the threshold is set.
A multi-layer perceptron is a network with an intermediate (hidden) layer between the input layer and the output layer. Terminology differs from book to book, but in the figure below the input layer is layer 0, the intermediate layer is layer 1, and the output layer is layer 2.
There also seem to be different conventions for counting layers when calling something an N-layer neural network: some count the layers of weights, others count the layers of neurons. I do not have enough experience to say which is more common, so I will follow the O'Reilly textbook and count by layers of weights.
Next, we introduce a value called the bias b. By rewriting the earlier equation with the threshold θ replaced by -b, the output y can be decided as 0 or 1 based on whether the sum exceeds 0. The bias acts as a correction value, like the "padding" added to a figure in manufacturing, and it shifts the whole output up or down.
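Rearranged with the bias, the condition becomes (again a sketch based on the description above):

```math
y =
\begin{cases}
0 & (b + w_1 x_1 + w_2 x_2 \le 0) \\
1 & (b + w_1 x_1 + w_2 x_2 > 0)
\end{cases}
```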
The function that determines whether y becomes 0 or 1 is called the activation function. Because the values it produces stay roughly between 0 and 1, it also helps keep the calculation from diverging. There are several common activation functions.
The sigmoid function is one of the most frequently used activation functions. It is expressed as a fraction involving Napier's number e, the base of the natural logarithm, as shown below. The shape is hard to picture from the formula alone, but plotting it gives the curve below.
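The formula referred to above is the standard sigmoid:

```math
h(x) = \frac{1}{1 + e^{-x}}
```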
```python:NN.ipynb
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1/(1+np.exp(-x))

# Plot the sigmoid function
xxx = np.arange(-5.0, 5.0, 0.1)
yyy = sigmoid(xxx)
plt.plot(xxx, yyy)
plt.ylim(-0.1, 1.1)
plt.show()
```
With x = 0 as the boundary, the curve gradually approaches y = 1 for x > 0 and is asymptotic to y = 0 for x < 0. The fact that any input is mapped to an output between 0 and 1 is exactly what makes it so convenient as an activation function.
Next, the step function can be seen as an extreme version of the sigmoid that outputs only 0 or 1. It can be written as follows.
```python:NN.ipynb
def step_function(x):
    # Returns 1 where x > 0, otherwise 0
    return np.array(x > 0, dtype=int)

x = np.arange(-5.0, 5.0, 0.1)
y = step_function(x)
plt.plot(x, y)      # step function
plt.plot(xxx, yyy)  # sigmoid function, for comparison
plt.ylim(-0.1, 1.1)
plt.show()
```
Blue is the step function and orange is the sigmoid function. You can see that the step function's output is only 0 or 1. I still have little feel for when to use which function, so I will leave that as homework for the future. **My intuition is that the sigmoid function, because it takes values more finely, can distinguish even small differences in the input. On the other hand, when there are many layers and the computational load is heavy, using the step function where appropriate might keep the load down while still separating the cases.**
Finally, there is the ReLU (Rectified Linear Unit) function, which also seems to be used very often. If x is greater than 0, the value is output as-is; if it is 0 or less, 0 is output.
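As a formula, this is simply:

```math
h(x) = \max(0, x)
```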
```python:NN.ipynb
def relu(x):
    return np.maximum(0, x)

xx = np.arange(-5.0, 5.0, 0.1)
yy = relu(xx)
plt.plot(xx, yy)
plt.ylim(-0.1, 5)
plt.show()
```
This time we will build a feedforward neural network, in which the signal flows in one direction from the input to the output. When training a model, by contrast, the calculation runs from the output back toward the input; that is called backpropagation.
Now, I would like to actually describe a three-layer neural network.
Consider building a three-layer neural network like the one in the figure above. First, let's extract only the calculation highlighted in bold in that figure.
```python:NN.ipynb
def init_network():
    # Weights and bias of the first layer only
    network = {}
    network['W1'] = np.array([[0.1,0.3,0.5],[0.2,0.4,0.6]])
    network['b1'] = np.array([0.1,0.2,0.3])
    return network

def forword(network, x):
    # Compute the first-layer output from the input x
    W1 = network['W1']
    b1 = network['b1']
    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    return z1

network = init_network()
x = np.array([2,1])
z1 = forword(network, x)
print(z1)
```
```
[0.62245933 0.76852478 0.86989153]
```
Here init_network() defines the weights and biases, and forword() holds the formulas that actually do the calculation. The functions are then called with the input x to produce the answer. This is easier to follow than writing everything out in one long block without defining functions.
**Also note np.dot, which computes the matrix product here. Be careful with the order of multiplication, because the shape of the resulting matrix depends on it, and the inner dimensions must match.**
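As a small illustration of this shape rule (not from the original program, just a quick check using the same x and W1 as above):

```python
import numpy as np

x = np.array([2, 1])                               # shape (2,)
W1 = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])  # shape (2, 3)

print(np.dot(x, W1).shape)  # (3,) -- the inner dimensions (2 and 2) match
# np.dot(W1, x) would raise an error: (2, 3) cannot be multiplied by a length-2 vector
```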
```python:NN.ipynb
def init_network():
    # Weights and biases for all three layers
    network = {}
    network['W1'] = np.array([[0.1,0.3,0.5],[0.2,0.4,0.6]])
    network['b1'] = np.array([0.1,0.2,0.3])
    network['W2'] = np.array([[0.1,0.4],[0.2,0.5],[0.3,0.6]])
    network['b2'] = np.array([0.1,0.2])
    network['W3'] = np.array([[0.1,0.3],[0.2,0.4]])
    network['b3'] = np.array([0.1,0.2])
    return network

def forword(network, x):
    # Propagate the input forward through the three layers
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']
    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3
    y = softmax(a3)  # softmax is defined below
    return y
```
Writing the two functions out through the final layer gives the code above. Note the softmax call at the very end, which I have not explained yet; it is summarized below.
From here, extending the network is just a matter of adding layers to these two functions. Now consider the value y output at the end. For classification problems, such as guessing which of the digits 0 to 9 an image shows, the network outputs a probability for each class, and the class with the highest probability is taken as the prediction. A convenient function for expressing such probabilities is the softmax function.
The sum over all classes is used as the denominator and each individual value as the numerator, so each output can be read as a probability. By finishing with the softmax function, the classification problem is reduced to probabilities, and the largest value is the predicted class.
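Written as a formula (sketched from the description above), for n output classes:

```math
y_k = \frac{\exp(a_k)}{\sum_{i=1}^{n} \exp(a_i)}
```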
In terms of implementation, because exp grows so quickly, the values can easily overflow. A common trick is therefore to multiply both the numerator and the denominator by the same constant, which turns into an added constant inside the exponent; subtracting the maximum of the inputs in this way keeps the values from blowing up without changing the result.
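This works because multiplying the numerator and denominator by the same constant C leaves the result unchanged, and that constant can be moved inside exp as an added term C' = log C (in the code below, C' is chosen as -max(a)):

```math
y_k = \frac{C\exp(a_k)}{C\sum_{i=1}^{n}\exp(a_i)}
    = \frac{\exp(a_k + \log C)}{\sum_{i=1}^{n}\exp(a_i + \log C)}
    = \frac{\exp(a_k + C')}{\sum_{i=1}^{n}\exp(a_i + C')}
```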
```python:NN.ipynb
def softmax(a):
    c = np.max(a)        # subtract the max for numerical stability
    exp_a = np.exp(a - c)
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y
```
```python:NN.ipynb
network = init_network()
x = np.array([2,1])
y = forword(network, x)
print(y)
```
```
[0.40442364 0.59557636]
```
As a test, I fed an arbitrary value into x and got the answer above: y1 has a probability of about 40% and y2 about 60%. From here, I understand that more complicated classifications become possible as the input matrices get larger and the layers get deeper (i.e. more numerous).
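To read off a single predicted class from these probabilities, one would typically take the index of the largest value. A minimal sketch (np.argmax is a standard numpy function; y here is just the output printed above):

```python
import numpy as np

y = np.array([0.40442364, 0.59557636])
print(np.argmax(y))  # 1 -> the second class (y2) is the predicted one
```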
This time I built a very basic neural network by hand, and just moving my hands like this deepened my understanding. **I finally feel I understand the basics behind the GAN algorithms I had previously only copied and run.** Adding ideas such as model training and convolution on top of this leads to convolutional neural networks and, further on, to GANs. It may only be a first step toward the latest technology, but I hope that by steadily deepening my understanding like this, my technical skills will surely improve.
The full program is here. It is split into a file for experimenting with the individual functions and a file for the 3-layer neural network. https://github.com/Fumio-eisan/neuralnetwork_20200318