A neural network is a concept quite different from the perceptron of Chapter 2. It is made up of the following layers:
-- Input layer
-- Output layer
-- Intermediate layer (hidden layer)
Going from the input layer to the output layer, we call them the 0th layer, the 1st layer, and the 2nd layer in order.
Reviewing Chapter 2, the perceptron can be expressed by the following formulas:

y = h(b + w1x1 + w2x2)

h(x) = 0  (x <= 0)
       1  (x > 0)

Introducing a = b + w1x1 + w2x2, this can be written in two steps:

a = b + w1x1 + w2x2
y = h(a)

where h is the activation function. The step function used here is

h(x) = 1  (x > 0)
       0  (x <= 0)
When implementing the step function as Python code, a naive version written with an if statement only accepts a scalar argument; it cannot be called on an array like step_function(np.array([1.0, 2.0])). Instead, we can use NumPy operations:
import numpy as np
x = np.array([-1.0, 1.0, 2.0])
y = x > 0
y
array([False, True, True], dtype=bool)
y = y.astype(int)
y
array([0, 1, 1])
Summarizing the above, the step function can be written as a single function:

def step_function(x):
    return np.array(x > 0, dtype=int)
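A quick usage check (not from the original text), applying it to the same array as above:

x = np.array([-1.0, 1.0, 2.0])
step_function(x)   # array([0, 1, 1])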
The sigmoid function, another commonly used activation function, is expressed as

h(x) = 1 / (1 + exp(-x))
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([-1.0, 1.0, 2.0])
sigmoid(x)
> array([0.26894142, 0.73105858, 0.88079708])
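To compare the shapes of the step function and the sigmoid, here is a small plotting sketch (assuming matplotlib is installed; not part of the original text):

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-5.0, 5.0, 0.1)
plt.plot(x, step_function(x), linestyle='--', label='step')  # staircase shape
plt.plot(x, sigmoid(x), label='sigmoid')                     # smooth S-curve
plt.ylim(-0.1, 1.1)
plt.legend()
plt.show()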
The perceptron passes only 0/1 signals between neurons, whereas a neural network passes continuous signals. The activation function must not be a linear function: with a linear activation, deepening the network is meaningless, because the whole network becomes equivalent to one without hidden layers and the advantage of multiple layers is lost. A small sketch of this point follows.
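As a minimal illustration (an assumed example, not code from the original text), two layers with a linear activation h(x) = c*x collapse into a single equivalent linear layer:

import numpy as np

W1 = np.array([[1.0, 2.0], [3.0, 4.0]])
W2 = np.array([[0.5, 1.0], [1.5, 2.0]])
c = 2.0                                    # linear "activation": h(x) = c * x
x = np.array([1.0, -1.0])

two_layers = c * np.dot(c * np.dot(x, W1), W2)    # h(h(x W1) W2)
one_layer = np.dot(x, (c * c) * np.dot(W1, W2))   # x (c^2 W1 W2): a single linear layer
print(np.allclose(two_layers, one_layer))         # True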
The sigmoid has long been common, but these days a function called ReLU (Rectified Linear Unit) is mainly used as the activation function. The ReLU function can be expressed as follows:

h(x) = x  (x > 0)
       0  (x <= 0)
Implementation:

def relu(x):
    return np.maximum(0, x)
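A quick usage check (the input array here is an arbitrary example):

relu(np.array([-2.0, 0.0, 3.0]))   # array([0., 0., 3.])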
Neural network computations can be written compactly as matrix products, which NumPy computes with np.dot:

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
np.dot(A, B)
>> array([[19, 22], [43, 50]])

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[1, 2], [3, 4], [5, 6]])
np.dot(A, B)
>> array([[22, 28], [49, 64]])
X = np.array([1, 2])
W = np.array([[1, 3, 5], [2, 4, 6]])
Y = np.dot(X, W)
print(Y)
>>> [ 5 11 17]
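For np.dot the inner dimensions must match: multiplying an (m, n) array by an (n, k) array gives an (m, k) result, and mismatched shapes raise an error. A quick check (an assumed example, not from the original text):

A = np.array([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)
B = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)
np.dot(A, B).shape                       # (2, 2)

C = np.array([[1, 2], [3, 4]])           # shape (2, 2)
# np.dot(A, C)  # raises ValueError because the inner dimensions (3 and 2) differ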
Each layer of a three-layer neural network can be expressed as a single matrix formula:

A = XW + B
W1.shape = (2, 3)
W2.shape = (3, 2)
W3.shape = (2, 2)
A1 = np.dot(X, W1) + B1
Z1 = sigmoid(A1)
A2 = np.dot(Z1, W2) + B2
Z2 = sigmoid(A2)
A3 = np.dot(Z2, W3) + B3
Implementation summary
def init_network():
    network = {}
    network['W1'] = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
    network['b1'] = np.array([0.1, 0.2, 0.3])
    network['W2'] = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
    network['b2'] = np.array([0.1, 0.2])
    network['W3'] = np.array([[0.1, 0.3], [0.2, 0.4]])
    network['b3'] = np.array([0.1, 0.2])
    return network
def identity_function(x):
    # output-layer activation used here: returns its input unchanged
    return x

def forward(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']
    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3
    y = identity_function(a3)
    return y
network = init_network()
x = np.array([1.0, 0.5])
y = forward(network, x)
print(y) # [0.31682708 0.69627909]
The identity function outputs the input as is. The softmax function is expressed by the following formula:

yk = exp(ak) / Σ(i=1 to n) exp(ai)
a = np.array([0.3, 2.9, 4.0])
exp_a = np.exp(a)
sum_exp_a = np.sum(exp_a)
y = exp_a / sum_exp_a
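For this input the result comes out roughly as follows (values computed from the formula above); note that the elements sum to 1 and can be read as probabilities:

print(y)      # [0.01821127 0.24519181 0.73659691]
np.sum(y)     # 1.0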
In this implementation you have to be careful about overflow: exp() of a large value easily exceeds the range of floating-point numbers. Subtracting a constant (typically the maximum of the inputs) from every element does not change the result of softmax, so it is used as a countermeasure.
def softmax(a):
    c = np.max(a)
    exp_a = np.exp(a - c)  # overflow countermeasure: subtracting the max does not change the result
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y
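A quick check of why the correction matters (the large input values here are an arbitrary example): exponentiating them directly overflows and yields nan, while the corrected version works.

a = np.array([1010, 1000, 990])
np.exp(a) / np.sum(np.exp(a))   # array([nan, nan, nan]) -- overflow
softmax(a)                      # roughly array([9.9995e-01, 4.5398e-05, 2.0611e-09])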
For a 10-class classification problem, set the number of neurons in the output layer to 10.
MNIST is an image dataset of handwritten digits and one of the most famous datasets; it consists of digit images from 0 to 9. It provides 60,000 training images and 10,000 test images, which are used for learning and inference. A common use of the MNIST dataset is to train a model with the training images and then measure how well the trained model can classify the test images.
import sys, os
sys.path.append(os.pardir)
from dataset.mnist import load_mnist
(x_train, t_train), (x_test, t_test) = \
load_mnist(flatten=True, normalize=False)
print(x_train.shape) # (60000, 784)
print(t_train.shape) # (60000,)
print(x_test.shape) # (10000, 784)
print(t_test.shape) # (10000,)
load_mnist returns the loaded MNIST data in the format ((training images, training labels), (test images, test labels)). The normalize argument controls whether pixel values are scaled into the 0.0-1.0 range, flatten controls whether each image is stored as a flat 784-element array, and one_hot_label controls whether labels are returned as one-hot arrays.
import sys, os
sys.path.append(os.pardir)
import numpy as np
from dataset.mnist import load_mnist
from PIL import Image
def img_show(img):
    pil_img = Image.fromarray(np.uint8(img))
    pil_img.show()
(x_train, t_train), (x_test, t_test) = \
load_mnist(flatten = True, normalize = False)
img = x_train[0]
label = t_train[0]
print(label) # 5
print(img.shape) # (784,)
img = img.reshape(28, 28)
print(img.shape) # (28, 28)
img_show(img) # the image of the digit 5 is displayed
An image read with flatten=True is stored as a one-dimensional NumPy array, so to display it, it must be reshaped to 28 x 28.
Since the images are classified into 10 digits, the output layer has 10 neurons. We also assume two hidden layers: the first hidden layer has 50 neurons and the second has 100. The numbers 50 and 100 can be set to any value. First, we define three functions.
import pickle

def get_data():
    (x_train, t_train), (x_test, t_test) = \
        load_mnist(normalize=True, flatten=True, one_hot_label=False)
    return x_test, t_test

def init_network():
    with open("sample_weight.pkl", 'rb') as f:
        network = pickle.load(f)
    return network
def predict(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']
    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3
    y = softmax(a3)
    return y
x, t = get_data()
network = init_network()

accuracy_cnt = 0
for i in range(len(x)):
    y = predict(network, x[i])
    p = np.argmax(y)  # index of the element with the highest probability
    if p == t[i]:
        accuracy_cnt += 1

print("Accuracy:" + str(float(accuracy_cnt) / len(x)))
init_network loads the trained weight parameters stored in sample_weight.pkl; this file contains the learned weight and bias parameters. pkl files will be explained in the next chapter.