Try to build a deep learning / neural network from scratch

1. Introduction

This time, in order to understand forward propagation and error back propagation, I will implement a neural network that actually works, from scratch. The dataset is MNIST.

2. Neural network specifications

The network has 28 x 28 = 784 units in the input layer, 10 units in the intermediate layer, and 2 units in the output layer. The activation function is the sigmoid, the error function is the squared error, and the optimization method is gradient descent.

The dataset is obtained by extracting only the digits "1" and "7" from MNIST and performing binary classification on them.

(Figure: network diagram.)

3. Forward propagation

First, consider the calculation of $a^0_0$. There are 784 inputs $x^0_0$ to $x^0_{783}$, each multiplied by its weight $w^0_{00}$ to $w^0_{783,0}$, so

$$a^0_0 = w^0_{00}x^0_0 + w^0_{10}x^0_1 + \cdots + w^0_{783,0}x^0_{783} + b^0_0$$

Expressed as a matrix, all of the calculations from $a^0_0$ to $a^0_9$ can be represented at once:

$$
\begin{pmatrix} a^0_0 \\ a^0_1 \\ \vdots \\ a^0_9 \end{pmatrix}
=
\begin{pmatrix}
w^0_{00} & w^0_{10} & \cdots & w^0_{783,0} \\
w^0_{01} & w^0_{11} & \cdots & w^0_{783,1} \\
\vdots & \vdots & & \vdots \\
w^0_{09} & w^0_{19} & \cdots & w^0_{783,9}
\end{pmatrix}
\begin{pmatrix} x^0_0 \\ x^0_1 \\ \vdots \\ x^0_{783} \end{pmatrix}
+
\begin{pmatrix} b^0_0 \\ b^0_1 \\ \vdots \\ b^0_9 \end{pmatrix}
$$

Since $x^1_0$ to $x^1_9$ are the result of passing $a^0_0$ to $a^0_9$ through the sigmoid activation function,

$$x^1_k = \mathrm{sigmoid}(a^0_k) = \frac{1}{1 + e^{-a^0_k}} \qquad (k = 0, \dots, 9)$$

Next, consider the calculation of $a^1_0$. There are 10 inputs $x^1_0$ to $x^1_9$, each multiplied by its weight $w^1_{00}$ to $w^1_{90}$, so

$$a^1_0 = w^1_{00}x^1_0 + w^1_{10}x^1_1 + \cdots + w^1_{90}x^1_9 + b^1_0$$

As before, $a^1_0$ and $a^1_1$ can be represented with a matrix:

$$
\begin{pmatrix} a^1_0 \\ a^1_1 \end{pmatrix}
=
\begin{pmatrix}
w^1_{00} & w^1_{10} & \cdots & w^1_{90} \\
w^1_{01} & w^1_{11} & \cdots & w^1_{91}
\end{pmatrix}
\begin{pmatrix} x^1_0 \\ x^1_1 \\ \vdots \\ x^1_9 \end{pmatrix}
+
\begin{pmatrix} b^1_0 \\ b^1_1 \end{pmatrix}
$$

Finally, $y^0$ and $y^1$ are

$$y^j = \mathrm{sigmoid}(a^1_j) \qquad (j = 0, 1)$$

In this way, forward propagation reduces to matrix products and additions.
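As a minimal NumPy sketch of this matrix form (using random parameters and a random input purely to check shapes; the actual initialization and data come later), the forward pass looks like this:

import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

# Hypothetical parameters and input, used here only to check shapes
x0 = np.random.rand(784, 1)                          # input column vector
W0, b0 = np.random.randn(10, 784), np.ones((10, 1))  # input -> hidden
W1, b1 = np.random.randn(2, 10), np.ones((2, 1))     # hidden -> output

a0 = W0.dot(x0) + b0       # (10, 1) hidden-layer pre-activations
x1 = sigmoid(a0)           # (10, 1) hidden-layer outputs
a1 = W1.dot(x1) + b1       # (2, 1)  output-layer pre-activations
y = sigmoid(a1)            # (2, 1)  network outputs
print(a0.shape, x1.shape, a1.shape, y.shape)   # (10, 1) (10, 1) (2, 1) (2, 1)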

4. Error back propagation (intermediate layer to output layer)

First, update the weights and biases from the middle layer to the output layer.

The update expression for the weight $w$ can be represented by $w = w - \eta \frac{\partial E}{\partial w}$. Here, $\eta$ is the learning rate, and $\frac{\partial E}{\partial w}$ is the error $E$ differentiated by the weight $w$.
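For intuition only, here is a tiny, hypothetical one-dimensional example of this update rule (not part of the network itself), assuming an error function $E(w) = (w - 3)^2$:

# Hypothetical 1-D gradient descent: E(w) = (w - 3)^2, so dE/dw = 2 * (w - 3)
eta = 0.1       # learning rate
w = 0.0         # initial weight
for _ in range(10):
    dE_dw = 2 * (w - 3)
    w = w - eta * dE_dw    # w <- w - eta * dE/dw
print(w)        # approaches 3, the minimizer of E(w)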

Let's work out a concrete example of $\frac{\partial E}{\partial w}$ and then express it as a general formula so that it can be implemented. First, the weights from the intermediate layer to the output layer.

To update the weight $w^1_{00}$, we need $\frac{\partial E^0}{\partial w^1_{00}}$. From the chain rule of differentiation,

$$\frac{\partial E^0}{\partial w^1_{00}} = \frac{\partial E^0}{\partial y^0}\,\frac{\partial y^0}{\partial a^1_0}\,\frac{\partial a^1_0}{\partial w^1_{00}} = (y^0 - t^0)\,\mathrm{sigmoid}'(a^1_0)\,x^1_0$$

Expressed as a general formula, for $k = 0$ to $9$ and $j = 0$ to $1$,

$$\frac{\partial E^j}{\partial w^1_{kj}} = (y^j - t^j)\,\mathrm{sigmoid}'(a^1_j)\,x^1_k$$

This allows the weight $w^1_{kj}$ to be updated. As for the bias, its input is fixed to 1, so $x^1_k$ in the above formula is simply replaced by 1:

$$\frac{\partial E^j}{\partial b^1_j} = (y^j - t^j)\,\mathrm{sigmoid}'(a^1_j)$$

This allows the bias $b^1_j$ to be updated.
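Continuing the shape-checking sketch from section 3 (and assuming a hypothetical one-hot target t and the learning rate n = 0.5), the same output-layer gradients can be written in vectorized form; this is only an alternative view of the element-wise loop used in the implementation below:

def sigmoid_d(a):
    return (1 - sigmoid(a)) * sigmoid(a)

t = np.array([[1.0], [0.0]])        # hypothetical one-hot target
n = 0.5                             # learning rate

# delta1[j] = (y^j - t^j) * sigmoid'(a^1_j)
delta1 = (y - t) * sigmoid_d(a1)    # shape (2, 1)
dW1 = delta1.dot(x1.T)              # shape (2, 10): dE/dw^1_{kj} = delta1[j] * x^1_k
db1 = delta1                        # shape (2, 1):  dE/db^1_j   = delta1[j]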

5. Error back propagation (from input layer to intermediate layer)

Next is the update of weights and biases from the input layer to the middle layer.


To update the weight $w^0_{00}$, you need both $\frac{\partial E^0}{\partial w^0_{00}}$ and $\frac{\partial E^1}{\partial w^0_{00}}$, because $w^0_{00}$ affects both outputs. From the chain rule of differentiation,

$$\frac{\partial E^i}{\partial w^0_{00}} = (y^i - t^i)\,\mathrm{sigmoid}'(a^1_i)\,w^1_{0i}\,\mathrm{sigmoid}'(a^0_0)\,x^0_0 \qquad (i = 0, 1)$$

Expressed as a general formula, with $E = E^0 + E^1$, for $k = 0$ to $783$ and $j = 0$ to $9$,

$$\frac{\partial E}{\partial w^0_{kj}} = \left[\sum_{i=0}^{1}(y^i - t^i)\,\mathrm{sigmoid}'(a^1_i)\,w^1_{ji}\right]\mathrm{sigmoid}'(a^0_j)\,x^0_k$$

This allows the weight $w^0_{kj}$ to be updated. As for the bias, its input is fixed to 1, so $x^0_k$ in the above formula is simply replaced by 1:

$$\frac{\partial E}{\partial b^0_j} = \left[\sum_{i=0}^{1}(y^i - t^i)\,\mathrm{sigmoid}'(a^1_i)\,w^1_{ji}\right]\mathrm{sigmoid}'(a^0_j)$$

This allows the bias $b^0_j$ to be updated.
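Continuing the same sketch, the gradients from the input layer to the intermediate layer take the following vectorized form, after which both layers are updated by gradient descent:

# delta0[j] = (sum_i delta1[i] * w^1_{ji}) * sigmoid'(a^0_j)
delta0 = W1.T.dot(delta1) * sigmoid_d(a0)   # shape (10, 1)
dW0 = delta0.dot(x0.T)                      # shape (10, 784): dE/dw^0_{kj} = delta0[j] * x^0_k
db0 = delta0                                # shape (10, 1)

# gradient-descent updates for both layers
W1, b1 = W1 - n * dW1, b1 - n * db1
W0, b0 = W0 - n * dW0, b0 - n * db0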

6. Implementation of the forward propagation and error back propagation parts

Based on the general formulas obtained above, we implement the forward propagation and error back propagation parts.

#Sigmoid function
def sigmoid(a):
    return 1 / (1 + np.exp(-a))

#Differentiation of sigmoid function
def sigmoid_d(a):
    return (1 - sigmoid(a)) * sigmoid(a)

#Backpropagation of error
def back(l, j):
    if l == max_layer - 1:
        return (y[j] - t[j]) * sigmoid_d(A[l][j])
    else:
        output = 0
        m = A[l+1].shape[0]   
        for i in range(m):
            output += back(l+1, i) * W[l+1][i,j] * sigmoid_d(A[l][j])
        return output

The specific behavior of def back(l, j): is as follows.

When l = 1, (y[j] - t[j]) * sigmoid_d(A[1][j]) is returned.

When l = 0, (y[0] - t[0]) * sigmoid_d(A[1][0]) * W[1][0,j] * sigmoid_d(A[0][j]) + (y[1] - t[1]) * sigmoid_d(A[1][1]) * W[1][1,j] * sigmoid_d(A[0][j]) is returned.

#Weight W setting
np.random.seed(seed=7)
w0 = np.random.normal(0.0, 1.0, (10, 784))
w1 = np.random.normal(0.0, 1.0, (2, 10))
W = [w0, w1]

#Bias b setting
b0 = np.ones((10, 1))
b1 = np.ones((2, 1))
B = [b0, b1]

#Other settings
max_layer = 2 #Setting the number of layers
n = 0.5  #Learning rate setting

Set the weights W, the biases b, and the other settings.

Each term of the weight matrices w0 and w1 is a random number drawn from a normal distribution with mean 0 and standard deviation 1, so that learning can start smoothly. Incidentally, if you change the number in np.random.seed(seed=7), the starting condition of the learning (whether it starts smoothly or is a little sluggish) changes. Each term of the bias matrices b0 and b1 is 1.

#Learning loop
count = 0 
acc = []

for x, t in zip(xs, ts):
    
    #Forward propagation
    x0 = x.flatten().reshape(784, 1)
    a0 = W[0].dot(x0) + B[0]
    x1 = sigmoid(a0)
    a1 = W[1].dot(x1) + B[1]
    y = sigmoid(a1)

    #X for parameter update,List a
    X = [x0, x1]
    A = [a0, a1]

    #Parameter update
    for l in range(len(X)):
        for j in range(W[l].shape[0]):
            for k in range(W[l].shape[1]):
                W[l][j, k] = W[l][j, k] - n * back(l, j) * X[l][k]  
            B[l][j] = B[l][j] - n * back(l, j) 

This is the learning loop. Forward propagation is performed with matrix products and additions. In the parameter update,

When l = 0, the ranges are j = 0 to 9 and k = 0 to 783, and W[0][j,k] = W[0][j,k] - n * back(0,j) * X[0][k] and B[0][j] = B[0][j] - n * back(0,j) are applied.

When l = 1, the ranges are j = 0 to 1 and k = 0 to 9, and W[1][j,k] = W[1][j,k] - n * back(1,j) * X[1][k] and B[1][j] = B[1][j] - n * back(1,j) are applied.

7. Dataset preparation

Read the MNIST dataset with Keras and extract only "1" and "7".

import numpy as np
from keras.datasets import mnist
from keras.utils import np_utils
import matplotlib.pyplot as plt

#Numeric display
def show_mnist(x):
    fig = plt.figure(figsize=(7, 7))   
    for i in range(100):
        ax = fig.add_subplot(10, 10, i+1, xticks=[], yticks=[])
        ax.imshow(x[i].reshape((28, 28)), cmap='gray')
    plt.show()

#Data set reading
(x_train, y_train), (x_test, y_test) = mnist.load_data()
show_mnist(x_train)

# Extract 1 and 7
x_data, y_data = [], []
for i in range(len(x_train)):
    if y_train[i] == 1 or y_train[i] == 7:
        x_data.append(x_train[i])
        if y_train[i] == 1:
            y_data.append(0)
        if y_train[i] == 7:
            y_data.append(1)

show_mnist(x_data)

#Convert from list format to numpy format
x_data = np.array(x_data)
y_data = np.array(y_data)

# x_data normalization, y_data one-hot representation
x_data = x_data.astype('float32')/255
y_data = np_utils.to_categorical(y_data)

#Learn, get test data
xs = x_data[0:200]
ts = y_data[0:200]  
xt = x_data[2000:3000]  
tt = y_data[2000:3000] 

(Figures: output of the two show_mnist calls.) The first shows the original data containing 0 to 9, and the second shows the data from which only 1 and 7 were extracted, each displayed from the beginning.

Prepare 200 samples of training data (xs, ts) and 1,000 samples of test data (xt, tt).
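As a quick sanity check (a minimal sketch, assuming the code above has just been run), the resulting array shapes should be:

print(xs.shape, ts.shape)   # (200, 28, 28) (200, 2)
print(xt.shape, tt.shape)   # (1000, 28, 28) (1000, 2)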

8. Whole implementation

This is the whole implementation, adding an accuracy check on the test data after each training step and a plot of the accuracy transition at the end.

import numpy as np
from keras.datasets import mnist
from keras.utils import np_utils

#Data set reading
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Extract only the digits 1 and 7
x_data, y_data = [], []
for i in range(len(x_train)):
    if y_train[i] == 1 or y_train[i] == 7:
        x_data.append(x_train[i])
        if y_train[i] == 1:
            y_data.append(0)
        if y_train[i] == 7:
            y_data.append(1)

#Convert from list format to numpy format
x_data = np.array(x_data)
y_data = np.array(y_data)

# x_data normalization, y_data one-hot representation
x_data = x_data.astype('float32')/255
y_data = np_utils.to_categorical(y_data)

#Acquisition of training data and test data
xs = x_data[0:200]  
ts = y_data[0:200]  
xt = x_data[2000:3000]  
tt = y_data[2000:3000]  


#Sigmoid function
def sigmoid(a):
    return 1 / (1 + np.exp(-a))

#Differentiation of sigmoid function
def sigmoid_d(a):
    return (1 - sigmoid(a)) * sigmoid(a)

#Backpropagation of error
def back(l, j):
    if l == max_layer - 1:
        return (y[j] - t[j]) * sigmoid_d(A[l][j])
    else:
        output = 0
        m = A[l+1].shape[0]   
        for i in range(m):
            output += back(l + 1, i) * W[l + 1][i, j] * sigmoid_d(A[l][j])
        return output

#Weight W setting
np.random.seed(seed=7)
w0 = np.random.normal(0.0, 1.0, (10, 784))
w1 = np.random.normal(0.0, 1.0, (2, 10))
W = [w0, w1]

#Bias b setting
b0 = np.ones((10, 1))
b1 = np.ones((2, 1))
B = [b0, b1]

#Other settings
max_layer = 2 #Setting the number of layers
n = 0.5  #Learning rate setting

#Learning loop
count = 0 
acc = []

for x, t in zip(xs, ts):
    
    #Forward propagation
    x0 = x.flatten().reshape(784, 1)
    a0 = W[0].dot(x0) + B[0]
    x1 = sigmoid(a0)
    a1 = W[1].dot(x1) + B[1]
    y = sigmoid(a1)

    #X for parameter update,List a
    X = [x0, x1]
    A = [a0, a1]

    #Parameter update
    for l in range(len(X)):
        for j in range(W[l].shape[0]):
            for k in range(W[l].shape[1]):
                W[l][j, k] = W[l][j, k] - n * back(l, j) * X[l][k]  
            B[l][j] = B[l][j] - n * back(l, j) 
            
    #Accuracy check by test data
    correct, error = 0, 0

    for i in range(1000):

        #Inference with learned parameters
        x0 = xt[i].flatten().reshape(784, 1)
        a0 = W[0].dot(x0) + B[0]
        x1 = sigmoid(a0)
        a1 = W[1].dot(x1) + B[1]
        y = sigmoid(a1)
    
        if np.argmax(y) == np.argmax(tt[i]):
           correct += 1
        else:
           error += 1
    calc = correct/(correct+error)
    acc.append(calc)
    count +=1
    print("\r[%s] acc: %s"%(count, calc))
   
#Accuracy transition graph display
import matplotlib.pyplot as plt
plt.plot(acc, label='acc')
plt.legend()
plt.show()   

(Figure: accuracy transition graph.) After 200 training steps, the classification accuracy reached 97.8%. I'm glad that the neural network I implemented from scratch seems to work properly.
