Non-information graduate students studied machine learning from scratch # 3: MNIST Handwritten digit recognition

Introduction

I am a graduate student from outside the information sciences, studying machine learning from scratch. I am writing these articles to keep a record of what I have studied. I will decide how to proceed as I go, but for now I will work up step by step from the basics, following the well-known book "Deep-Learning made from scratch". The environment is Google Colab. In Part 3, I perform MNIST handwritten digit recognition with a neural network.

Table of contents

  1. What is the MNIST handwritten digit recognition problem?
  2. Implementation of neural network
  3. Batch processing

1. What is the MNIST handwritten digit recognition problem?

MNIST [^ 1] is a dataset of images of handwritten digits from 0 to 9, and it is the "Hello world" of machine learning. Each MNIST image is a 28x28 grayscale image, and each pixel takes one of 256 values from 0 to 255. Every image comes with its correct label. Let's actually look at it.

Get dataset


!git clone https://github.com/oreilly-japan/deep-learning-from-scratch

import sys, os
sys.path.append("/content/deep-learning-from-scratch")      #setting path
from dataset.mnist import load_mnist

(x_train, t_train), (x_test, t_test) = load_mnist(flatten=True, normalize=False)
#(Training image,Training label),(Test image,Test label)

print(x_train.shape)    #(60000, 784)
print(t_train.shape)    #(60000,)
print(x_test.shape)    #(10000, 784)
print(t_test.shape)    #(10000,)

As preliminary preparation, get the dataset. This time I get it from the GitHub repository of "Deep-Learning made from scratch", the book I am following. From the shapes of the data, we can see that there are 60,000 training images and 10,000 test images. The size 784 equals 28 × 28. Because flatten=True is passed to the load_mnist() function, each 1x28x28 image is stored as a one-dimensional array. Since normalize=False, the pixel values stay in their original range of 0 to 255. Having confirmed this, let's display an MNIST image.
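To make the effect of flatten=True concrete, here is a minimal sketch that uses a synthetic array in place of real MNIST data. The random pixel values are an assumption for illustration only. The sketch flattens a 1x28x28 image into 784 values and then restores the 28x28 layout with reshape, which is the same conversion the display code below performs.

```python
import numpy as np

# Synthetic stand-in for one MNIST image: 1 channel, 28x28 pixels, values 0-255
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(1, 28, 28), dtype=np.uint8)

# flatten=True stores each image as a one-dimensional array like this
flat = image.reshape(-1)
print(flat.shape)  # (784,)

# Restore the 28x28 layout for display, as done with x_train[0].reshape(28, 28)
restored = flat.reshape(28, 28)
print(restored.shape)  # (28, 28)

# Flattening and reshaping change only the layout, not the pixel values
print(np.array_equal(restored, image[0]))  # True
```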

Image data display


import matplotlib.pyplot as plt

img = x_train[0].reshape(28,28)    #Convert the 784-element array back to 28x28
plt.imshow(img)
plt.show()
print(t_train[0])    #5

[Figure: MNIST.png]
The 0th image of the dataset looks like a handwritten "5". Printing the 0th training label also gives 5, so the image and the label match.

2. Implementation of neural network

Now that we know what the dataset looks like, let's implement a neural network that recognizes handwritten digits.

Neural network for MNIST handwritten digit recognition


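import pickle
import numpy as np
from common.functions import sigmoid, softmax    #sigmoid/softmax from the book's repository (same roles as in Part 2)

#Get the MNIST test data (pixel values normalized to 0.0-1.0)
def get_data():
    (x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, flatten=True, one_hot_label=False)
    return x_test, t_test

#Read the trained weights and biases of the neural network
def init_network():
    with open("/content/deep-learning-from-scratch/ch03/sample_weight.pkl", 'rb') as f:
        network = pickle.load(f)
    return network

#Inference processing function (forward pass through the 3-layer network)
def predict(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']

    a1 = np.dot(x, W1) + b1    #input layer -> 1st hidden layer
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2   #1st hidden layer -> 2nd hidden layer
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3   #2nd hidden layer -> output layer
    y = softmax(a3)            #convert scores to probabilities

    return y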

The get_data() function loads the dataset in the same way as above. The differences are that normalize=True scales the pixel values from 0-255 to 0.0-1.0 and that only the test data is returned. The predict() function has the same structure as the neural network introduced last time. The weights and biases of each layer are set by init_network(). Because trained parameters are used here, it reads the values provided in sample_weight.pkl. The trained network has 28 × 28 = 784 neurons in the input layer and 10 neurons in the output layer, one for each digit class 0-9. It is a 3-layer neural network with 2 hidden layers of 50 and 100 neurons.
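As a rough check of the layer sizes described above, the sketch below runs the same three-layer forward pass on a synthetic input. The weights are random values, which is an assumption and not the trained sample_weight.pkl, so the prediction itself is meaningless. Only the shapes and the scaling step matter. The / 255.0 scaling mirrors what normalize=True does in load_mnist. The sigmoid and softmax here are written out locally so that the block runs on its own.

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def softmax(a):
    # Subtract the max before exp() to avoid overflow
    exp_a = np.exp(a - np.max(a))
    return exp_a / np.sum(exp_a)

rng = np.random.default_rng(0)

# Synthetic 784-pixel image scaled from 0-255 to 0.0-1.0, as normalize=True does
x = rng.integers(0, 256, size=784).astype(np.float32) / 255.0

# Hypothetical random parameters with the same shapes as the trained network
sizes = [784, 50, 100, 10]
network = {}
for i in range(3):
    network[f"W{i+1}"] = rng.normal(0, 0.01, size=(sizes[i], sizes[i+1]))
    network[f"b{i+1}"] = np.zeros(sizes[i+1])

z1 = sigmoid(np.dot(x, network["W1"]) + network["b1"])   # (50,)
z2 = sigmoid(np.dot(z1, network["W2"]) + network["b2"])  # (100,)
y = softmax(np.dot(z2, network["W3"]) + network["b3"])   # (10,)

print(z1.shape, z2.shape, y.shape)  # (50,) (100,) (10,)
print(np.isclose(y.sum(), 1.0))     # True: softmax outputs sum to 1
```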

Inference processing


#Load the test data and the trained parameters
x, t = get_data()
network = init_network()
accuracy_cnt = 0

#Run inference on each image with a for loop
for i in range(len(x)):
    y = predict(network, x[i])
    p = np.argmax(y)    #Get the index of the most probable element
    if p == t[i]:
        accuracy_cnt += 1

print("Accuracy:" + str(float(accuracy_cnt) / len(x)))    #Accuracy:0.9352

#Matrix size confirmation
W1, W2, W3 = network['W1'], network['W2'], network['W3']
print(x.shape)       #(10000, 784)
print(x[0].shape)    #(784,)
print(W1.shape)      #(784, 50)
print(W2.shape)      #(50, 100)
print(W3.shape)      #(100, 10)

A for loop runs inference on each test image in turn. The output of predict() has passed through softmax, so it gives the probability that the image is each digit. The index of the largest value is taken as the recognition result. Comparing the result with the test label gives the accuracy. Classification with the given trained parameters achieved a recognition accuracy of 93.52%. Printing the array shapes also confirms the numbers of neurons described above.
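The argmax-and-compare step can be seen in isolation with a few made-up softmax outputs. The numbers below are invented for illustration and are not real network outputs. To keep the example short, there are 3 classes instead of 10.

```python
import numpy as np

# Made-up softmax outputs for 4 images (each row sums to 1) and their true labels
y = np.array([
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
])
t = np.array([1, 0, 2, 1])

accuracy_cnt = 0
for i in range(len(y)):
    p = np.argmax(y[i])  # index of the most probable class
    if p == t[i]:
        accuracy_cnt += 1

accuracy = accuracy_cnt / len(y)
print("Accuracy:" + str(accuracy))  # Accuracy:0.75
```

The last image is misclassified because class 0 (0.5) beats the true class 1 (0.4), so 3 of the 4 predictions are correct.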

3. Batch processing

In general, numerical computation on a computer is more efficient when large blocks of data are processed together, for example as a single matrix operation, than one item at a time in a for loop. The previous code looped over the images one by one. Here we change it so that the neural network performs inference on a fixed-size group of inputs at once. This is called batch processing.
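Here is a minimal sketch of why batching works. Multiplying a whole (batch, 784) matrix by W1 gives the same rows as multiplying each image separately, but it does so in a single call. The data and weights are random stand-ins and are assumptions. Actual speedups depend on the machine and the NumPy build, so no timing is claimed here.

```python
import numpy as np

rng = np.random.default_rng(0)
x_batch = rng.random((100, 784))   # 100 synthetic flattened "images"
W1 = rng.random((784, 50))         # hypothetical first-layer weights

# One image at a time, as in the for-loop version
rows = np.array([np.dot(x_batch[i], W1) for i in range(len(x_batch))])

# The whole batch in a single matrix product
batched = np.dot(x_batch, W1)

print(rows.shape, batched.shape)   # (100, 50) (100, 50)
print(np.allclose(rows, batched))  # True
```

The two results agree up to floating-point rounding. The batched form hands the whole product to NumPy's optimized routines instead of going through a Python loop.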

Inference processing using batch processing


batch_size = 100
accuracy_cnt = 0
for i in range(0, len(x), batch_size):
    x_batch = x[i:i+batch_size]     #Get the batch of images starting at index i
    y_batch = predict(network, x_batch)
    p = np.argmax(y_batch, axis=1)       #index of the max value in each row
    accuracy_cnt += np.sum(p == t[i:i+batch_size])

print("Accuracy:" + str(float(accuracy_cnt) / len(x)))    #Accuracy:0.9352

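In the program above, predict() is applied to each x_batch rather than to each image, so the for loop runs only len(x) / batch_size times (10000 / 100 = 100 times here). When one image was processed at a time, the array sizes changed as
(784,) → (784, 50) → (50, 100) → (100, 10) → (10,)
With batch processing, they change as
(100, 784) → (784, 50) → (50, 100) → (100, 10) → (100, 10)
so 100 images go in and 100 outputs come out at once. np.argmax(y_batch, axis=1) then picks the most probable class in each row, and np.sum(p == t[i:i+batch_size]) counts how many of the 100 predictions match their labels.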
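The shape changes above can be traced for one batch of 100 with hypothetical random weights. The input data and weights are assumptions, not the trained network. Softmax here normalizes along axis=1, so that each row (one image) sums to 1. softmax_rows is a name I chose for this sketch, written so that the block is self-contained.

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def softmax_rows(a):
    # Normalize each row (one image) separately; subtract the row max to avoid overflow
    exp_a = np.exp(a - np.max(a, axis=1, keepdims=True))
    return exp_a / np.sum(exp_a, axis=1, keepdims=True)

rng = np.random.default_rng(0)
batch_size = 100
x_batch = rng.random((batch_size, 784))  # synthetic batch of 100 flattened images

# Hypothetical random weights with the trained network's shapes
W1, b1 = rng.normal(0, 0.01, (784, 50)), np.zeros(50)
W2, b2 = rng.normal(0, 0.01, (50, 100)), np.zeros(100)
W3, b3 = rng.normal(0, 0.01, (100, 10)), np.zeros(10)

z1 = sigmoid(np.dot(x_batch, W1) + b1)       # biases broadcast over the 100 rows
z2 = sigmoid(np.dot(z1, W2) + b2)
y_batch = softmax_rows(np.dot(z2, W3) + b3)

print(x_batch.shape, z1.shape, z2.shape, y_batch.shape)  # (100, 784) (100, 50) (100, 100) (100, 10)
print(np.allclose(y_batch.sum(axis=1), 1.0))              # True
print(np.argmax(y_batch, axis=1).shape)                    # (100,)

# With 10,000 test images, the batched loop runs 10000 / 100 = 100 times
print(len(range(0, 10000, batch_size)))                    # 100
```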

Drawing every neuron, as I did in Part 2, makes the structure of a network this size hard to check, so such networks seem to be usually drawn in abbreviated form like this.
[Figure: structure.png]

Next time, I will introduce the steepest descent method in preparation for training a neural network.

References

Deep-Learning from scratch
Deep-Learning from scratch GitHub
Deep Learning (Machine Learning Professional Series)

[^ 1]: Abbreviation for Modified National Institute of Standards and Technology database
