Introduction

Continuing from Deep Learning 1, this article performs handwritten digit recognition on MNIST. See the previous article for the basic structure of deep learning.

Implementation

Data download and visualization

This time, we will download a dataset called MNIST, which is publicly available for machine learning, and use it to train and test the model. You could also label images you already have and load those instead. First, let's try out the visualization we will use on the downloaded data.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
a = np.arange(100)
sns.heatmap(a.reshape((10,10)))

By creating a heatmap, we were able to visualize the array easily. Now let's download the MNIST handwritten digit data and visualize it.

from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images now holds the training images, with shape (60000, 28, 28). This means 60,000 grayscale images of 28x28 pixels each. Depending on how you display them, they appear as black and white images. Let's take a look at one of them.

sns.heatmap(train_images[1210])
print(train_labels[1210])

You can see that the image shows a 5 and that its label is also 5.

Now let's create the training data from this raw data. In multi-class classification like this, the input can be used as it is, but if the output is a single number representing the label, accuracy drops considerably: when the model is torn between 7 and 9, it may end up outputting 8. Instead, the model has as many outputs as there are labels, and each output is the probability that the input has that label. The following preprocessing is therefore performed.

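# Add a channel axis so each image has shape (28, 28, 1)
train_x = train_images.reshape(60000,28,28,1)
# One row per image, one column per digit (0-9), initially all zeros
train_y = np.zeros((60000,10))
test_x = test_images.reshape((10000,28,28,1))
test_y = np.zeros((10000,10))
# Set the column of the correct label to 1.0
for i in range(60000):
  train_y[i][train_labels[i]]=1.0
for i in range(10000):
  test_y[i][test_labels[i]]=1.0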

The input and output formats are now ready. Note that test_images contains 10,000 images.
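
As a side note, the label conversion done by the loops above (usually called one-hot encoding) can also be written without a loop. The following is a minimal sketch using only NumPy; the small `labels` array is a hypothetical stand-in for train_labels, not data from MNIST.

```python
import numpy as np

# Hypothetical stand-in for train_labels: a few digit labels.
labels = np.array([5, 0, 4, 1, 9])

# Row k of the 10x10 identity matrix is the one-hot vector for label k,
# so indexing the identity matrix by the labels encodes them all at once.
one_hot = np.eye(10)[labels]

print(one_hot.shape)  # one row per label, one column per digit
print(one_hot[0])     # 1.0 in column 5, 0.0 elsewhere
```

Applied to the real data, this would be np.eye(10)[train_labels]. tensorflow.keras.utils.to_categorical(train_labels, 10) should give an equivalent result.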

Building the model

from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import optimizers
model = models.Sequential()
model.add(layers.Conv2D(16,(3,3),padding="same",activation="relu",input_shape = (28,28,1)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(32,(3,3),padding="same",activation="relu"))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Flatten())
model.add(layers.Dense(128,activation = "relu"))
model.add(layers.Dense(128,activation = "relu"))
model.add(layers.Dense(10,activation = "softmax"))
model.compile(loss = "categorical_crossentropy",optimizer="adam",metrics=["accuracy"])

A layer that we did not use last time appears here: layers.Conv2D.
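
To get a feel for the size of this model, the sketch below works out the tensor shape after each layer and the number of trainable parameters by hand. It assumes the Keras defaults (each Conv2D and Dense layer has a bias term, and pooling layers have no parameters); the helper names `conv_params` and `dense_params` are made up for this illustration.

```python
def conv_params(kernel, in_ch, out_ch):
    # kernel x kernel x in_ch weights per output channel, plus one bias each.
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    # One weight per input-output pair, plus one bias per output.
    return n_in * n_out + n_out

h = w = 28
conv1 = conv_params(3, 1, 16)    # padding="same" keeps 28x28, now 16 channels
h, w = h // 2, w // 2            # MaxPooling2D((2,2)) halves it to 14x14
conv2 = conv_params(3, 16, 32)   # still 14x14, now 32 channels
h, w = h // 2, w // 2            # second pooling gives 7x7
flat = h * w * 32                # Flatten turns 7x7x32 into one vector
dense1 = dense_params(flat, 128)
dense2 = dense_params(128, 128)
dense3 = dense_params(128, 10)

total = conv1 + conv2 + dense1 + dense2 + dense3
print(flat, conv1, conv2, dense1, dense2, dense3, total)
```

Under these assumptions, model.summary() should report the same numbers. Most of the parameters sit in the first Dense layer, not in the convolutions.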

Convolution layer

In image processing, replacing each pixel's value with the average of the surrounding pixel values produces a blur. Many other operations likewise update every pixel using the pixels in its neighborhood, and a kernel makes such operations easy to express. https://deepage.net/deep_learning/2016/11/07/convolutional_neural_network.html This site explains the convolution layer in an easy-to-understand manner, but here is my own picture of it.

Imagine copying a large image onto the next sheet of paper using a special dropper. When the dropper sucks up the color at one spot, some of the surrounding colors get drawn in with it. The dropper then deposits the weighted sum of the colors it sucked up onto the next sheet (this time without affecting the surrounding colors). The kernel represents the weights of the dropper.

Because the edges are hard to color this way, the image can be bordered with zeros before doing this work, which is called zero padding. padding = "same" in the code means adding just enough zero border that the image size does not change. Using more droppers yields more images, each with a different effect applied. The first argument of Conv2D is the number of such output images, and the next argument is the size of the kernel.
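
The dropper picture can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions: a single-channel image, one fixed averaging kernel, and no bias, whereas Conv2D learns its kernels, sums over input channels, and adds a bias. The function name `conv2d_same` is made up for this example.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide a square, odd-sized kernel over a zero-padded image."""
    k = kernel.shape[0]
    pad = k // 2
    # Zero padding: border the image with zeros so the output keeps its size.
    padded = np.pad(image, pad, mode="constant", constant_values=0)
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # The "dropper" picks up the k x k neighborhood ...
            patch = padded[i:i + k, j:j + k]
            # ... and deposits its weighted sum at the same position.
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
blur = np.full((3, 3), 1.0 / 9.0)  # averaging kernel: every weight is 1/9
blurred = conv2d_same(image, blur)
print(blurred.shape)  # same size as the input, like padding="same"
print(blurred)
```

Pixels at the edge come out darker than a true average because the zero border is included in the sum.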

Pooling

The code contains a layer called MaxPooling2D. Max pooling is one type of pooling, which is a method for making images smaller. The image is shrunk by treating each 2x2 pixel block as a single pixel and taking the maximum value within the block. This makes the very high-dimensional input of an image easier to handle.
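
Here is a minimal NumPy sketch of 2x2 max pooling on a single-channel image. It assumes the height and width are even; the function name `max_pool_2x2` is made up for this example.

```python
import numpy as np

def max_pool_2x2(image):
    h, w = image.shape
    # Split the image into (h/2) x (w/2) blocks of 2x2 pixels each.
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    # Keep only the largest value in every block.
    return blocks.max(axis=(1, 3))

image = np.array([[1, 2, 5, 6],
                  [3, 4, 7, 8],
                  [9, 1, 2, 0],
                  [0, 3, 4, 1]])
pooled = max_pool_2x2(image)
print(pooled)
# [[4 8]
#  [9 4]]
```

The 4x4 image becomes 2x2, which is why the 28x28 input shrinks to 14x14 and then 7x7 in the model above.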

softmax

This activation function appears for the first time here. In multi-class classification, the final 10-dimensional output vector holds the probability of each label, so its elements should sum to 1. Softmax takes care of this.
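
As a minimal sketch of what softmax computes, the following NumPy function exponentiates each score and divides by the total. The `scores` values are arbitrary numbers chosen for illustration; subtracting the maximum first is a common trick to avoid overflow and does not change the result.

```python
import numpy as np

def softmax(z):
    # Shift by the maximum for numerical stability.
    e = np.exp(z - np.max(z))
    # Normalize so the outputs are positive and sum to 1.
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # arbitrary raw outputs of the last layer
probs = softmax(scores)
print(probs)
print(probs.sum())
```

The largest score still gets the largest probability, so the order of the labels is preserved.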

categorical_crossentropy

Instead of computing the loss from the difference between the output and the target, this uses cross entropy, which is well suited to learning values in the range 0 to 1. When the correct value is 1, judging it to be 0.1 is penalized far more heavily than judging it to be 0.9, which suits classification problems like this one.
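
To make the size of that penalty concrete, here is a minimal sketch of the cross entropy for a single sample with a one-hot target. The three-class vectors are made-up values; Keras additionally averages the loss over the batch.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred):
    # With a one-hot target, only the log of the probability assigned
    # to the correct class contributes to the sum.
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0.0, 1.0, 0.0])       # the correct class is index 1
confident = np.array([0.05, 0.9, 0.05])  # gives the correct class 0.9
unsure = np.array([0.45, 0.1, 0.45])     # gives the correct class 0.1

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The loss is -log(0.9), about 0.105, in the first case and -log(0.1), about 2.303, in the second, roughly 22 times larger.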

Training

history = model.fit(train_x,train_y,steps_per_epoch=10,epochs = 10)

You can see that training is now running. The log is printed without any extra work on your part, and it shows that the accuracy on the training data has risen to 95% or more by the end of training.

By the way, the return value of model.fit is stored in history, and you can use it to plot how training progressed. For example, to see how the accuracy changed over the epochs:

plt.plot(history.history['accuracy'])

Now we can visualize the learning process. Read the Keras documentation and experiment with the code to deepen your understanding.

Verification

No matter how good the results are on the training data, the model is meaningless unless it also works on data outside the training set. Validation can be done at the same time as training, but this time we will evaluate the model after training.

model.evaluate(test_x,test_y)

The output is [loss, accuracy]. You can see that the results on the test data are similar to those on the training data.
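
For reference, the accuracy here is the fraction of images whose most probable predicted label matches the true label. The sketch below reproduces that calculation in NumPy on made-up values; `probs` stands in for the model's softmax outputs and `y_true` for one-hot labels such as test_y.

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and 3 classes.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.2, 0.5, 0.3]])
# Hypothetical one-hot labels: the true classes are 1, 0, 1, 1.
y_true = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 0]])

predicted = probs.argmax(axis=1)  # most probable class per sample
actual = y_true.argmax(axis=1)    # position of the 1 in each one-hot row
accuracy = np.mean(predicted == actual)
print(predicted, actual, accuracy)
```

Three of the four predictions match, so the accuracy is 0.75.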

Conclusion

This time, we implemented a so-called convolutional neural network in the simplest possible form. However, since the images are only 28x28, this task can in fact be learned without problems using fully connected layers alone, so it may be interesting to implement that model and compare the results. A fully connected layer that maps an n x n image to an output of the same size needs $O(n^4)$ weights, so it cannot handle large inputs (100x100 is probably already too much), whereas a convolution layer has weights only for its kernels, so it works even at 1024x1024 (just barely).
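
The difference in scale can be checked with simple arithmetic. The sketch below counts weights only (no biases) and assumes, for illustration, a fully connected layer that maps an n x n image to n x n outputs, compared with a 3x3 convolution producing 16 channels from a single-channel image, as in the first layer of the model above. The function names are made up for this example.

```python
def dense_weights(n):
    # Each of the n*n input pixels connects to each of the n*n outputs.
    return (n * n) ** 2

def conv_weights(kernel=3, in_ch=1, out_ch=16):
    # The kernel weights are shared across every position in the image,
    # so the count does not depend on the image size at all.
    return kernel * kernel * in_ch * out_ch

for n in (28, 100, 1024):
    print(n, dense_weights(n), conv_weights())
```

At n = 100 the fully connected layer already needs 100 million weights, while the convolution stays at 144. The convolution's outputs still grow with the image area, which is a plausible reason why very large images remain expensive.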

Next time I will deal with generative models. To start with, I plan to implement a plain GAN with code that is as easy to understand as possible.

Deep learning learned by implementation 2 (image classification)