MNIST (handwritten digit) image classification with multi-layer perceptron

Introduction

In "State-of-the-Art Deep Learning Learned from Mosaic Removal" by koshian2, there is an exercise on MNIST classification with a multi-layer perceptron. In this article, I would like to summarize it to deepen my understanding. https://qiita.com/koshian2/items/aefbe4b26a7a235b5a5e

The main points are what a multi-layer perceptron is, and how the program works: loading the MNIST data, defining the model, compiling it, and running training and prediction.

What is Multilayer Perceptron?

A multi-layer perceptron is a network that has a layer called the **intermediate layer** (hidden layer) between the input layer and the output layer.

(Figure: structure of a multi-layer perceptron, with an intermediate layer between input and output)

For regression, the method of least squares is well known, and for classification, logistic regression is well known. These methods have the problem that accuracy does not improve much even when the amount of data is increased. The multi-layer perceptron takes advantage of large amounts of data by inserting a layer called the intermediate layer between the input layer and the output layer, which improves accuracy.
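To make the difference concrete, here is a minimal sketch of my own (not from the referenced article) contrasting logistic regression and a multi-layer perceptron in Keras; the sizes 784 and 128 are just example values.

import tensorflow.keras.layers as layers

# A sketch: both models take a 784-dimensional flattened image as input.
inputs = layers.Input((784,))

# Logistic regression: the input is mapped directly to 10 class probabilities.
logreg_output = layers.Dense(10, activation="softmax")(inputs)

# Multi-layer perceptron: an intermediate (hidden) layer is inserted in between.
hidden = layers.Dense(128, activation="relu")(inputs)
mlp_output = layers.Dense(10, activation="softmax")(hidden)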

About the contents of the program

Acquisition of MNIST data

mnist.ipynb



import tensorflow as tf
import tensorflow.keras.layers as layers

(X_train,y_train),(X_test,y_test)=tf.keras.datasets.mnist.load_data()
#X_train,y_train: Training data
#X_test, y_test: Test data

I read the training and test data directly from the Keras dataset. I often use the train_test_split function to split test data off from the original dataset, but this time it is easy because the data can be read already split.
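For reference, here is a rough sketch of how I would split the data myself with scikit-learn's train_test_split if it were not already divided; test_size=0.2 and random_state=0 are just example values.

import tensorflow as tf
from sklearn.model_selection import train_test_split

# Pretend the data came as one block and split it ourselves.
(X_all, y_all), _ = tf.keras.datasets.mnist.load_data()
X_tr, X_val, y_tr, y_val = train_test_split(
    X_all, y_all, test_size=0.2, random_state=0)
print(X_tr.shape, X_val.shape)  # (48000, 28, 28) (12000, 28, 28)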

Dimensional confirmation

mnist.ipynb



print(X_train.shape,y_train.shape)
print(X_test.shape,y_test.shape)

(60000, 28, 28) (60000,)
(10000, 28, 28) (10000,)

You can see that there are 60,000 training images of size 28x28 and 10,000 test images of size 28x28.
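As a quick sanity check of my own (continuing in the same notebook), a single image and its label can be displayed like this.

import matplotlib.pyplot as plt

# Show the first training image together with its label.
plt.imshow(X_train[0], cmap="gray")
plt.title(y_train[0])
plt.axis("off")
plt.show()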

Definition of multi-layer perceptron model

mnist.ipynb



inputs = layers.Input((28,28))
x = layers.Flatten()(inputs)
x = layers.BatchNormalization()(x)
x = layers.Dense(128, activation='relu')(x)
x = layers.Dense(10, activation="softmax")(x)
outputs = x
model = tf.keras.models.Model(inputs, outputs)

What this code does is the following.

  1. Define a 28x28 input layer
  2. Flatten the 28x28 input into a 784-dimensional vector
  3. Normalize the 784-dimensional vector with batch normalization
  4. Convert to a 128-dimensional intermediate layer with the ReLU activation function
  5. Convert to 10 dimensions with the softmax function as the output layer

I found this very easy to understand, because if you want to modify a layer, you only have to change one line, as in the sketch below.
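For example, widening the intermediate layer from 128 to 256 units (an arbitrary value I chose for illustration) only changes the one Dense line; model.summary() can then be used to confirm the new layer structure.

# Same structure, but with a wider intermediate layer.
inputs = layers.Input((28,28))
x = layers.Flatten()(inputs)
x = layers.BatchNormalization()(x)
x = layers.Dense(256, activation="relu")(x)  # only this line changes
x = layers.Dense(10, activation="softmax")(x)
model_wide = tf.keras.models.Model(inputs, x)
model_wide.summary()  # prints the layers and parameter counts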

Compiling the model

mnist.ipynb



model.compile('adam', 'sparse_categorical_crossentropy',['sparse_categorical_crossentropy'])

Here, the optimization method, the loss function, and the evaluation function (metric) are specified.

An optimization method is a way of finding parameter values that make the loss function as small as possible. Here, the commonly used Adam optimizer is applied.
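For reference, the same compile call can also be written with the arguments spelled out explicitly; the learning_rate shown here is simply Adam's default value in Keras.

# Equivalent compile call with explicit keyword arguments (a sketch).
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="sparse_categorical_crossentropy",
    metrics=["sparse_categorical_crossentropy"],
)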

https://www.slideshare.net/MotokawaTetsuya/optimizer-93979393 https://qiita.com/ZoneTsuyoshi/items/8ef6fa1e154d176e25b8

The loss function uses categorical cross entropy (here the sparse variant, which accepts integer labels directly). The formula looks like this:

$$
CCE(y_{true}, y_{pred}) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{true}^{i,j}\log y_{pred}^{i,j}
$$

Here, $N$ is the number of samples and $M$ is the number of classes. In MNIST we predict the probability of each class. The squared error used by the least squares method is suitable for predicting quantities such as prices, but it is not well suited to dealing with probabilities.
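As a small check of the formula (my own example, not from the article), the cross entropy for two samples and three classes can be computed directly with NumPy.

import numpy as np

# Two samples (N=2), three classes (M=3).
# y_true is one-hot encoded; y_pred holds the predicted class probabilities.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])

cce = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
print(cce)  # about 0.29, i.e. -(log 0.7 + log 0.8) / 2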

Finally, the evaluation function (metric). This is not used for the optimization itself, but it is useful for monitoring the progress of training.
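For example, if we also wanted to watch the classification accuracy during training, an extra metric could be added like this (a variation of my own, not part of the original code); it only changes what is logged, not the optimization.

# Also report accuracy as an evaluation metric.
model.compile(
    "adam",
    "sparse_categorical_crossentropy",
    ["sparse_categorical_crossentropy", "sparse_categorical_accuracy"],
)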

Training / prediction program execution and its prediction results

mnist.ipynb


import numpy as np
import matplotlib.pyplot as plt

#Model training
model.fit(X_train,y_train, validation_data=(X_test, y_test),epochs=10)

#Model prediction
y_pred_prob= model.predict(X_test)
y_pred = np.argmax(y_pred_prob, axis=-1)

#Result output
fig = plt.figure(figsize=(14,14))
for i in range(100):
    ax = fig.add_subplot(10,10,i+1)
    ax.imshow(X_test[i],cmap="gray")
    ax.set_title(y_pred[i])
    ax.axis("off")

The model's predictions are probabilities for each class. Therefore, we take the argmax (the index with the highest probability) to convert them into labels (0-9).
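As a small concrete example of this argmax step (the probability values below are made up):

import numpy as np

# Ten class probabilities for one image.
probs = np.array([0.01, 0.02, 0.01, 0.05, 0.02, 0.01, 0.01, 0.80, 0.05, 0.02])
print(np.argmax(probs))  # 7 -> the predicted digit label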

(Figure: the first 100 test images displayed with their predicted labels)
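Beyond eyeballing the figure, one quick quantitative check of my own (continuing in the same notebook) is to compare the predicted labels with the true test labels; the exact value will depend on how the training went.

import numpy as np

# Fraction of test images whose predicted label matches the true label.
accuracy = np.mean(y_pred == y_test)
print(f"test accuracy: {accuracy:.4f}")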

You can see that the predictions are working. The above is how MNIST images are classified with a multi-layer perceptron model.

There is still a lot of depth to how layers are constructed and how models are compiled, and I would like to keep learning from other examples and papers.

The full code is posted here. https://github.com/Fumio-eisan/minist_mlp20200307
