As an exercise from "Learn from Mosaic Removal: State-of-the-art Deep Learning" written by koshian2, there was an MNIST classification task using a multi-layer perceptron. In this post I would like to summarize it to deepen my understanding. https://qiita.com/koshian2/items/aefbe4b26a7a235b5a5e
The main points are as follows.
A multi-layer perceptron is a network that has a layer called the **intermediate layer** (hidden layer) between the input layer and the output layer.
The least squares method is well known for regression, and logistic regression for classification. These methods have the problem that accuracy stops improving even when the amount of data is increased. To take advantage of large amounts of data, the multi-layer perceptron improves accuracy by inserting an intermediate layer between the input layer and the output layer, as the sketch below illustrates.
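As a rough sketch (not part of the original article; the layer sizes are just examples matching the model built later), the difference can be seen by comparing a logistic-regression-style model, which maps the input directly to the output, with an MLP that inserts an intermediate layer:

```python
import tensorflow as tf
import tensorflow.keras.layers as layers

# Logistic regression: the flattened input goes straight to the softmax output
logreg = tf.keras.models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(10, activation="softmax"),
])

# Multi-layer perceptron: an intermediate (hidden) layer is inserted
mlp = tf.keras.models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation="relu"),   # intermediate layer
    layers.Dense(10, activation="softmax"),
])
```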
mnist.ipynb
import tensorflow as tf
import tensorflow.keras.layers as layers
(X_train,y_train),(X_test,y_test)=tf.keras.datasets.mnist.load_data()
#X_train,y_train: Training data
#X_test, y_test: Test data
The training and test data are read directly from the Keras dataset. I often use the train_test_split function to split test data off from the original training set, but this time that is unnecessary because the data can be loaded already split.
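For reference, this is how I would normally carve a held-out set out of the original data with scikit-learn's train_test_split (the split ratio and random seed here are just example values):

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the training data as a validation set
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=0)
```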
mnist.ipynb
print(X_train.shape,y_train.shape)
print(X_test.shape,y_test.shape)
(60000, 28, 28) (60000,) (10000, 28, 28) (10000,)
You can see that there are 60,000 training images of size 28×28 and 10,000 test images of size 28×28.
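As a quick sanity check (not in the original article), the pixel values and labels can be inspected directly:

```python
import numpy as np

print(X_train.dtype, X_train.min(), X_train.max())  # uint8 0 255
print(np.unique(y_train))                           # [0 1 2 3 4 5 6 7 8 9]
```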
mnist.ipynb
inputs = layers.Input((28,28))
x = layers.Flatten()(inputs)
x = layers.BatchNormalization()(x)
x = layers.Dense(128, activation='relu')(x)
x = layers.Dense(10, activation="softmax")(x)
outputs = x
model = tf.keras.models.Model(inputs, outputs)
What this code describes is as follows: the Input layer takes a 28×28 image, Flatten turns it into a 784-dimensional vector, BatchNormalization normalizes that vector, Dense(128) with ReLU activation is the intermediate layer, and Dense(10) with softmax activation outputs the probability of each of the ten classes.
I found this very easy to follow, because if you want to modify a layer you only have to change one line.
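For comparison only (the article's code above uses the Functional API), the same network could also be written with the Sequential API:

```python
model_seq = tf.keras.models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.BatchNormalization(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
```

The Functional API used above additionally makes it easy to express models with branches or multiple inputs and outputs.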
mnist.ipynb
model.compile('adam', 'sparse_categorical_crossentropy',['sparse_categorical_crossentropy'])
Here, the optimization method, loss function, and evaluation function are determined.
An optimization method (optimizer) is a procedure for finding the parameter values that make the value of the loss function as small as possible. Here we apply Adam, which is commonly used.
https://www.slideshare.net/MotokawaTetsuya/optimizer-93979393 https://qiita.com/ZoneTsuyoshi/items/8ef6fa1e154d176e25b8
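As a side note, passing the string 'adam' to compile is equivalent to passing the optimizer object with its default settings; writing it out makes hyperparameters such as the learning rate explicit (0.001 is the Keras default):

```python
# Equivalent to the string 'adam', but with the learning rate written explicitly
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer,
              'sparse_categorical_crossentropy',
              ['sparse_categorical_crossentropy'])
```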
The loss function is categorical cross entropy (here the sparse variant, which takes integer labels directly). The formula looks like this:

$$
L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\log p_{ij}
$$

where $ N $ is the number of samples, $ M $ is the number of classes, $ y_{ij} $ is 1 if sample $ i $ belongs to class $ j $ and 0 otherwise, and $ p_{ij} $ is the predicted probability that sample $ i $ belongs to class $ j $. MNIST predicts the probability of each class. The squared-error loss used in the least squares method is suited to predicting continuous values such as prices, but it is not well suited to handling probabilities, which is why cross entropy is used here.
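As a rough illustration (the numbers are made up), the cross entropy for a single sample is just the negative log of the probability assigned to its correct class, so a confident wrong answer is penalized heavily:

```python
import numpy as np

def sample_loss(p_true_class):
    # Cross-entropy contribution of one sample, given the probability
    # the model assigns to its correct class
    return -np.log(p_true_class)

print(sample_loss(0.9))  # ~0.105: high probability on the correct class -> small loss
print(sample_loss(0.1))  # ~2.303: low probability on the correct class -> large loss
```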
Finally, the evaluation function (metric). This is not used for optimization; it is only reported so that the progress of training can be monitored.
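For example, accuracy is another commonly used metric and is often easier to interpret; the article's code keeps the cross entropy itself as the metric, but it could be swapped like this:

```python
# Alternative compile call that reports classification accuracy during training
model.compile('adam',
              'sparse_categorical_crossentropy',
              metrics=['accuracy'])
```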
mnist.ipynb
import numpy as np
import matplotlib.pyplot as plt

#Model training
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10)

#Model prediction (class probabilities -> predicted labels)
y_pred_prob = model.predict(X_test)
y_pred = np.argmax(y_pred_prob, axis=-1)

#Result output: show the first 100 test images with their predicted labels
fig = plt.figure(figsize=(14, 14))
for i in range(100):
    ax = fig.add_subplot(10, 10, i + 1)
    ax.imshow(X_test[i], cmap="gray")
    ax.set_title(y_pred[i])
    ax.axis("off")
plt.show()
The model's predictions are probabilities for each class. Therefore we take the argmax (the index with the highest probability) to convert each prediction into a label (0-9).
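As a quick check (a small addition to the article's code), the predicted labels can be compared with the true test labels to get the test accuracy:

```python
accuracy = np.mean(y_pred == y_test)
print(f"Test accuracy: {accuracy:.4f}")
```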
You can see that the predictions are working. That is how MNIST images are classified with a multi-layer perceptron model.
There is still a lot of depth to how multi-layer models are constructed and compiled, and I would like to keep learning through other examples and papers.
The full code is posted here. https://github.com/Fumio-eisan/minist_mlp20200307