This is the first study memo in a series on image classification with TensorFlow2 + Keras (in a Google Colaboratory environment). The subject is a standard one: classification of handwritten digit images (**MNIST**).
Challenge image classification with TensorFlow2 + Keras series:

1. Just run it for now
2. Take a closer look at the input data
3. Visualize the MNIST data
4. Make predictions with the trained model
5. Observe images that fail to be classified
6. Preprocess and classify images you prepared yourself
7. Understand layer types and activation functions
8. Select an optimization algorithm and loss function
9. Try training, saving, and loading the model
Specifically, given **images** (28x28 pixels) of handwritten characters from "0" to "9", the problem is to classify each image as one of "0" to "9" (a multi-class classification problem), and we approach it with deep learning using TensorFlow2 + Keras.
For the development and execution environment, we use Google Colab., which is easy, convenient, and free. For an introduction to Google Colab., please refer to here.
In this article, I copy the sample code from the official TensorFlow website, paste it into a code cell of Google Colab., and confirm that it runs without problems.
On top of that, I loosely explain "**what each part of the code is doing**" and "**what the text displayed at runtime conveys**".
--Read "tensor flow" or "tensor flow". --A machine learning library developed by Google that allows you to build and train (= learn / train) neural networks (NN). Of course, you can also make predictions using the trained NN model. --1.0 was released in February 2017, and 2.0 was released in October 2019. --In TF2.0, Keras (described later) was integrated to increase the affinity with Python, making it easier to use and more sophisticated (he said). GPU support has also been strengthened (he said). --Development is continuing to keep up with the latecomer machine learning library forces such as PyTorch.
Keras
--Read "Kerasu". --High-level API that supports TensorFlow as well as Theano. Rapper. --Written in Python. --By using TF via Keras, machine learning can be realized with simple and short code.
In "Introduction to TensorFlow 2.0 for Beginners" on the official TensorFlow website, there is sample code (only a dozen or so lines) that classifies the handwritten digit image dataset (MNIST) into the categories "0" to "9". Paste this into Google Colab. and run it.
To use TensorFlow2, execute the following **magic command** in a code cell (paste it into the code cell and execute it with [Ctrl]+[Enter]). The reason for doing this is that, as of December 27, 2019, Google Colab. uses TensorFlow **1.x** as the default, and this command switches it to **2.x**.
Preparation in Google Colab.

```
%tensorflow_version 2.x
```
If there is no problem, "TensorFlow 2.x selected." will be displayed.
If you run with TF (TensorFlow) 1.x, the message "The default version of TensorFlow in Colab will soon switch to TensorFlow 2.x." appears, so this step will probably not be needed for much longer (TF 2.x will become the default).
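As a quick check (my addition, not part of the official sample), you can confirm which version is actually active after running the magic command:

```python
import tensorflow as tf

# Should print a version string starting with "2." after the magic command
print(tf.__version__)
```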
I have added a few comments to the sample code on the official website.
```python
import tensorflow as tf

# (1) Download the handwritten digit image dataset (MNIST) and store it in variables
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# (2) Normalize the data (preprocessing of the input data)
x_train, x_test = x_train / 255.0, x_test / 255.0

# (3) Build the NN model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

# (4) Compile the model (including settings related to training)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# (5) Train the model (learning/training)
model.fit(x_train, y_train, epochs=5)

# (6) Evaluate the model
model.evaluate(x_test, y_test, verbose=2)
```
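As a side note (my addition, not part of the official sample), a handy way to inspect the model that step (3) builds is `model.summary()`:

```python
# Prints each layer's output shape and parameter count.
# Flatten has no parameters; Dense(128) has 28*28*128 + 128 = 100,480;
# Dropout has none; Dense(10) has 128*10 + 10 = 1,290. Total: 101,770.
model.summary()
```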
In the above short program, we do the following:
- Download the handwritten digit image dataset and store it in variables (data preparation); a sketch for inspecting these variables follows this list.
  - `*_train`: data for training (learning)
  - `*_test`: data for testing (evaluation)
  - For details of these data, see the second article, "Take a closer look at the input data" (/items/2c969ca4675d5a3691ef).
- Normalize the data (preprocessing of the input data)
  - Convert integer values in the range 0-255 to real numbers in the range 0.0-1.0
- Build the neural network model for machine learning
  - Details will be explained in Part 7, "Understanding layer types and activation functions".
- Compile the model (including settings related to training)
  - Details will be explained in Part 8, "Selecting an optimization algorithm and loss function".
- Train the model using the training data (`*_train`) (→ a **trained model** is completed)
- Evaluate the model using the test data (`*_test`) (run image classification with the trained model and score it against the correct answers)
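A minimal sketch (my addition) for inspecting the shapes and value ranges of these variables, run after `load_data()` and the normalization step:

```python
# 60,000 training images and 10,000 test images, each 28x28 pixels
print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)

# After dividing by 255.0, the pixel values lie in the range [0.0, 1.0]
print(x_train.min(), x_train.max())  # 0.0 1.0
```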
The execution result of the program is as follows.
Execution result
```
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 5s 82us/sample - loss: 0.2992 - accuracy: 0.9134
Epoch 2/5
60000/60000 [==============================] - 5s 78us/sample - loss: 0.1457 - accuracy: 0.9561
Epoch 3/5
60000/60000 [==============================] - 5s 78us/sample - loss: 0.1096 - accuracy: 0.9659
Epoch 4/5
60000/60000 [==============================] - 5s 78us/sample - loss: 0.0876 - accuracy: 0.9730
Epoch 5/5
60000/60000 [==============================] - 5s 80us/sample - loss: 0.0757 - accuracy: 0.9765
10000/10000 - 0s - loss: 0.0766 - accuracy: 0.9762
[0.07658648554566316, 0.9762]
```
The meaning of each line is as follows:
- `Train on 60000 samples`: training is performed using 60,000 handwritten digit images.
- `Epoch x/5`: the x-th round of training out of 5 in total.
- `5s 82us/sample - loss: 0.2992 - accuracy: 0.9134`: training took 82 $\mu$s per image, about 5 seconds for the whole set of 60,000 images. The performance of the model trained so far (evaluated on the training data) was a loss function value (loss) of 0.2992 and an accuracy of 0.9134.
  - An accuracy of 0.9134 is interpreted as: $60,000 \times 0.9134 = 54,804$ images were correctly classified as 0 to 9, and the remaining $60,000 - 54,804 = 5,196$ images were misclassified.
- `10000/10000 - 0s - loss: 0.0766 - accuracy: 0.9762`: the classification was tested with 10,000 test images (separate from those used for training). The test took less than a second (shown as `0s`), with a loss function value (loss) of 0.0766 and an accuracy of 0.9762. The final line `[0.0765..., 0.9762]` is simply these two numbers returned by `model.evaluate()` (see the sketch after this list).
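A minimal sketch (my addition) showing how to capture that return value directly:

```python
# evaluate() returns [loss, accuracy]; 'accuracy' is present because of
# metrics=['accuracy'] at compile time
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print(f"loss: {test_loss:.4f}, accuracy: {test_acc:.4f}")
```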
Also called "accuracy" or "correct answer rate". Represents the percentage of images that have been correctly classified. For example, if 98 out of 100 images can be classified correctly, the correct answer rate will be $ 98/100 = 0.98 $ (= 98%).
Accuracy ranges from 0.0 to 1.0, and **the larger the value (the closer to 1.0), the better the model** (when evaluated on data not used for training).
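As a tiny illustration (my addition), accuracy can be computed by hand from predicted and true labels:

```python
import numpy as np

y_true = np.array([3, 1, 4, 1, 5])
y_pred = np.array([3, 1, 4, 1, 9])  # one of five predictions is wrong

print((y_pred == y_true).mean())  # 0.8 = fraction of correct predictions
```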
However, there are aspects of a model's (classifier's) quality that cannot be measured from accuracy alone. For example, suppose you classify (predict) a single image, whose correct answer is "3", using two different models, as follows.
For this image, model A predicts "3" and model B also predicts "3". Since the correct answer is "3", **the accuracy is 1.0 for both models**. Looking only at this **accuracy metric**, the two models are equally good.
However, what if model A's prediction was "**10% confidence for 8, 90% confidence for 3, so 3 is selected**", while model B's prediction was "**45% for 8, 55% for 3, so 3 is output**"?
**Even with the same accuracy of 1.0**, model A can be said to be superior.
However, the accuracy metric cannot take this into account. What does evaluate it is the **loss function**, and the value computed by the loss function is the **loss**.
The handwritten digit classification dealt with here is a "**multi-class classification problem**", and for this type of problem the loss function is usually an index called the **cross-entropy error** (cross entropy). The cross entropy is calculated from the values in the output layer of the neural network and the correct answer data. Details will be explained in Part 8, "Selecting an optimization algorithm and loss function".
Basically, the loss takes a value of 0.0 or greater, and **the smaller it is (the closer to 0.0), the better the model**. Note that the loss can exceed 1.0.
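A minimal sketch (my addition) of the model A vs. model B example above. For a single sample, the cross-entropy loss reduces to $-\log$ of the probability assigned to the correct class, so model A's confident prediction yields a much smaller loss even though both models give the same (correct) answer:

```python
import numpy as np

p_model_a = 0.90  # probability model A assigns to the correct class "3"
p_model_b = 0.55  # probability model B assigns to the correct class "3"

# Cross-entropy for one sample = -log(probability of the correct class)
print(-np.log(p_model_a))  # ~0.105 (small loss: better)
print(-np.log(p_model_b))  # ~0.598 (larger loss: worse, despite the same answer)
```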
Next time, I would like to take a closer look at the training data (`x_train`, `y_train`) and the test data (`x_test`, `y_test`), and visualize them using matplotlib.