Let's try **classification of handwritten digit images (MNIST)** with **TensorFlow2 + Keras** in the Google Colaboratory environment (and deepen our understanding of Python and deep learning along the way). Last time, I got as far as actually running the sample code from the official TensorFlow tutorial.
- Challenge image classification by TensorFlow2 + Keras series
  1. Run it for the time being
  2. Take a closer look at the input data
  3. Visualize MNIST data
  4. Let's make a prediction with the trained model
  5. Observe images that fail to classify
  6. Try preprocessing and classifying images prepared by yourself
  7. Understanding layer types and activation functions
  8. Select optimization algorithm and loss function
  9. Try learning, saving and loading the model
According to "Illustrated Rapid Learning DEEP LEARNING" (author: Tomoaki Masuda), **MNIST** has the following origins. Although not directly relevant here, the raw data is available at http://yann.lecun.com/exdb/mnist/.
One of the NIST (National Institute of Standards and Technology) databases contained a dataset of digits handwritten by US Census Bureau staff and high school students. MNIST, with an "M" for "modified", is a version of that dataset reworked to be easier to use for machine learning.
This time, we will explain the contents of the **training data** (`x_train`, `y_train`) and the **test data** (`x_test`, `y_test`) in the sample code shown last time. We will inspect the values directly and, where useful, visualize them with matplotlib.
First of all, I will organize the concepts of the "**multi-class classification problem**" and "**deep learning**" (and confirm where training data and test data fit in).
Handwritten digit recognition is a **multi-class classification problem**, that is, the problem of predicting the **category (class) of the input data**. The categories are **given in advance** as part of the problem setting, for example "dog", "cat", and "bird", and the task is to determine which of those categories a given input (for example, an image) belongs to.
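As a minimal sketch of what "choosing one of the categories given in advance" means, the snippet below picks the category with the highest score. The category names come from the example above. The scores and the `classify` helper are made up for illustration and are not part of the tutorial code.

```python
# Categories fixed in advance by the problem setting.
CATEGORIES = ["dog", "cat", "bird"]


def classify(scores):
    """Return the category whose score is the highest.

    `scores` is a hypothetical list with one number per category,
    in the same order as CATEGORIES.
    """
    if len(scores) != len(CATEGORIES):
        raise ValueError("expected one score per category")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return CATEGORIES[best_index]


print(classify([0.1, 0.7, 0.2]))  # -> cat
```

For MNIST the idea is the same, only with the ten categories "0" to "9".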
Various approaches have been proposed for the multi-class classification problem, but here we will solve it using **deep learning**.
Deep learning, as used here, is a form of **supervised machine learning**. Supervised machine learning consists of roughly **two stages**: the "learning phase" and the "prediction phase" (also called the inference phase or application phase).
First, in the **learning phase**, a large number of pairs of **input data** and **correct answer data** (also called teacher data, correct answer values, or correct answer labels) are given to the model so that it learns the relationship between them. This set of input and correct answer pairs is called the **training data** (= learning data). A model that has been trained on the training data is called a "**trained model**".
In the subsequent **prediction phase**, **unknown input data** is given to the trained model to **predict the output** (predict). For a multi-class classification problem, the predicted output is a category (for example, "dog").
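To make the two phases concrete, here is a small sketch that uses a nearest-centroid classifier written with NumPy. Note that this is a deliberately simple stand-in and not deep learning. The function names (`fit_centroids`, `predict`) and the toy data are my own assumptions for illustration.

```python
import numpy as np


def fit_centroids(inputs, labels):
    """Learning phase: compute the mean input vector (centroid) of each class."""
    classes = np.unique(labels)
    centroids = np.stack([inputs[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(model, inputs):
    """Prediction phase: assign each input to the class with the nearest centroid."""
    classes, centroids = model
    # distances[i, k] = Euclidean distance from input i to centroid k
    distances = np.linalg.norm(inputs[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(distances, axis=1)]


# Toy training data: 2-D points paired with hypothetical labels 0 and 1.
train_inputs = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
train_labels = np.array([0, 0, 1, 1])

model = fit_centroids(train_inputs, train_labels)  # "trained model"

# Inputs the model has never seen.
unknown_inputs = np.array([[0.1, 0.0], [1.0, 0.9]])
print(predict(model, unknown_inputs))  # -> [0 1]
```

In Keras the same two roles are played by `model.fit()` and `model.predict()`.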
Then, **evaluation** (evaluate) measures how well the trained model performs. For evaluation, we first prepare **input data and correct answer data that are different from those used for training**, and give **only the input data** to the trained model to obtain predictions. The predictions are then checked against the correct answer data and scored, and the score is used as the evaluation value. Specific evaluation metrics include the **accuracy** (correct answer rate) and the **loss function value** (loss) that appeared last time, and others such as precision and recall are adopted as needed.
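The scoring step can be sketched in plain Python as follows. The labels and predictions here are invented purely for illustration, and the helper names are hypothetical. In practice a library would normally compute these metrics.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the correct answer."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall(y_true, y_pred, target):
    """Precision and recall for a single category `target`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == target and p == target for t, p in pairs)  # true positives
    fp = sum(t != target and p == target for t, p in pairs)  # false positives
    fn = sum(t == target and p != target for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up correct answers and predictions for six test inputs.
y_true = ["dog", "cat", "bird", "dog", "cat", "dog"]
y_pred = ["dog", "cat", "dog", "dog", "bird", "cat"]

print(accuracy(y_true, y_pred))  # -> 0.5
print(precision_recall(y_true, y_pred, "dog"))
```

Here three of the six predictions are correct, so the accuracy is 0.5. For "dog", two of the three "dog" predictions are right (precision 2/3) and two of the three real dogs are found (recall 2/3).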
The following code downloads the MNIST data and stores it in the variables `x_train`, `y_train`, `x_test`, and `y_test` (see the [previous article](https://qiita.com/code0327/items/7d3c7bd3327ff049243a) for the whole program).
```python
import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
```
Here, `*_train` is the input and correct answer data assigned for training, and `*_test` is the input and correct answer data assigned for testing (model evaluation). There are 60,000 samples for training and 10,000 for testing.
In addition, `x_***` is the input data (that is, the data representing the handwritten images: 28x28 pixels in 256-level grayscale), and `y_***` is the correct answer data (a category from "0" to "9"). Each is stored in an array.
First, let's use `len()` to confirm that they actually contain 60,000 and 10,000 items, respectively.
```python
# Training data
print(len(x_train))  # Execution result -> 60000
print(len(y_train))  # Execution result -> 60000
# Test data
print(len(x_test))   # Execution result -> 10000
print(len(y_test))   # Execution result -> 10000
```
Next, let's check the **type** of each piece of data.
```python
print(type(x_train))  # Execution result -> <class 'numpy.ndarray'>
print(type(y_train))  # Execution result -> <class 'numpy.ndarray'>
print(type(x_test))   # Execution result -> <class 'numpy.ndarray'>
print(type(y_test))   # Execution result -> <class 'numpy.ndarray'>
```
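Since everything is a `numpy.ndarray`, the `.shape` and `.dtype` attributes tell us more than `type()` does. The sketch below uses a small stand-in array instead of the real download, so it runs anywhere. The names `x_demo` and `y_demo`, the five-image size, and the label values are assumptions for illustration only. The layout (images of 28x28, 8-bit unsigned integers) mirrors what the text describes for `x_train`.

```python
import numpy as np

# Stand-in with the same per-image layout as x_train, but only 5 images.
x_demo = np.zeros((5, 28, 28), dtype=np.uint8)
# Arbitrary labels, one per image.
y_demo = np.array([5, 0, 4, 1, 9], dtype=np.uint8)

num_images, height, width = x_demo.shape
print(num_images, height, width)  # -> 5 28 28

# An 8-bit unsigned integer can hold exactly 256 distinct gray levels.
info = np.iinfo(x_demo.dtype)
levels = info.max - info.min + 1
print(x_demo.dtype, levels)  # -> uint8 256

# Inputs and labels are paired by position, so their lengths must agree.
print(len(x_demo) == len(y_demo))  # -> True
```

Running `x_train.shape` and `x_train.dtype` on the real data in Colab is a quick way to check the same properties there.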
Next, let's check the contents of `y_train` (= correct answer data for training).
```python
print(y_train)  # Execution result -> [5 0 4 ... 5 6 8]
```
We can see that the correct answer for data item 0 is "5", for item 1 is "0", ..., and for item 59,999 is "8".
Next, let's check the contents of `x_train` (= the data representing the handwritten images for training). Displaying every item would be impractical, so we look only at the first one, `x_train[0]`.
```python
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train[0].shape)  # Execution result -> (28, 28)
print(x_train[0])        # Execution result -> a 28x28 grid of integers
```
You can check the dimensions of a `numpy.ndarray` with `.shape`. The result `(28, 28)` means that `x_train[0]` is a **two-dimensional array of 28 rows and 28 columns**. Accordingly, `print(x_train[0])` outputs a 28x28 grid of pixel values.
If you squint at the output, you can make out a slightly distorted handwritten "5". This matches the "5" stored in `y_train[0]`.
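If squinting at a grid of numbers is hard on the eyes, one option is to turn each pixel into a single character. The helper below (`render_ascii`, a name I made up, with an arbitrary threshold of 128) is a minimal sketch of that idea. It is demonstrated on a tiny hand-made 5x5 array rather than on the real 28x28 data.

```python
import numpy as np


def render_ascii(image, threshold=128):
    """Return one string per row: '#' where the pixel >= threshold, '.' elsewhere."""
    return [
        "".join("#" if value >= threshold else "." for value in row)
        for row in image
    ]


# Hand-made 5x5 stand-in for an image (a rough digit "1").
demo = np.array(
    [
        [0, 0, 255, 0, 0],
        [0, 255, 255, 0, 0],
        [0, 0, 255, 0, 0],
        [0, 0, 255, 0, 0],
        [0, 255, 255, 255, 0],
    ],
    dtype=np.uint8,
)

for line in render_ascii(demo):
    print(line)
# ..#..
# .##..
# ..#..
# ..#..
# .###.
```

Calling `render_ascii(x_train[0])` in the notebook should make the shape of the digit easier to see than the raw numbers.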
You can see that each pixel is a **value in the range 0 to 255**, where 0 is the background (white) and 255 is the darkest part of the stroke (black).
Let's confirm this across all 60,000 items.
```python
import numpy as np
print(x_train.min())  # Minimum value. Execution result -> 0
print(x_train.max())  # Maximum value. Execution result -> 255
```
This confirms that every value in the data falls within the range 0 to 255.
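Knowing the range is 0 to 255 also explains a common preprocessing step: dividing by 255.0 so that the values fall between 0.0 and 1.0. The sketch below shows the effect on a tiny stand-in array. The name `pixels` and its three values are assumptions for illustration.

```python
import numpy as np

# Stand-in for a few pixels of x_train (8-bit values in 0..255).
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Dividing by a float converts the result to floating point
# and maps the range 0..255 onto 0.0..1.0.
scaled = pixels / 255.0

print(scaled.dtype)                # -> float64
print(scaled.min(), scaled.max())  # -> 0.0 1.0
```

The original array is left unchanged, because the division creates a new array.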
By the way, how many of each digit from "0" to "9" are there in the 60,000 training samples? I would expect the ten digits to appear roughly evenly, but let's check. We use pandas for the aggregation.
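pandas version
```python
import pandas as pd
tmp = pd.DataFrame({'label': y_train})
tmp = tmp.groupby(by='label').size()  # Number of rows per label
display(tmp)  # display() is provided by the notebook environment (Colab)
print(f'Total number={tmp.sum()}')
```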
Execution result
```
label
0    5923
1    6742
2    5958
3    6131
4    5842
5    5421
6    5918
7    6265
8    5851
9    5949
dtype: int64
Total number=60000
```
There seems to be some variation, such as fewer "5"s and more "1"s.
You can also get the same counts without pandas, as follows.
numpy version
```python
import numpy as np
tmp = [np.count_nonzero(y_train == p) for p in range(10)]
print(tmp)  # Execution result -> [5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949]
print(f'Total number={sum(tmp)}')  # Execution result -> Total number=60000
```
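NumPy also has functions that do this kind of counting in one call: `np.bincount` and `np.unique(..., return_counts=True)`. The sketch below is an alternative to the loop above, not a replacement for it. To keep it runnable without downloading MNIST, it rebuilds a label array from the counts reported in this article (the order of the labels is therefore not the real one). The names `counts_reported` and `labels` are my own.

```python
import numpy as np

# Per-digit counts reported above, used to rebuild labels with the same distribution.
counts_reported = [5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949]
labels = np.repeat(np.arange(10), counts_reported)

# bincount: counts[k] is the number of times the integer k appears.
counts = np.bincount(labels, minlength=10)
print(counts.tolist())

# unique with return_counts=True returns the distinct values and their counts.
values, unique_counts = np.unique(labels, return_counts=True)
print(values.tolist())

# Least and most frequent digits.
print(counts.argmin(), counts.argmax())  # -> 5 1
```

With the real data, `np.bincount(y_train, minlength=10)` would be the equivalent call.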
I wanted to go as far as displaying the input data graphically with matplotlib, but the article has become long, so I will leave that for next time.