This is the ninth installment of my study memo on image classification with TensorFlow 2 + Keras (in a Google Colaboratory environment). The subject is the standard task of classifying handwritten digit images (MNIST).
Challenge image classification with TensorFlow2 + Keras series:
1. Run it for now
2. Take a closer look at the input data
3. Visualize the MNIST data
4. Make predictions with the trained model
5. Observe images that fail to classify
6. Preprocess and classify images you prepared yourself
7. Understand layer types and activation functions
8. Select the optimization algorithm and loss function
9. Try training, saving, and loading the model
Last time, we covered the `model.compile(...)` step of the tutorial program (setting the **optimization algorithm** and **loss function**).
This time, I would like to understand the settings of `model.fit(...)`, which trains the model (number of epochs, batch size, data settings for validation, etc.). We will also cover saving and loading trained models.
Training is done with the `fit()` method (fit in the sense that the model is fitted to the training data).
- See here for the reference of `model.fit()`
In the tutorial "Introduction to TensorFlow 2.0 for Beginners", training is performed by giving the following three arguments:
```python
model.fit(x_train, y_train, epochs=5)
```
The first argument is a NumPy array of **input data** for training, the second argument is a NumPy array of the **correct labels** for training, and the third argument, `epochs`, is the **number of epochs** (an integer) indicating how many times training is repeated, where one pass over the entire input data array counts as one epoch.
The larger the number of epochs (`epochs`), the better the model fits the training data, but also the more likely it is to overfit and the longer training takes.
At the end of each epoch, a log of the evaluation on the training data is output, as shown below.
```
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 5s 83us/sample - loss: 0.2978 - accuracy: 0.9138
Epoch 2/5
60000/60000 [==============================] - 4s 75us/sample - loss: 0.1454 - accuracy: 0.9566
```
The `loss: 0.2978` above is the loss function value evaluated on the training data (using the model as of the end of that epoch), and `accuracy: 0.9138` is the evaluated accuracy rate. Basically, `loss` and `accuracy` improve as the number of epochs increases.
However, monitoring the `loss` and `accuracy` output above is not enough to determine **whether overfitting has occurred (whether generalization performance has been lost)**.
Therefore, part of the training data can be split off as validation data and evaluated each time an epoch completes (**the data set aside for validation is never used for training**).
Specifically, you can specify the fraction of data to use for validation with `validation_split`, as follows:
```python
model.fit(x_train, y_train, validation_split=0.2, epochs=5)
```
The run-time log looks like this:
```
Train on 48000 samples, validate on 12000 samples
Epoch 1/5
48000/48000 [==============================] - 4s 92us/sample - loss: 0.3273 - accuracy: 0.9036 - val_loss: 0.1515 - val_accuracy: 0.9572
Epoch 2/5
48000/48000 [==============================] - 4s 85us/sample - loss: 0.1619 - accuracy: 0.9521 - val_loss: 0.1233 - val_accuracy: 0.9657
```
When `validation_split` was not specified, the log said `Train on 60000 samples`, but you can see this has changed to `Train on 48000 samples, validate on 12000 samples`. This means that **12,000 samples, i.e. 20% of the 60,000 training samples, have been allocated for validation**.
You can also see that the model evaluations on the validation data, `val_loss` and `val_accuracy`, are output for each epoch. As the epochs progress, if both `loss` and `val_loss` go down, training is proceeding smoothly. On the other hand, if `loss` goes down but `val_loss` goes up, the model is considered to be overfitting.
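The tutorial does not go this far, but Keras provides an `EarlyStopping` callback that automates exactly this check, stopping training when `val_loss` stops improving. A minimal sketch (the `patience` value and the epoch limit are just examples):

```python
import tensorflow as tf

# Stop training when val_loss has not improved for 2 consecutive epochs,
# and restore the weights from the best epoch seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=2, restore_best_weights=True)

model.fit(x_train, y_train, validation_split=0.2,
          epochs=50,  # generous upper limit; the callback usually stops training earlier
          callbacks=[early_stop])
```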
The `validation_split` option uses a fixed fraction of the training data as validation data, but you can also supply separately prepared data as validation data with `validation_data`. In the previous two installments, in order to investigate the effect of hyperparameters, we passed the test data directly as validation data, as follows:
```python
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=5)
```
The `batch_size` option allows you to specify the **batch size for training**. If omitted, it behaves the same as specifying `batch_size=32`.
Training uses a method called **mini-batch learning**. Even if you supply 60,000 training samples, the model does not learn from all of them at once. Instead, the number of samples specified by the batch size, e.g. 32, is randomly drawn from the 60,000 and the model learns from them (a weight update); then 32 more are randomly drawn from the remaining 59,968 and the model learns from those (another weight update), and so on. The smaller the batch size, the more often the weights are updated, and the longer the computation takes.
Generally, batch sizes such as 32, 64, or 128 seem to be commonly used.
```python
model.fit(x_train, y_train, batch_size=64, epochs=5)
```
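As a quick sanity check of this relationship, the number of weight updates per epoch is the number of training samples divided by the batch size, rounded up. A small worked example:

```python
import math

n_samples = 60000   # MNIST training samples
batch_size = 64
steps_per_epoch = math.ceil(n_samples / batch_size)
print(steps_per_epoch)  # => 938 weight updates per epoch
```

With the default `batch_size=32`, the same calculation gives 1875 updates per epoch, which matches the step count Keras shows in its progress bar.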
Running `fit()` returns a `History` object. From `History.history`, you can get per-epoch lists of `loss`, `val_loss`, `accuracy`, and `val_accuracy`, as shown below. In the previous two installments, I drew graphs using this data.
```python
tmp = model.fit(x_train, y_train, validation_split=0.2, epochs=5)
print(tmp.history)
```
```
{'loss': [0.32421727414925894, 0.15691129270382226, 0.11842467025667429, 0.09661550550659498, 0.07930808525610095], 'accuracy': [0.9054792, 0.95329165, 0.9648542, 0.97077084, 0.9762083], 'val_loss': [0.15114309609557192, 0.1147864429069062, 0.09423549160702775, 0.09074506457825192, 0.08207530307924996], 'val_accuracy': [0.95933336, 0.967, 0.97216666, 0.97291666, 0.97525]}
```
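For reference, here is a minimal sketch of such a graph with matplotlib, using the `tmp.history` dictionary above:

```python
import matplotlib.pyplot as plt

hist = tmp.history  # dict of per-epoch lists returned by model.fit()
epochs = range(1, len(hist['loss']) + 1)

plt.plot(epochs, hist['loss'], label='loss')          # training loss
plt.plot(epochs, hist['val_loss'], label='val_loss')  # validation loss
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()
```

If the two curves diverge (training loss falling while validation loss rises), that is the overfitting pattern described above.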
For a small NN model like the one in the tutorial, training does not take much time. However, if the model is large or the number of training epochs is set high, training can take a very long time.
It is therefore convenient to save the trained model, the result of training, to a file so that it can be restored whenever needed.
This is especially important in Google Colab: **if the notebook stays idle (no cell running) for 90 minutes, the instance is shut down**, and when you reconnect, **the memory is cleared** (the contents of variables are lost).
So the following can happen: run the training → step away because it takes a while → training completes (the trained model is ready) → 90 minutes of idle time pass → the instance is shut down and the trained model disappears → **come back to your seat and get a shock** (the same thing happens if your PC goes to sleep).
To avoid this, it is safest to put the "save the model" code immediately after the training code.
Normally, pickle is used to serialize and save the contents of variables (objects). However, trying to save the model with pickle **fails** with an error like `TypeError: can't pickle _thread.RLock objects`.
Instead, use the dedicated `model.save(...)` to save the model. Just specify the file path as the argument.
Save the trained model
```python
model.save('model-01.h5')            # Save to the temporary area
# model.save('/content/model-01.h5') # Save to the temporary area with an absolute path
```
In the Google Colab environment, we recommend mounting Google Drive and saving the model there. For details on how to mount Google Drive, see the latter half of here.
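As a sketch of that workflow (the save path under My Drive is just an example):

```python
from google.colab import drive

# Mount Google Drive; an authorization prompt appears on first run
drive.mount('/content/drive')

# Save the trained model to Google Drive so it survives the instance being reset
model.save('/content/drive/My Drive/model-01.h5')
```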
Note that if you save a model before compiling it, a warning like `WARNING:tensorflow:No training configuration found in save file: the model was *not* compiled. Compile it manually.` will be output when the model is loaded.
Load the saved model as follows.
Load the trained model
```python
import tensorflow as tf

model = tf.keras.models.load_model('model-01.h5')
```
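To confirm the reload worked, you can evaluate the restored model on the test data; assuming `x_test` and `y_test` are still in memory and the model was compiled with `metrics=['accuracy']`, it should reproduce the original performance:

```python
# Evaluate the restored model (loss and accuracy on the test set)
loss, acc = model.evaluate(x_test, y_test, verbose=0)
print(f'loss: {loss:.4f}, accuracy: {acc:.4f}')
```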
References:
- How to determine batch size, number of iterations, and number of epochs in machine learning / deep learning
- Touch Chainer on Google Colaboratory vol.5: Understand the number of epochs and batch size