Overview

I have been studying deep learning, and I wrote this article to consolidate what I learned. The full code is on GitHub. This time I implemented VGG16, a well-known CNN model, using Keras, a deep learning framework that makes it easy to build models, and used it to classify the CIFAR10 images.

Implementation environment

Execution environment

Google Colaboratory

version

Python 3.6.9
TensorFlow 1.15.0
Keras 2.2.5

Library import

`import`


import numpy as np
import sys
%matplotlib inline
import matplotlib.pyplot as plt
import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

First, import the required libraries. Keras (Official Document) is a high-level deep learning framework that runs on back ends such as TensorFlow and makes it easy to design and extend complex models.

CIFAR10 is a color image dataset provided by the University of Toronto. It contains 32x32-pixel images in 10 classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. Like MNIST for handwritten digits, CIFAR10 is available by default in the keras.datasets package.
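
As a small illustration, the integer labels in the dataset can be mapped back to class names. The list below follows the standard CIFAR10 label order (0 is airplane, 9 is truck). The names `CIFAR10_CLASSES` and `label_to_name` are hypothetical helpers introduced here for illustration; they are not part of Keras.

```python
# Class names in CIFAR10 label order (0..9).
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

def label_to_name(label):
    """Return the class name for an integer label in the range 0..9."""
    return CIFAR10_CLASSES[int(label)]

print(label_to_name(3))
```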

Data set preparation

`datasets`


'''Data set loading'''
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
'''Setting batch size, number of classes, number of epochs'''
batch_size=64
num_classes=10
epochs=20
'''one-hot vectorization'''
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
'''shape display'''
print("x_train : ", x_train.shape)
print("y_train : ", y_train.shape)
print("x_test : ", x_test.shape)
print("y_test : ", y_test.shape)

Next, load the training data and test data with load_data(). The batch size, number of classes, and number of epochs are also defined here. The label data is one-hot encoded (converted to a vector in which exactly one component is 1 and all others are 0) so that it can be compared with the softmax output. The resulting shapes are as follows:

`Output result`


x_train :  (50000, 32, 32, 3)
y_train :  (50000, 10)
x_test :  (10000, 32, 32, 3)
y_test :  (10000, 10)

There are 50,000 training samples and 10,000 test samples.
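
To make the one-hot step concrete, here is a minimal NumPy sketch of what the encoding does. It is not the Keras implementation of to_categorical; the function name `one_hot` and the sample labels are assumptions made for this example.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Convert integer labels of shape (n,) or (n, 1) to one-hot rows."""
    labels = np.asarray(labels, dtype=int).reshape(-1)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Put a 1 in the column given by each label.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# CIFAR10 labels arrive with shape (n, 1), so the sample mimics that layout.
sample = one_hot([[6], [9], [0]], num_classes=10)
print(sample.shape)
print(sample[0])
```

Each row sums to 1, and the position of the 1 is the original class index.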

Implementation of VGG16 model

Now we finally build the VGG16 model. The VGG series is explained in detail in this article. Roughly speaking, VGG16 is a CNN model created by the VGG team that ranked high in ILSVRC (ImageNet Large Scale Visual Recognition Challenge), a competition for object detection and image classification. Because of its relatively simple design and high performance, it is often mentioned in introductions to deep learning. The "16" comes from the fact that the model has 16 layers in total. The structure of VGG16 is shown in the figure below (quoted from the original paper; VGG16 is model D).

There are 13 convolutional layers with a filter size of 3x3 and 3 fully connected layers. I implemented VGG16 with reference to the above figure.
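
The layer count can be checked with a short sketch. The configuration list below is a hypothetical notation introduced for this example (an integer is the filter count of a 3x3 convolution, "M" is a max-pooling layer); it mirrors the block structure described above rather than any Keras API.

```python
# Hypothetical configuration list: an integer is the filter count of a 3x3
# convolution, and "M" marks a 2x2 max-pooling layer.
VGG16_CONV_CONFIG = [64, 64, "M",
                     128, 128, "M",
                     256, 256, 256, "M",
                     512, 512, 512, "M",
                     512, 512, 512, "M"]
NUM_FC_LAYERS = 3

num_conv = sum(1 for item in VGG16_CONV_CONFIG if item != "M")
num_pool = VGG16_CONV_CONFIG.count("M")
# Only layers with weights (convolution and fully connected) are counted.
num_weight_layers = num_conv + NUM_FC_LAYERS

print(num_conv, num_pool, num_weight_layers)
```

Pooling layers have no weights, which is why they do not count toward the 16.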

`VGG16`


'''VGG16'''
input_shape=x_train.shape[1:]
model = Sequential()
model.add(Conv2D(filters=64, kernel_size=(3,3), strides=(1,1), padding='same', input_shape=input_shape, name='block1_conv1'))
model.add(BatchNormalization(name='bn1'))
model.add(Activation('relu'))
model.add(Conv2D(filters=64, kernel_size=(3,3), strides=(1,1), padding='same', name='block1_conv2'))
model.add(BatchNormalization(name='bn2'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', name='block1_pool'))
model.add(Conv2D(filters=128, kernel_size=(3,3), strides=(1,1), padding='same', name='block2_conv1'))
model.add(BatchNormalization(name='bn3'))
model.add(Activation('relu'))
model.add(Conv2D(filters=128, kernel_size=(3,3), strides=(1,1), padding='same', name='block2_conv2'))
model.add(BatchNormalization(name='bn4'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', name='block2_pool'))
model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='same', name='block3_conv1'))
model.add(BatchNormalization(name='bn5'))
model.add(Activation('relu'))
model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='same', name='block3_conv2'))
model.add(BatchNormalization(name='bn6'))
model.add(Activation('relu'))
model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='same', name='block3_conv3'))
model.add(BatchNormalization(name='bn7'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', name='block3_pool'))
model.add(Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='same', name='block4_conv1'))
model.add(BatchNormalization(name='bn8'))
model.add(Activation('relu'))
model.add(Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='same', name='block4_conv2'))
model.add(BatchNormalization(name='bn9'))
model.add(Activation('relu'))
model.add(Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='same', name='block4_conv3'))
model.add(BatchNormalization(name='bn10'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', name='block4_pool'))
model.add(Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='same', name='block5_conv1'))
model.add(BatchNormalization(name='bn11'))
model.add(Activation('relu'))
model.add(Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='same', name='block5_conv2'))
model.add(BatchNormalization(name='bn12'))
model.add(Activation('relu'))
model.add(Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='same', name='block5_conv3'))
model.add(BatchNormalization(name='bn13'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', name='block5_pool'))
model.add(Flatten(name='flatten'))
model.add(Dense(units=4096, activation='relu', name='fc1'))
model.add(Dense(units=4096, activation='relu', name='fc2'))
model.add(Dense(units=num_classes, activation='softmax', name='predictions'))
model.summary()

Keras offers two ways to build a model, the Sequential model and the Functional API. This time I used the simpler Sequential model, in which layers are stacked in series by calling model.add as shown above. Note that VGG was originally designed for ILSVRC, so its input and output sizes do not match this dataset. The input/output sizes are therefore changed as follows. CIFAR10 is a much simpler dataset, so a model this complex may not actually be necessary.

	Before change	After change
Input size	224×224	32×32
Output size	1000	10
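
The effect of the smaller input can be seen by tracking the spatial size through the five pooling layers. This is a sketch under the assumption that every 3x3 convolution keeps the size (stride 1, 'same' padding) and every pooling layer halves it, rounding up; `size_after_pools` is a hypothetical helper name.

```python
import math

def size_after_pools(size, num_pools=5):
    """Spatial size after repeated 2x2, stride-2 pooling with 'same' padding."""
    for _ in range(num_pools):
        size = math.ceil(size / 2)
    return size

for input_size in (224, 32):
    side = size_after_pools(input_size)
    flat = side * side * 512  # the last convolution block has 512 channels
    print(input_size, side, flat)
```

With a 32×32 input, the feature map entering the fully connected layers is only 1×1×512, compared with 7×7×512 for the original 224×224 input.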

In addition, Batch Normalization is now widely used as a method to help prevent overfitting to the training data, but the original VGG does not use it because the method had not been established when VGG was announced. This time, I added it after each convolutional layer. The model summary is as follows.

`Output result`


Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
block1_conv1 (Conv2D)        (None, 32, 32, 64)        1792      
_________________________________________________________________
bn1 (BatchNormalization)     (None, 32, 32, 64)        256       
_________________________________________________________________
activation_1 (Activation)    (None, 32, 32, 64)        0         
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 32, 32, 64)        36928     
_________________________________________________________________
bn2 (BatchNormalization)     (None, 32, 32, 64)        256       
_________________________________________________________________
activation_2 (Activation)    (None, 32, 32, 64)        0         
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 16, 16, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 16, 16, 128)       73856     
_________________________________________________________________
bn3 (BatchNormalization)     (None, 16, 16, 128)       512       
_________________________________________________________________
activation_3 (Activation)    (None, 16, 16, 128)       0         
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 16, 16, 128)       147584    
_________________________________________________________________
bn4 (BatchNormalization)     (None, 16, 16, 128)       512       
_________________________________________________________________
activation_4 (Activation)    (None, 16, 16, 128)       0         
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 8, 8, 128)         0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 8, 8, 256)         295168    
_________________________________________________________________
bn5 (BatchNormalization)     (None, 8, 8, 256)         1024      
_________________________________________________________________
activation_5 (Activation)    (None, 8, 8, 256)         0         
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 8, 8, 256)         590080    
_________________________________________________________________
bn6 (BatchNormalization)     (None, 8, 8, 256)         1024      
_________________________________________________________________
activation_6 (Activation)    (None, 8, 8, 256)         0         
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 8, 8, 256)         590080    
_________________________________________________________________
bn7 (BatchNormalization)     (None, 8, 8, 256)         1024      
_________________________________________________________________
activation_7 (Activation)    (None, 8, 8, 256)         0         
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 4, 4, 256)         0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 4, 4, 512)         1180160   
_________________________________________________________________
bn8 (BatchNormalization)     (None, 4, 4, 512)         2048      
_________________________________________________________________
activation_8 (Activation)    (None, 4, 4, 512)         0         
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
bn9 (BatchNormalization)     (None, 4, 4, 512)         2048      
_________________________________________________________________
activation_9 (Activation)    (None, 4, 4, 512)         0         
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
bn10 (BatchNormalization)    (None, 4, 4, 512)         2048      
_________________________________________________________________
activation_10 (Activation)   (None, 4, 4, 512)         0         
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 2, 2, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
bn11 (BatchNormalization)    (None, 2, 2, 512)         2048      
_________________________________________________________________
activation_11 (Activation)   (None, 2, 2, 512)         0         
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
bn12 (BatchNormalization)    (None, 2, 2, 512)         2048      
_________________________________________________________________
activation_12 (Activation)   (None, 2, 2, 512)         0         
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
bn13 (BatchNormalization)    (None, 2, 2, 512)         2048      
_________________________________________________________________
activation_13 (Activation)   (None, 2, 2, 512)         0         
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 1, 1, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 512)               0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              2101248   
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 10)                40970     
=================================================================
Total params: 33,655,114
Trainable params: 33,646,666
Non-trainable params: 8,448
_________________________________________________________________
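
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the usual formulas (a k×k convolution has k·k·c_in·c_out weights plus c_out biases, and Batch Normalization keeps four values per channel, two of which are trainable); `conv_params` and `dense_params` are hypothetical helper names.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

conv_channels = [64, 64, 128, 128, 256, 256, 256,
                 512, 512, 512, 512, 512, 512]

total = 0
non_trainable = 0
c_in = 3  # RGB input
for c_out in conv_channels:
    total += conv_params(c_in, c_out)
    # BatchNormalization keeps 4 values per channel: gamma and beta are
    # trainable, the moving mean and variance are not.
    total += 4 * c_out
    non_trainable += 2 * c_out
    c_in = c_out

# Flatten gives 1 * 1 * 512 = 512 features for a 32x32 input.
for n_in, n_out in [(512, 4096), (4096, 4096), (4096, 10)]:
    total += dense_params(n_in, n_out)

print(total, total - non_trainable, non_trainable)
```

The three printed numbers correspond to the total, trainable, and non-trainable parameter counts in the summary. More than half of the parameters sit in the two 4096-unit fully connected layers.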

Model learning

Next, we train the model we created.

`Learning`


'''optimizer definition'''
optimizer=keras.optimizers.Adam()
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
'''Data normalization'''
x_train=x_train.astype('float32')
x_train/=255
x_test=x_test.astype('float32')
x_test/=255
'''fit'''
history=model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(x_test, y_test))

For optimization I used Adam, a common choice. Since I do not tune the hyperparameters this time, its parameters are left at their default values. The loss function is the categorical cross entropy used for multiclass classification problems, given by Eq. (1).

\begin{equation} L = -\sum_{i=1}^{N} y_i \log \hat{y}_i \qquad (N: \text{number of classes}, \quad y_i: \text{correct label}, \quad \hat{y}_i: \text{predicted label}) \tag{1} \end{equation}
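
As a worked example of Eq. (1), the sketch below evaluates the loss for a single sample in plain Python. The prediction values are made up for illustration, and the small `eps` clipping is an assumption added here to avoid log(0). Keras additionally averages this quantity over the samples in a batch.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Eq. (1) for one sample: -sum_i y_i * log(y_hat_i)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0.0, 0.0, 1.0]   # one-hot correct label (class 2)
y_pred = [0.1, 0.2, 0.7]   # softmax output of a hypothetical model

loss = categorical_cross_entropy(y_true, y_pred)
print(round(loss, 4))
```

Because the label is one-hot, only the term for the correct class survives, so the loss reduces to the negative log of the probability assigned to the correct class.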

The metric to monitor during training is accuracy (specified by metrics); the quantity actually minimized is the loss. These settings are passed to model.compile. Finally, normalize the image data to the range 0 to 1 and train with model.fit. The training log is stored in history. With the above settings, the training results are as follows.

`Execution result`


Train on 50000 samples, validate on 10000 samples
Epoch 1/20
50000/50000 [==============================] - 38s 755us/step - loss: 2.0505 - acc: 0.1912 - val_loss: 2.1730 - val_acc: 0.2345
Epoch 2/20
50000/50000 [==============================] - 33s 667us/step - loss: 1.5810 - acc: 0.3763 - val_loss: 1.8167 - val_acc: 0.3522
Epoch 3/20
50000/50000 [==============================] - 33s 663us/step - loss: 1.2352 - acc: 0.5354 - val_loss: 1.4491 - val_acc: 0.5108
Epoch 4/20
50000/50000 [==============================] - 34s 674us/step - loss: 0.9415 - acc: 0.6714 - val_loss: 1.1408 - val_acc: 0.6202
Epoch 5/20
50000/50000 [==============================] - 34s 670us/step - loss: 0.7780 - acc: 0.7347 - val_loss: 0.8930 - val_acc: 0.6974
Epoch 6/20
50000/50000 [==============================] - 34s 675us/step - loss: 0.6525 - acc: 0.7803 - val_loss: 0.9603 - val_acc: 0.6942
Epoch 7/20
50000/50000 [==============================] - 34s 673us/step - loss: 0.5637 - acc: 0.8129 - val_loss: 0.9188 - val_acc: 0.7184
Epoch 8/20
50000/50000 [==============================] - 34s 679us/step - loss: 0.4869 - acc: 0.8405 - val_loss: 1.0963 - val_acc: 0.7069
Epoch 9/20
50000/50000 [==============================] - 34s 677us/step - loss: 0.4268 - acc: 0.8594 - val_loss: 0.6283 - val_acc: 0.8064
Epoch 10/20
50000/50000 [==============================] - 33s 668us/step - loss: 0.3710 - acc: 0.8785 - val_loss: 0.6944 - val_acc: 0.7826
Epoch 11/20
50000/50000 [==============================] - 34s 670us/step - loss: 0.3498 - acc: 0.8871 - val_loss: 0.6534 - val_acc: 0.8024
Epoch 12/20
50000/50000 [==============================] - 33s 663us/step - loss: 0.2751 - acc: 0.9113 - val_loss: 0.6253 - val_acc: 0.8163
Epoch 13/20
50000/50000 [==============================] - 34s 670us/step - loss: 0.2388 - acc: 0.9225 - val_loss: 1.1404 - val_acc: 0.7384
Epoch 14/20
50000/50000 [==============================] - 33s 667us/step - loss: 0.2127 - acc: 0.9323 - val_loss: 0.9577 - val_acc: 0.7503
Epoch 15/20
50000/50000 [==============================] - 33s 667us/step - loss: 0.1790 - acc: 0.9421 - val_loss: 0.7820 - val_acc: 0.7915
Epoch 16/20
50000/50000 [==============================] - 33s 666us/step - loss: 0.1559 - acc: 0.9509 - val_loss: 0.7138 - val_acc: 0.8223
Epoch 17/20
50000/50000 [==============================] - 34s 671us/step - loss: 0.1361 - acc: 0.9570 - val_loss: 0.8909 - val_acc: 0.7814
Epoch 18/20
50000/50000 [==============================] - 33s 669us/step - loss: 0.1272 - acc: 0.9606 - val_loss: 0.7006 - val_acc: 0.8246
Epoch 19/20
50000/50000 [==============================] - 33s 666us/step - loss: 0.1130 - acc: 0.9647 - val_loss: 0.7523 - val_acc: 0.8177
Epoch 20/20
50000/50000 [==============================] - 34s 671us/step - loss: 0.0986 - acc: 0.9689 - val_loss: 0.7233 - val_acc: 0.8350

After 20 epochs, the accuracy was about 97% on the training data and about 84% on the test data. Let's plot the loss and accuracy for each epoch.

`Graph plot`


'''Visualization of results'''
plt.figure(figsize=(10,7))
plt.plot(history.history['acc'], color='b', linewidth=3)
plt.plot(history.history['val_acc'], color='r', linewidth=3)
plt.tick_params(labelsize=18)
plt.ylabel('accuracy', fontsize=20)
plt.xlabel('epoch', fontsize=20)
plt.legend(['training', 'test'], loc='best', fontsize=20)
plt.figure(figsize=(10,7))
plt.plot(history.history['loss'], color='b', linewidth=3)
plt.plot(history.history['val_loss'], color='r', linewidth=3)
plt.tick_params(labelsize=18)
plt.ylabel('loss', fontsize=20)
plt.xlabel('epoch', fontsize=20)
plt.legend(['training', 'test'], loc='best', fontsize=20)
plt.show()

The transition of the correct answer rate is as shown in the figure below.

The transition of the loss function is as shown in the figure below.

Hmm... The loss on the test data becomes unstable from around the 4th epoch. I used Batch Normalization, but the model still appears to be overfitting.
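
The instability can also be read directly from the numbers. The sketch below uses the val_loss values copied from the training log above and finds the epoch with the lowest validation loss and the epochs where it rose; the variable names are introduced here for illustration.

```python
# Validation loss per epoch, copied from the training log above.
val_loss = [2.1730, 1.8167, 1.4491, 1.1408, 0.8930, 0.9603, 0.9188, 1.0963,
            0.6283, 0.6944, 0.6534, 0.6253, 1.1404, 0.9577, 0.7820, 0.7138,
            0.8909, 0.7006, 0.7523, 0.7233]

# Epoch (1-based) with the lowest validation loss.
best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i]) + 1

# Epochs where the validation loss went up compared with the previous epoch.
rising_epochs = [i + 1 for i in range(1, len(val_loss))
                 if val_loss[i] > val_loss[i - 1]]

print(best_epoch, val_loss[best_epoch - 1])
print(rising_epochs)
```

The training loss keeps falling through epoch 20, while the validation loss is lowest at an earlier epoch and fluctuates afterwards, which is consistent with overfitting.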

Data storage

Training did not take long this time, but a model that takes a long time to train can be saved and reused. Save the model and its weights as shown below.

`Save model`


'''Data storage'''
model.save('cifar10-CNN.h5')
model.save_weights('cifar10-CNN-weights.h5')

Summary

Since this is a tutorial, I used Keras to classify CIFAR10 images with the well-known VGG16 model. VGG16 was originally designed for 1000-class classification, so I changed the input/output sizes and added Batch Normalization, but the model still overfit. The input/output sizes may be too small for this model. Possible improvements include adding Dropout and L2 regularization and tuning the optimization method.

References

For the implementation of this code, I referred to the following books.

- "Deep Learning Practical Techniques & Tuning Techniques by Keras", Masaki Aono, Morikita Publishing Co., Ltd., 2019

I implemented the VGG16 model in Keras and tried to identify CIFAR10
