Networks such as GoogLeNet or VGG16 give good object recognition accuracy, but they are hard to run on mobile phones, which have limited memory and compute. As one answer to this problem, Google proposed MobileNet [^1], a network that lets you trade off computation time, memory, and accuracy, so I looked into it.
- The network is small.
- Training time is short.
- Accuracy is reasonably good.
- Input images only need to be 32 pixels or larger on each side.
- A hyperparameter $\alpha$ controls the trade-off between computational cost and accuracy.
- It is implemented in Keras [^3].
- Keras provides 16 ImageNet-pretrained models, covering image sizes of 224, 192, 160, and 128 and $\alpha$ values of 1.0, 0.75, 0.5, and 0.25.
- The amount of computation is reduced by replacing the conventional convolution filter with a depthwise convolution filter followed by a 1x1 convolution filter.
- A conventional convolution prepares one filter of size kernel size x kernel size x (input channels) for each output channel and convolves with them.
- MobileNet first prepares one filter of size kernel size x kernel size x 1 for each input channel and convolves each channel separately.
- It then prepares one filter of size 1 x 1 x (input channels) for each output channel and convolves with them.
- Together, these two steps perform processing similar to a conventional convolution.
Quoted from MobileNets [^1]
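The saving described above can be checked with a little arithmetic. The following is a minimal sketch, not code from Keras or the paper: the function names are made up for illustration, and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: c_out filters of kernel x kernel x c_in."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise + pointwise pair.

    Depthwise: one kernel x kernel x 1 filter per input channel.
    Pointwise: one 1 x 1 x c_in filter per output channel.
    """
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Example: 3x3 kernel, 16 input channels, 32 output channels.
std = standard_conv_params(3, 16, 32)
sep = separable_conv_params(3, 16, 32)
ratio = sep / std  # algebraically equal to 1/c_out + 1/kernel**2
print(std, sep, round(ratio, 3))
```

For this example the standard convolution needs 4608 weights and the separable pair needs 656 (144 depthwise plus 512 pointwise), about 14% of the original. With a 3x3 kernel the ratio approaches 1/9 as the number of output channels grows.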
Network structure for CIFAR10 with $\alpha = 0.5$:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 32, 32, 3) 0
_________________________________________________________________
conv1 (Conv2D) (None, 16, 16, 16) 432
_________________________________________________________________
conv1_bn (BatchNormalization (None, 16, 16, 16) 64
_________________________________________________________________
conv1_relu (Activation) (None, 16, 16, 16) 0
_________________________________________________________________
conv_dw_1 (DepthwiseConv2D) (None, 16, 16, 16) 144
_________________________________________________________________
conv_dw_1_bn (BatchNormaliza (None, 16, 16, 16) 64
_________________________________________________________________
conv_dw_1_relu (Activation) (None, 16, 16, 16) 0
_________________________________________________________________
conv_pw_1 (Conv2D) (None, 16, 16, 32) 512
_________________________________________________________________
conv_pw_1_bn (BatchNormaliza (None, 16, 16, 32) 128
_________________________________________________________________
conv_pw_1_relu (Activation) (None, 16, 16, 32) 0
_________________________________________________________________
conv_dw_2 (DepthwiseConv2D) (None, 8, 8, 32) 288
_________________________________________________________________
conv_dw_2_bn (BatchNormaliza (None, 8, 8, 32) 128
_________________________________________________________________
conv_dw_2_relu (Activation) (None, 8, 8, 32) 0
_________________________________________________________________
conv_pw_2 (Conv2D) (None, 8, 8, 64) 2048
_________________________________________________________________
conv_pw_2_bn (BatchNormaliza (None, 8, 8, 64) 256
_________________________________________________________________
conv_pw_2_relu (Activation) (None, 8, 8, 64) 0
_________________________________________________________________
...
_________________________________________________________________
conv_dw_13 (DepthwiseConv2D) (None, 1, 1, 512) 4608
_________________________________________________________________
conv_dw_13_bn (BatchNormaliz (None, 1, 1, 512) 2048
_________________________________________________________________
conv_dw_13_relu (Activation) (None, 1, 1, 512) 0
_________________________________________________________________
conv_pw_13 (Conv2D) (None, 1, 1, 512) 262144
_________________________________________________________________
conv_pw_13_bn (BatchNormaliz (None, 1, 1, 512) 2048
_________________________________________________________________
conv_pw_13_relu (Activation) (None, 1, 1, 512) 0
_________________________________________________________________
global_average_pooling2d_1 ( (None, 512) 0
_________________________________________________________________
reshape_1 (Reshape) (None, 1, 1, 512) 0
_________________________________________________________________
dropout (Dropout) (None, 1, 1, 512) 0
_________________________________________________________________
conv_preds (Conv2D) (None, 1, 1, 10) 5130
_________________________________________________________________
act_softmax (Activation) (None, 1, 1, 10) 0
_________________________________________________________________
reshape_2 (Reshape) (None, 10) 0
=================================================================
Total params: 834,666
Trainable params: 823,722
Non-trainable params: 10,944
_________________________________________________________________
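The totals in the summary above can be reproduced by hand. The sketch below assumes the 13-block MobileNet layout, with the base widths listed in `BASE_WIDTHS` scaled by $\alpha$ and truncated to an integer. It also assumes that every convolution except the final `conv_preds` has no bias, and that each batch normalization layer holds 4 values per channel, 2 of them non-trainable. The function and constant names are hypothetical.

```python
# Pointwise output widths of the 13 depthwise-separable blocks at alpha = 1.0
# (assumed base layout; the first standard convolution has 32 filters).
BASE_WIDTHS = [64, 128, 128, 256, 256, 512, 512, 512, 512, 512, 512, 1024, 1024]


def mobilenet_param_counts(alpha, classes, in_channels=3, first_width=32):
    """Return (total, non_trainable) parameter counts for the assumed layout."""
    total = 0
    non_trainable = 0

    def batch_norm(channels):
        # gamma and beta are trainable; moving mean and variance are not.
        nonlocal total, non_trainable
        total += 4 * channels
        non_trainable += 2 * channels

    c_in = int(first_width * alpha)
    total += 3 * 3 * in_channels * c_in  # conv1: standard 3x3, no bias
    batch_norm(c_in)

    for base in BASE_WIDTHS:
        c_out = int(base * alpha)
        total += 3 * 3 * c_in    # depthwise 3x3, one filter per channel
        batch_norm(c_in)
        total += c_in * c_out    # pointwise 1x1
        batch_norm(c_out)
        c_in = c_out

    total += c_in * classes + classes  # conv_preds: 1x1 with bias
    return total, non_trainable


total, non_trainable = mobilenet_param_counts(alpha=0.5, classes=10)
print(total, total - non_trainable, non_trainable)
```

With `alpha=0.5` and 10 classes this gives 834,666 total, 823,722 trainable, and 10,944 non-trainable parameters, matching the summary.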
- $\alpha = 1.0$: val_acc = 87%
- $\alpha = 0.5$: val_acc = 81%

Those were roughly the results.
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.applications import MobileNet
batch_size = 32
classes = 10
epochs = 200
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
Y_train = keras.utils.to_categorical(y_train, classes)
Y_test = keras.utils.to_categorical(y_test, classes)
img_input = keras.layers.Input(shape=(32, 32, 3))
model = MobileNet(input_tensor=img_input, alpha=0.5, weights=None, classes=classes)
model.compile(loss='categorical_crossentropy', optimizer="nadam", metrics=['accuracy'])
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
datagen = ImageDataGenerator(
    featurewise_center=False,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,  # randomly flip images horizontally
    vertical_flip=False)  # randomly flip images vertically
datagen.fit(X_train)
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                    steps_per_epoch=X_train.shape[0] // batch_size,
                    epochs=epochs,
                    validation_data=(X_test, Y_test))
- At present, the backend must be set to TensorFlow.
- There is also an `input_shape` argument, but judging from the code it only works with sizes of 224, 192, 160, or 128, so it seems safer to simply use `input_tensor`.
| Width Multiplier (alpha) | ImageNet Acc | Multiply-Adds (M) | Params (M) |
|--------------------------|--------------|-------------------|------------|
| 1.0 MobileNet-224 | 70.6 % | 569 | 4.2 |
| 0.75 MobileNet-224 | 68.4 % | 325 | 2.6 |
| 0.50 MobileNet-224 | 63.7 % | 149 | 1.3 |
| 0.25 MobileNet-224 | 50.6 % | 41 | 0.5 |
- Making $\alpha$ smaller significantly reduces the number of parameters.
| Resolution | ImageNet Acc | Multiply-Adds (M) | Params (M) |
|------------|--------------|-------------------|------------|
| 1.0 MobileNet-224 | 70.6 % | 569 | 4.2 |
| 1.0 MobileNet-192 | 69.1 % | 418 | 4.2 |
| 1.0 MobileNet-160 | 67.2 % | 290 | 4.2 |
| 1.0 MobileNet-128 | 64.4 % | 186 | 4.2 |
- The number of parameters does not change when the image size changes.
- MobileNet is a network that lets you trade off accuracy against computational cost.
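Both knobs can be illustrated on a single depthwise-separable layer. The following is a rough sketch under simplifying assumptions: biases and batch normalization are ignored, the feature map is square, and the function name is hypothetical. Parameters depend only on the channel counts, while multiply-adds also scale with the area of the feature map, which shrinks along with the input image.

```python
def separable_layer_cost(kernel, c_in, c_out, feature_size, alpha=1.0):
    """Return (params, mult_adds) for one depthwise + pointwise pair.

    feature_size is the side length of the square output feature map.
    """
    m = int(c_in * alpha)   # thinned input channels
    n = int(c_out * alpha)  # thinned output channels
    params = kernel * kernel * m + m * n
    # Every weight is applied once per output position.
    mult_adds = params * feature_size * feature_size
    return params, mult_adds


# The same 512 -> 512 layer on a 14x14 and on a 7x7 feature map.
p_large, ma_large = separable_layer_cost(3, 512, 512, feature_size=14)
p_small, ma_small = separable_layer_cost(3, 512, 512, feature_size=7)
print(p_large == p_small, ma_large / ma_small)

# The same layer with alpha = 0.5.
p_half, _ = separable_layer_cost(3, 512, 512, feature_size=14, alpha=0.5)
print(round(p_half / p_large, 3))
```

Halving the feature map side leaves the parameter count unchanged and cuts the multiply-adds to a quarter. Halving $\alpha$ cuts the parameters of this layer to roughly a quarter, because the dominant pointwise term scales with $\alpha^2$.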
References