Implemented hard-swish in Keras

Overview

MobileNetV3 is probably the best known of the lightweight deep learning models from the past year or so (V2 is already implemented in Keras).

H-swish (hard-swish) is one of the pieces that contributes to making the model lightweight.

Compared with the ordinary swish, which is expensive because it uses a sigmoid, h-swish is cheap to compute, and the error introduced by the approximation and by quantization is small, or so the paper claims. For a detailed explanation in Japanese, please refer to the following article. Reference: [[Paper reading] Searching for MobileNetV3](https://woodyzootopia.github.io/2019/09/%E8%AB%96%E6%96%87%E8%AA%AD%E3%81%BFsearching-for-mobilenetv3)

This article is about implementing hard-swish in Keras. The function itself is not difficult, so it can be implemented quickly with the backend. While we are at it, we will also implement swish and compare the two on a graph. Using the backend, there are two ways to implement it:

  1. Definition as an activation function
  2. Definition as a layer

Environment

  * tensorflow 1.15.0
  * keras 2.3.1

Confirmation of definition

First, let's check the definitions.

$$h\text{-}swish[x] = x \times \frac{ReLU6(x+3)}{6}$$

$$swish[x] = x \times Sigmoid(x)$$
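
As a quick sanity check (my own calculation, not from the paper), evaluating both functions at a couple of points shows how close the approximation is:

$$h\text{-}swish[1] = 1 \times \frac{ReLU6(4)}{6} = \frac{4}{6} \approx 0.667, \qquad swish[1] = 1 \times Sigmoid(1) \approx 0.731$$

$$h\text{-}swish[4] = 4 \times \frac{ReLU6(7)}{6} = 4 \times \frac{6}{6} = 4, \qquad swish[4] = 4 \times Sigmoid(4) \approx 3.93$$

For x ≥ 3 the ReLU6 term saturates at 6, so h-swish simply returns x.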

[Pattern 1] Definition as an activation function

1. Function definition

I implemented it by referring to the Keras official documentation: How to use activation functions.

h-swish.py


```python
from keras import backend as K

# definition of hard_swish
def hard_swish(x):
    return x * (K.relu(x + 3., max_value=6.) / 6.)

# definition of swish
def swish(x):
    return x * K.sigmoid(x)
```

The backend relu has a max_value argument that sets an upper bound, so once ReLU6 is defined with it, the rest is just a direct implementation of the formula.
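
As a quick check of that behavior (my own sketch, not from the original article), relu with max_value=6. clips at 6 just like ReLU6:

```python
from keras import backend as K

# ReLU6 via the backend: negative values go to 0, large values are clipped at 6
x = K.variable([-2., 1., 4., 10.])
print(K.get_value(K.relu(x, max_value=6.)))  # expected: [0. 1. 4. 6.]
```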

2. Confirmation of results

Let's check that the functions behave as defined. We will use the backend to evaluate them on a NumPy array.

backend_result.py


```python
from keras import backend as K
import numpy as np
import matplotlib.pyplot as plt

# define an array from -10 to 10 in 0.2 increments
inputs = np.arange(-10, 10.2, 0.2)
# convert the numpy array to a tensor
inputs_v = K.variable(inputs)
# build the computation graph with the functions defined above
outputs_hs = hard_swish(inputs_v)
outputs_s = swish(inputs_v)
# evaluate and get the outputs
outputs_hs = K.get_value(outputs_hs)
outputs_s = K.get_value(outputs_s)
# plot the results
plt.figure(figsize=(14, 7))
plt.yticks(range(0, 9, 1))
plt.xticks(range(-8, 9, 1))
plt.grid(True)
plt.plot(inputs, outputs_hs, label="hard_swish")
plt.plot(inputs, outputs_s, label="swish")
plt.legend(bbox_to_anchor=(1, 1), loc='lower right', borderaxespad=0, fontsize=18)
```

**Result of this article's implementation:** swish hard-swish.png

**Result from the paper:** swish hard-swish on doc.png (Paper: Searching for MobileNetV3)

Looks good.

3. How to use

Just pass the function defined above as the activation.

conv.py


```python
from keras.layers import Conv2D

Conv2D(16, (3, 3), padding="same", activation=hard_swish)
```

Or

conv.py


```python
from keras.layers import Activation

Activation(hard_swish)
```
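
As a side note (my own addition, not from the original article), the function can also be registered under a string name in Keras's custom-object registry, so that it can be passed as `activation="hard_swish"` just like the built-in activations:

```python
from keras.layers import Conv2D
from keras.utils.generic_utils import get_custom_objects

# register hard_swish under a string name (assumes the hard_swish function defined above)
get_custom_objects().update({"hard_swish": hard_swish})

Conv2D(16, (3, 3), padding="same", activation="hard_swish")
```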

[Pattern 2] Definition as a layer

I referred to the Keras `Advanced Activations` implementation on GitHub. Reference: advanced_activations.py

h-swish_layer.py


```python
from keras import backend as K
from keras.engine.topology import Layer

# definition of hard_swish as a layer
class Hard_swish(Layer):
    def __init__(self, **kwargs):
        # accept standard Layer keyword arguments (e.g. name)
        super(Hard_swish, self).__init__(**kwargs)

    def call(self, inputs):
        return inputs * (K.relu(inputs + 3., max_value=6.) / 6.)

    def compute_output_shape(self, input_shape):
        return input_shape
```

Here is a usage example; the input shape assumes CIFAR-10.

h-swish_use.py


```python
from keras.layers import (Input, Conv2D, MaxPooling2D,
                          GlobalAveragePooling2D, Dense)
from keras.models import Model

inputs = Input(shape=(32, 32, 3))
x = Conv2D(64, (3, 3), padding="same")(inputs)
x = Hard_swish()(x)
x = Conv2D(64, (3, 3), padding="same")(x)
x = Hard_swish()(x)
x = MaxPooling2D()(x)

x = Conv2D(128, (3, 3), padding="same")(x)
x = Hard_swish()(x)
x = Conv2D(128, (3, 3), padding="same")(x)
x = Hard_swish()(x)
x = MaxPooling2D()(x)

x = Conv2D(256, (3, 3), padding="same")(x)
x = Hard_swish()(x)
x = Conv2D(256, (3, 3), padding="same")(x)
x = Hard_swish()(x)
x = GlobalAveragePooling2D()(x)

x = Dense(1024)(x)
x = Hard_swish()(x)
prediction = Dense(10, activation="softmax")(x)

model = Model(inputs, prediction)
model.summary()
```

model_output


Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 32, 32, 3)]       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 32, 32, 64)        1792      
_________________________________________________________________
hard_swish (Hard_swish)      (None, 32, 32, 64)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 32, 32, 64)        36928     
_________________________________________________________________
hard_swish_1 (Hard_swish)    (None, 32, 32, 64)        0         
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, 16, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 16, 16, 128)       73856     
_________________________________________________________________
hard_swish_2 (Hard_swish)    (None, 16, 16, 128)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 16, 128)       147584    
_________________________________________________________________
hard_swish_3 (Hard_swish)    (None, 16, 16, 128)       0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 8, 8, 128)         0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 256)         295168    
_________________________________________________________________
hard_swish_4 (Hard_swish)    (None, 8, 8, 256)         0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 8, 8, 256)         590080    
_________________________________________________________________
hard_swish_5 (Hard_swish)    (None, 8, 8, 256)         0         
_________________________________________________________________
global_average_pooling2d (Gl (None, 256)               0         
_________________________________________________________________
dense (Dense)                (None, 1024)              263168    
_________________________________________________________________
hard_swish_6 (Hard_swish)    (None, 1024)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                10250     
=================================================================
Total params: 1,418,826
Trainable params: 1,418,826
Non-trainable params: 0
_________________________________________________________________
```
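
For completeness, here is a minimal sketch of compiling and training this model on CIFAR-10. This is my own addition; the optimizer, batch size, and number of epochs are arbitrary choices, not values from the original article.

```python
from keras.datasets import cifar10
from keras.utils import to_categorical

# load CIFAR-10 and scale the pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train,
          batch_size=128,
          epochs=10,
          validation_data=(x_test, y_test))
```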

One (minor) advantage over defining it as an activation function is that the hard-swish layers are visible when the model is inspected with summary(), as shown above.
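
One practical point to keep in mind (my own note, not from the original article): when a model that uses a custom layer or activation is saved and reloaded, the custom object has to be supplied via custom_objects. The sketch below assumes the Hard_swish layer accepts standard Layer keyword arguments, as in the version defined above.

```python
from keras.models import load_model

model.save("hard_swish_model.h5")

# custom classes/functions are not known to Keras at load time,
# so they must be supplied via custom_objects
model = load_model("hard_swish_model.h5",
                   custom_objects={"Hard_swish": Hard_swish})
```

If the activation-function version is used instead, pass custom_objects={"hard_swish": hard_swish} in the same way.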

In conclusion

I couldn't find a Keras implementation of hard-swish when I searched this time, so I implemented it myself. It was also a good opportunity to learn that the relu function I use so often has a max_value argument. If you have any questions or concerns, please leave a comment.
