I think MobileNetV3 is the best known of the lightweight deep learning models that appeared in the past year (V2 is also implemented in Keras).
H-swish (hard-swish) plays a part in making it lightweight. Plain swish is expensive because it uses a sigmoid, whereas hard-swish is said to be cheap to compute and to introduce only a small error under approximation and quantization. For a detailed explanation in Japanese, see: [[Paper reading] Searching for MobileNetV3](https://woodyzootopia.github.io/2019/09/%E8%AB%96%E6%96%87%E8%AA%AD%E3%81%BFsearching-for-mobilenetv3)
This article is about implementing hard-swish in Keras. The function itself is not difficult, so it can be implemented quickly with the backend. While I'm at it, I also implement plain swish and compare the two in a graph. The implementation uses the Keras backend, with the following environment:
tensorflow 1.15.0
keras 2.3.1
First, let's check the definitions.
h\text{-}swish[x] = x\frac{ReLU6(x+3)}{6}
swish[x] = x \times Sigmoid(x)
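As a quick sanity check of how close the two are (these numbers are mine, not from the paper): at x = 1, h-swish gives 1 × ReLU6(4)/6 = 4/6 ≈ 0.667 while swish gives 1 × Sigmoid(1) ≈ 0.731; for large positive x both approach x, and for large negative x both approach 0.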
I implemented them by referring to the official Keras documentation on how to use activation functions.
h-swish.py
from keras import backend as K

# definition of hard-swish
def hard_swish(x):
    return x * (K.relu(x + 3., max_value=6.) / 6.)

# definition of swish
def swish(x):
    return x * K.sigmoid(x)
The backend relu has a max_value argument that sets an upper limit on the output, so ReLU6 can be defined with it and the rest is just a direct transcription of the formula.
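As a quick check of that behavior (this snippet and its file name are mine, not from the article), the backend relu with max_value=6. clips its output to the range [0, 6]:
relu6_check.py
from keras import backend as K
import numpy as np

# ReLU6 expressed with the backend relu's max_value argument
def relu6(x):
    return K.relu(x, max_value=6.)

# evaluate on a few sample values; expected output: [0. 2. 6.]
print(K.get_value(relu6(K.variable(np.array([-1., 2., 10.])))))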
Let's check that the defined functions behave as defined. They can be evaluated on a NumPy array through the backend as well.
backend_result.py
from keras import backend as K
import numpy as np
import matplotlib.pyplot as plt
# define an array from -10 to 10 in 0.2 increments
inputs = np.arange(-10, 10.2, 0.2)
# convert the numpy array to a backend tensor
inputs_v = K.variable(inputs)
# build the computation graph with the functions defined above
outputs_hs = hard_swish(inputs_v)
outputs_s = swish(inputs_v)
# evaluate the graph and fetch the outputs as numpy arrays
outputs_hs = K.get_value(outputs_hs)
outputs_s = K.get_value(outputs_s)
# plot the results
plt.figure(figsize=(14,7))
plt.yticks(range(0, 9, 1))
plt.xticks(range(-8, 9, 1))
plt.grid(True)
plt.plot(inputs, outputs_hs, label="hard_swish")
plt.plot(inputs, outputs_s, label="swish")
plt.legend(bbox_to_anchor=(1, 1), loc='lower right', borderaxespad=0, fontsize=18)
plt.show()
**Result of this article's implementation** / **Result from the paper** (Paper: Searching for MobileNetV3)
Looks good.
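We can also quantify how close the approximation is over this range (this check is mine, not part of the original article). Since outputs_hs and outputs_s from backend_result.py are already NumPy arrays, the largest gap between the two curves can be printed directly:
approx_check.py
# continues from backend_result.py: outputs_hs and outputs_s are numpy arrays
print(np.max(np.abs(outputs_hs - outputs_s)))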
To use it, just pass the function defined earlier as the activation.
conv.py
from keras.layers import Conv2D

Conv2D(16, (3, 3), padding="SAME", activation=hard_swish)
Or
conv.py
from keras.layers import Activation
Activation(hard_swish)
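One thing to keep in mind (not covered in the original article): when you save a model that uses a custom activation like this and load it back later, Keras needs the function to be registered via custom_objects. A minimal sketch, with a hypothetical file name:
load_custom.py
from keras.models import load_model

# "my_model.h5" is a hypothetical path; register the custom activation at load time
model = load_model("my_model.h5", custom_objects={"hard_swish": hard_swish})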
Hard-swish can also be implemented as a custom layer. I referred to the Keras `Advanced Activations` implementation on GitHub. Reference: advanced_activations.py
h-swish_layer.py
from keras import backend as K
from keras.engine.topology import Layer

# hard-swish defined as a custom layer
class Hard_swish(Layer):
    def __init__(self):
        super(Hard_swish, self).__init__()

    def call(self, inputs):
        return inputs * (K.relu(inputs + 3., max_value=6.) / 6.)

    def compute_output_shape(self, input_shape):
        return input_shape
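One caveat from me (not in the referenced implementation): as written, __init__ does not forward keyword arguments, so you cannot pass name= and Keras cannot re-instantiate the layer when loading a saved model, because deserialization calls the constructor with the layer's config (which includes name and trainable). If you plan to save and reload models containing this layer, a sketch that forwards **kwargs looks like this:
h-swish_layer_kwargs.py
from keras import backend as K
from keras.layers import Layer

class Hard_swish(Layer):
    def __init__(self, **kwargs):
        # pass name=, trainable=, etc. through to the base Layer so (de)serialization works
        super(Hard_swish, self).__init__(**kwargs)

    def call(self, inputs):
        return inputs * (K.relu(inputs + 3., max_value=6.) / 6.)

    def compute_output_shape(self, input_shape):
        return input_shape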
Here is an example of how to use it. I am assuming CIFAR-10-sized input.
h-swish_use.py
from keras.layers import Input, Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense
from keras.models import Model

inputs = Input(shape=(32, 32, 3))
x = Conv2D(64, (3, 3), padding="SAME")(inputs)
x = Hard_swish()(x)
x = Conv2D(64, (3, 3), padding="SAME")(x)
x = Hard_swish()(x)
x = MaxPooling2D()(x)
x = Conv2D(128, (3, 3), padding="SAME")(x)
x = Hard_swish()(x)
x = Conv2D(128, (3, 3), padding="SAME")(x)
x = Hard_swish()(x)
x = MaxPooling2D()(x)
x = Conv2D(256, (3, 3), padding="SAME")(x)
x = Hard_swish()(x)
x = Conv2D(256, (3, 3), padding="SAME")(x)
x = Hard_swish()(x)
x = GlobalAveragePooling2D()(x)
x = Dense(1024)(x)
x = Hard_swish()(x)
prediction = Dense(10, activation="softmax")(x)
model = Model(inputs, prediction)
model.summary()
model_output
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 32, 32, 3)] 0
_________________________________________________________________
conv2d (Conv2D) (None, 32, 32, 64) 1792
_________________________________________________________________
hard_swish (Hard_swish) (None, 32, 32, 64) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 32, 32, 64) 36928
_________________________________________________________________
hard_swish_1 (Hard_swish) (None, 32, 32, 64) 0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, 16, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 16, 16, 128) 73856
_________________________________________________________________
hard_swish_2 (Hard_swish) (None, 16, 16, 128) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 16, 16, 128) 147584
_________________________________________________________________
hard_swish_3 (Hard_swish) (None, 16, 16, 128) 0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 8, 8, 128) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 8, 8, 256) 295168
_________________________________________________________________
hard_swish_4 (Hard_swish) (None, 8, 8, 256) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 8, 8, 256) 590080
_________________________________________________________________
hard_swish_5 (Hard_swish) (None, 8, 8, 256) 0
_________________________________________________________________
global_average_pooling2d (Gl (None, 256) 0
_________________________________________________________________
dense (Dense) (None, 1024) 263168
_________________________________________________________________
hard_swish_6 (Hard_swish) (None, 1024) 0
_________________________________________________________________
dense_1 (Dense) (None, 10) 10250
=================================================================
Total params: 1,418,826
Trainable params: 1,418,826
Non-trainable params: 0
_________________________________________________________________
One (minor) merit compared with defining it as a plain activation function is that you can see where hard-swish is used when the model is visualized with summary().
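For completeness, here is one way the model above could be compiled and trained on CIFAR-10. This sketch is mine, not part of the original article, and the hyperparameters (Adam, batch size 128, 10 epochs) are arbitrary choices:
train_example.py
from keras.datasets import cifar10
from keras.utils import to_categorical

# load CIFAR-10 and do minimal preprocessing: scale pixels to [0, 1], one-hot encode labels
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255., x_test / 255.
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

# compile and train the model defined in h-swish_use.py
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=10, validation_data=(x_test, y_test))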
I couldn't find a Keras implementation of hard-swish when I searched this time, so I implemented it myself. It was also a good opportunity to learn that the ReLU function I often use has a max_value argument. If you have any questions or concerns, please leave a comment.