Convolution strides and Average Pooling

Introduction

AveragePooling2D and MaxPooling2D are often used to halve the image size in deep learning convolution operations. By halving the image size, the filter kernel_size will be substantially larger in subsequent convolution layers. On the other hand, strides in Conv2D is also occasionally used to halve the image size. Let us consider how this difference can affect the filter breadth of the convolution operation.

※Caution If each layer of the convolution contains the activation function relu, it is impossible to convert it to a simple convolution as discussed below. Therefore, the following discussion is limited to the condition that it holds only when the activation function does not exist.

1. Convolution calculation of scipy.signal

First, make sure that the 2D convolution can be calculated using scipy.signal.correlate2d () and scipy.signal.convolve2d (). If A1 and B1 are defined appropriately, the result of convolving A1 into the familiar B1 could be obtained by correlate2d (). Also note that when using convolve2d (), the matrix order must be reversed.

python


import numpy as np
import scipy.signal as sp

A1 = np.random.randint(0,2,(3,3))
B1 = np.random.randint(-4,5,(7,7))
C1 = sp.correlate2d(B1, A1, mode='valid')
C2 = sp.convolve2d(B1, A1[::-1,::-1], mode='valid')

print('A1=\n', A1)
print('B1=\n', B1)
print('C1=\n', C1)
print('C2=\n', C2)
-----------------------------------------------
A1=
 [[0 1 0]
 [0 0 0]
 [1 0 1]]
B1=
 [[-1 -3  4 -3  1 -2 -2]
 [-3 -4 -2 -2  1  3  2]
 [-1 -4  2  3  3  3 -3]
 [ 3 -3 -2  1  3 -1  0]
 [ 2 -4  0  2  4  0  2]
 [ 4  0 -3  4  4  3  0]
 [ 0 -4 -2  4  3 -3  1]]
C1=
 [[-2  3  2  7 -2]
 [-3 -4 -1  1  6]
 [-2  0  7  5  9]
 [-2  2  2 10  3]
 [-6  0  3  5  4]]
C2=
 [[-2  3  2  7 -2]
 [-3 -4 -1  1  6]
 [-2  0  7  5  9]
 [-2  2  2 10  3]
 [-6  0  3  5  4]]

2. When 3x3 convolution is applied twice

In the Keras description, if 3x3 convolution is multiplied twice as shown below, what kind of convolution operation is equivalent? This is equivalent to one 5x5 convolution, as many know. (* However, under the condition that there is no activation function relu during convolution)

python



x = Input(shape=(28, 28, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (3, 3), padding="valid")(x)

It can be shown that the result C1 that convolves A1 => A2 in B1 and the result E1 that convolves with D1 as a result of finding the convolution of A1 and A2 are the same as shown below. So two 3x3 convolutions are equal to one 5x5 convolution, The 5x5 convolution filter can be calculated with D1 = sp.convolve2d (A1, A2, mode ='full').

python


import numpy as np
import scipy.signal as sp

A1 = np.random.randint(0,2,(3,3))
A2 = np.random.randint(0,2,(3,3))
B1 = np.random.randint(-4,5,(7,7))
C1 = sp.correlate2d(B1, A1, mode='valid')
C2 = sp.correlate2d(C1, A2, mode='valid')
D1 = sp.convolve2d(A1, A2, mode='full')
E1 = sp.correlate2d(B1, D1, mode='valid')

print('A1=\n', A1)
print('A2=\n', A2)
print('B1=\n', B1)
print('C1=\n', C1)
print('C2=\n', C2)
print('D1=\n', D1)
print('E1=\n', E1)
----------------------------------------------
A1=
 [[1 0 1]
 [1 1 1]
 [0 0 0]]
A2=
 [[0 0 1]
 [0 1 0]
 [0 0 0]]
B1=
 [[-3 -4  4 -4  0 -2  3]
 [ 2  0 -3  3  4  2 -4]
 [ 0  4 -4  0  1  1  2]
 [-4 -3  0 -1 -3  0 -2]
 [ 4  1  2 -2 -3 -1  1]
 [ 3 -1 -1 -4 -1  3 -2]
 [ 4  0 -1 -3 -4 -4 -3]]
C1=
 [[  0  -8   8   3   5]
 [ -1   3  -2   7   4]
 [-11   0  -7  -3  -2]
 [  3  -3  -6  -7  -8]
 [  7  -7  -7  -5  -2]]
C2=
 [[ 11   1  12]
 [ -2   0   1]
 [-10  -9  -9]]
D1=
 [[0 0 1 0 1]
 [0 1 1 2 1]
 [0 1 1 1 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
E1=
 [[ 11   1  12]
 [ -2   0   1]
 [-10  -9  -9]]

3. When 5x5 convolution is applied once

Consider the opposite of 2. Is it possible to break this down into two 3x3 convolutions when any 5x5 convolution is present once? Theoretically, if the arbitrary 5x5 convolution 1 and 3x3 convolution 1 convolution filters are clear, then the weight of the other 3x3 convolution can be calculated as a deconvolution. If D1 = convolve2d (A1, A2), then ʻA2 = deconvolve2d (D1, A1)`.

python


x = Input(shape=(28, 28, 1))
x = Conv2D(1, (5, 5), padding="valid")(x)

However, the ** two-dimensional deconvolution function ** does not exist in scipy, so I could not confirm it in scipy. Using the deconvolution function for images, we obtained an approximation of deconvolution in F1 and F2 as shown below. In the first place, the number of parameters for any 5x5 convolution is 25, while the number of parameters for 2 3x3 convolutions is 18, so it is impossible to show all 5x5 convolutions with 2 3x3 convolutions in terms of the number of parameters.

python


import numpy as np
import scipy.signal as sp
from skimage import restoration

A1 = np.random.randint(0,2,(3,3))
A2 = np.random.randint(0,2,(3,3))
D1 = sp.convolve2d(A1, A2, mode='full')
F1, _ = restoration.unsupervised_wiener(D1, A2)
F2, _ = restoration.unsupervised_wiener(D1, A1)

print('A1=\n', A1)
print('A2=\n', A2)
print('D1=\n', D1)
print('F1=\n', F1[1:-1,1:-1])
print('F2=\n', F2[1:-1,1:-1])
---------------------------------
A1=
 [[1 1 0]
 [1 0 0]
 [0 1 1]]
A2=
 [[0 0 0]
 [1 1 1]
 [0 0 0]]
D1=
 [[0 0 0 0 0]
 [1 2 2 1 0]
 [1 1 1 0 0]
 [0 1 2 2 1]
 [0 0 0 0 0]]
F1=
 [[ 0.96529897  0.89627574  0.13951081]
 [ 0.7975316   0.15137364 -0.02074955]
 [ 0.0798071   0.92025068  0.92801951]]
F2=
 [[ 0.00971747  0.0117676   0.00229734]
 [ 0.96429178  0.99937217  0.97182209]
 [-0.00158553  0.03361686 -0.00815724]]

Summary ・ Any 2x3 convolutions can be converted to 1 5x5 convolution => True ・ One arbitrary 5x5 convolution can be converted to two 3x3 convolutions => False (approximate value is possible)

4. In the case of Average Pooling ((2,2))

By the way, Average Pooling with pool size (2,2) is Of Conv2D where x = Conv2D (1, (2, 2), padding =" valid ", strides = 2) (x) and the convolution filter weights are ((0.25,0.25), (0.25,0.25)) I want to show that it is equal to an operation. If the smoothing filter A1 is defined as follows, the result D2 obtained by convolving A1 into B1 and performing the strides processing is equal to the result C1 of Average Pooling.

A1: Smoothing filter image.png

python


x = Input(shape=(28, 28, 1))
x = AveragePooling2D((2, 2))(x)

python


import numpy as np
import scipy.signal as sp
import skimage.measure

A1 = np.ones((2,2))/4
B1 = np.random.randint(-4,5,(8,8))*4
C1 = skimage.measure.block_reduce(B1, (2,2), np.average)
D1 = sp.correlate2d(B1, A1, mode='valid')
D2 = D1[::2,::2]

print('A1=\n', A1)
print('B1=\n', B1)
print('C1=\n', C1)
print('D1=\n', D1)
print('D2=\n', D2)
---------------------------------
A1=
 [[0.25 0.25]
 [0.25 0.25]]
B1=
 [[-12  -4   0   8  -4   4 -12  -8]
 [  4  -4   8 -12  -8 -16 -12  -8]
 [-16 -16 -16   8   8 -16 -12   0]
 [-12  -8  -4  16  -4  -8 -16  -4]
 [-12  -8 -12  16  -4   4  -4   0]
 [-12 -12   4  -8 -12   8   0  16]
 [ -8  16   8  16 -12 -12 -16   0]
 [ -8   4   4 -12  12  -4 -16   4]]
C1=
 [[ -4.   1.  -6. -10.]
 [-13.   1.  -5.  -8.]
 [-11.   0.  -1.   3.]
 [  1.   4.  -4.  -7.]]
D1=
 [[ -4.   0.   1.  -4.  -6.  -9. -10.]
 [ -8.  -7.  -3.  -1.  -8. -14.  -8.]
 [-13. -11.   1.   7.  -5. -13.  -8.]
 [-10.  -8.   4.   6.  -3.  -6.  -6.]
 [-11.  -7.   0.  -2.  -1.   2.   3.]
 [ -4.   4.   5.  -4.  -7.  -5.   0.]
 [  1.   8.   4.   1.  -4. -12.  -7.]]
D2=
 [[ -4.   1.  -6. -10.]
 [-13.   1.  -5.  -8.]
 [-11.   0.  -1.   3.]
 [  1.   4.  -4.  -7.]]

Therefore Average Pooling is equal to: However, the value of the filter is a smoothing filter.

python


x = Input(shape=(28, 28, 1))
x = AveragePooling2D((2, 2))(x)
---------------------------------
x = Input(shape=(28, 28, 1))
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)

5. For Conv2D + AveragePooling ((2,2))

If you have Conv2D and Average Pooling as below, I would like to change this to a simple convolution.

python


x = Input(shape=(28, 28, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = AveragePooling2D((2,2))(x)
x = Conv2D(1, (3, 3), padding="valid")(x)

At this time, if the 3x3 convolution A2 filter processing after Average Pooling is considered without Average Pooling, it can be seen that it is equal to the result of folding the 6x6 convolution filter A3, which is twice the size filled with the same value, and then multiplying it by strides. .. A2: image.png A3: image.png Below, it can be confirmed that C3 and E2 are equal.

python


import numpy as np
import scipy.signal as sp
import skimage.measure

A1 = np.random.randint(0,2,(3,3))
A2 = np.random.randint(0,2,(3,3))
A3 = np.repeat(A2, 2, axis=0)
A3 = np.repeat(A3, 2, axis=1)
B1 = np.random.randint(-4,5,(12,12))
C1 = sp.correlate2d(B1, A1, mode='valid')
C2 = skimage.measure.block_reduce(C1, (2,2), np.average)
C3 = sp.correlate2d(C2, A2, mode='valid')
D1 = sp.convolve2d(A1, A3, mode='full')
E1 = sp.correlate2d(B1, D1, mode='valid')/4
E2 = E1[::2,::2]

print('A1=\n', A1)
print('A2=\n', A2)
print('A3=\n', A3)
print('B1=\n', B1)
print('C1=\n', C1)
print('C2=\n', C2)
print('C3=\n', C3)
print('D1=\n', D1)
print('E1=\n', E1)
print('E2=\n', E2)
------------------------------
A1=
 [[0 0 1]
 [1 0 0]
 [0 0 1]]
A2=
 [[1 1 1]
 [0 0 0]
 [0 0 1]]
A3=
 [[1 1 1 1 1 1]
 [1 1 1 1 1 1]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 1 1]
 [0 0 0 0 1 1]]
B1=
 [[ 0 -2 -1 -3 -4  2 -3 -1 -3 -4 -2  1]
 [ 4 -1  2 -4 -4  4 -1 -1 -4 -2  2  2]
 [ 1 -4  3  3 -2  2  3 -2 -3  4 -1  2]
 [-3 -1 -4  4  0  3  1  0  0  0 -4  1]
 [ 3  1  4  1 -2 -2  2  1 -3  0  3 -3]
 [-4  0  3 -4  3  4  2  0  3  0  1 -4]
 [ 1 -3 -1 -1  3 -4 -2  3 -4  1  0 -3]
 [ 1 -4  0  4  4  3 -2 -4  3 -2  3  0]
 [ 0  2 -2  3  3  3  2 -2 -1  0  1 -1]
 [-4 -4  2 -2 -4 -1  4  4 -1  3 -1  4]
 [ 4  4 -2 -1 -1  0  4 -2  3  3 -4  4]
 [-4 -3  0 -1 -2  0 -2  3  3 -1 -2 -4]]
C1=
 [[  6  -1  -4   0  -4   1  -7  -1  -7   1]
 [ -1  -4  -1  10  -2   1  -1  -4  -5   7]
 [  4   3  -8   4   5   2  -5   4   2  -1]
 [  2   1   7   8   1  -2   5   1  -6  -3]
 [ -1   0   4 -10   3   8  -5   1   6  -6]
 [  4  -3   6   6   3  -8   4   1   0  -3]
 [ -2  -2   6   3   4   4  -7  -3   4  -6]
 [  2   4  -2   5   5   3   4  -1   1   4]
 [ -8  -2   4   1   2  -5   6   7  -4   6]
 [  6   1  -8  -2   1   7   6   0   0   3]]
C2=
 [[ 0.    1.25 -1.   -3.25 -1.  ]
 [ 2.5   2.75  1.5   1.25 -2.  ]
 [ 0.    1.5   1.5   0.25 -0.75]
 [ 0.5   3.    4.   -1.75  0.75]
 [-0.75 -1.25  1.25  4.75  1.25]]
C3=
 [[ 1.75 -2.75 -6.  ]
 [10.75  3.75  1.5 ]
 [ 4.25  8.    2.25]]
D1=
 [[0 0 1 1 1 1 1 1]
 [1 1 2 2 2 2 1 1]
 [1 1 2 2 2 2 1 1]
 [0 0 1 1 1 1 1 1]
 [0 0 0 0 0 0 1 1]
 [0 0 0 0 1 1 1 1]
 [0 0 0 0 1 1 1 1]
 [0 0 0 0 0 0 1 1]]
E1=
 [[ 1.75 -3.25 -2.75 -2.75 -6.  ]
 [ 4.   -0.75  0.    3.25 -0.5 ]
 [10.75  6.25  3.75  5.    1.5 ]
 [ 6.5   7.    9.25  3.25  2.5 ]
 [ 4.25  5.5   8.    3.    2.25]]
E2=
 [[ 1.75 -2.75 -6.  ]
 [10.75  3.75  1.5 ]
 [ 4.25  8.    2.25]]

That is, the model consisting of Conv2D and Average Pooling below can be organized as follows.

python


x = Input(shape=(28, 28, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = AveragePooling2D((2,2))(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
-----------------------------------------
x = Input(shape=(28, 28, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (6, 6), padding="valid", strides=2)(x)
-----------------------------------------
x = Input(shape=(28, 28, 1))
x = Conv2D(1, (8, 8), padding="valid", strides=2)(x)

6. For Conv2D + AveragePooling ((2,2)) 2

If you have Conv2D and Average Pooling as below, I would like to change this to a simple convolution.

python


x = Input(shape=(28, 28, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = AveragePooling2D((2,2))(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = AveragePooling2D((2,2))(x)
x = Conv2D(1, (3, 3), padding="valid")(x)

This can be calculated as follows: You can see that the result of multiplying Conv2D and AveragePooling is C5, which is 6x6 and equal to the result of E3 multiplied by strides2 twice.

python


import numpy as np
import scipy.signal as sp
import skimage.measure

A1 = np.random.randint(0,2,(3,3))
A2 = np.random.randint(0,2,(3,3))
A3 = np.random.randint(0,2,(3,3))
A4 = np.repeat(A2, 2, axis=0)
A4 = np.repeat(A4, 2, axis=1)
A5 = np.repeat(A3, 2, axis=0)
A5 = np.repeat(A5, 2, axis=1)
B1 = np.random.randint(-4,5,(26,26))*4
C1 = sp.correlate2d(B1, A1, mode='valid')
C2 = skimage.measure.block_reduce(C1, (2,2), np.average)
C3 = sp.correlate2d(C2, A2, mode='valid')
C4 = skimage.measure.block_reduce(C3, (2,2), np.average)
C5 = sp.correlate2d(C4, A3, mode='valid')
D1 = sp.convolve2d(A1, A4, mode='full')
E1 = sp.correlate2d(B1, D1, mode='valid')/4
E2 = E1[::2,::2]
E3 = sp.correlate2d(E2, A5, mode='valid')/4
E3 = E3[::2,::2]

print('A1=\n', A1)
print('A2=\n', A2)
print('A3=\n', A3)
print('A4=\n', A4)
print('A5=\n', A5)
print('C5=\n', C5)
print('E3=\n', E3)
--------------------------
A1=
 [[1 1 0]
 [1 0 0]
 [0 0 0]]
A2=
 [[0 0 0]
 [0 0 0]
 [1 1 0]]
A3=
 [[1 0 1]
 [0 0 1]
 [1 0 1]]
A4=
 [[0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [1 1 1 1 0 0]
 [1 1 1 1 0 0]]
A5=
 [[1 1 0 0 1 1]
 [1 1 0 0 1 1]
 [0 0 0 0 1 1]
 [0 0 0 0 1 1]
 [1 1 0 0 1 1]
 [1 1 0 0 1 1]]
C5=
 [[ -6.75 -15.5  -45.5 ]
 [-32.5    1.25   9.75]
 [-15.5    5.75 -28.  ]]
E3=
 [[ -6.75 -15.5  -45.5 ]
 [-32.5    1.25   9.75]
 [-15.5    5.75 -28.  ]]

That is, the model consisting of Conv2D and Average Pooling below can be organized as follows.

python


x = Input(shape=(28, 28, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = AveragePooling2D((2,2))(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = AveragePooling2D((2,2))(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
-----------------------------------------
x = Input(shape=(28, 28, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (6, 6), padding="valid", strides=2)(x)
x = Conv2D(1, (6, 6), padding="valid", strides=2)(x)

By the way, the above is the limit to arrange as an equivalent expression. However, the above equation can be arranged a little more if an approximate value is allowed and the convolution is decomposed by deconvolution as shown in 3. If you find a 5x5 convolution filter by deconvolution of a 2x2 smoothing filter

python


x = Input(shape=(28, 28, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (6, 6), padding="valid", strides=2)(x)
x = Conv2D(1, (6, 6), padding="valid", strides=2)(x)
-----------------------------------------
x = Input(shape=(28, 28, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x) # AveragePooling2D((2,2))
x = Conv2D(1, (6, 6), padding="valid")(x)
x = Conv2D(1, (1, 1), padding="valid", strides=2)(x)
-----------------------------------------
x = Input(shape=(28, 28, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (12, 12), padding="valid", strides=2)(x)
x = Conv2D(1, (1, 1), padding="valid", strides=2)(x)
-----------------------------------------
x = Input(shape=(28, 28, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (9, 9), padding="valid")(x)
x = Conv2D(1, (4, 4), padding="valid", strides=4)(x)
-----------------------------------------
x = Input(shape=(28, 28, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (9, 9), padding="valid")(x)
x = AveragePooling2D((4,4))(x)

Here, the 5x5 convolution filter is a convolution filter obtained by deconvolution of a 2x2 smoothing filter from a 6x6 convolution filter in which a 3x3 convolution filter is filled with the same value. The 9x9 convolution filter is a convolution filter obtained by deconvolution of a 4x4 smoothing filter from a 12x12 convolution filter in which a 3x3 convolution filter is four times filled with the same value. That is, the convolution is summarized as follows.

python


x = Input(shape=(28, 28, 1))
x = Conv2D(1, (18, 18), padding="valid", strides=4)(x)
-----------------------------------------
x = Input(shape=(28, 28, 1))
x = Conv2D(1, (15, 15), padding="valid")(x)
x = AveragePooling2D((4,4))(x)

7. When Conv2D + strides = 2

Consider a model that contains the following stride. This is close in the sense that 3x3 convolution is performed 3 times and the image size is halved twice as in 6. Also, the number of parameters used in the calculation is the same in terms of the number of weights because there are three 3x3 filters.

python


x = Input(shape=(28, 28, 1))
x = Conv2D(1, (3, 3), padding="valid", strides=2)(x)
x = Conv2D(1, (3, 3), padding="valid", strides=2)(x)
x = Conv2D(1, (3, 3), padding="valid")(x)

If you allow the convolution to be disassembled as in 3.

python


x = Input(shape=(28, 28, 1))
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
-----------------------------------------
x = Input(shape=(28, 28, 1))
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (4, 4), padding="valid", strides=2)(x)
x = Conv2D(1, (6, 6), padding="valid", strides=2)(x)
-----------------------------------------
x = Input(shape=(28, 28, 1))
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (6, 6), padding="valid", strides=2)(x)
-----------------------------------------
x = Input(shape=(28, 28, 1))
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (12, 12), padding="valid", strides=4)(x)
-----------------------------------------
x = Input(shape=(28, 28, 1))
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (9, 9), padding="valid")(x)
x = Conv2D(1, (4, 4), padding="valid", strides=4)(x)
-----------------------------------------
x = Input(shape=(28, 28, 1))
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (9, 9), padding="valid")(x)
x = AveragePooling2D((4,4))(x)

It can be seen that the size of the convolution filter is slightly smaller than that of 6. If you want to make the size of the convolution filter the same as Average Pooling 2D, I think it is necessary to increase the filter size by 1 when using strides.

python


x = Input(shape=(28, 28, 1))
x = Conv2D(1, (4, 4), padding="valid", strides=2)(x)
x = Conv2D(1, (4, 4), padding="valid", strides=2)(x)
x = Conv2D(1, (3, 3), padding="valid")(x)

8. For multi-stage models

--For AveragePooling2D

python


x = Input(shape=(224, 224, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = AveragePooling2D((2,2))(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = AveragePooling2D((2,2))(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = AveragePooling2D((2,2))(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = AveragePooling2D((2,2))(x)
x = Conv2D(1, (3, 3), padding="valid")(x)

If the deconvolution of the smoothing filter allows the convolution to be disassembled,

python


x = Input(shape=(224, 224, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (6, 6), padding="valid", strides=2)(x)
x = Conv2D(1, (6, 6), padding="valid", strides=2)(x)
x = Conv2D(1, (6, 6), padding="valid", strides=2)(x)
x = Conv2D(1, (6, 6), padding="valid", strides=2)(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (6, 6), padding="valid")(x)
x = Conv2D(1, (1, 1), padding="valid", strides=2)(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (10, 10), padding="valid", strides=2)(x)
x = Conv2D(1, (10, 10), padding="valid", strides=2)(x)
x = Conv2D(1, (12, 12), padding="valid", strides=2)(x)
x = Conv2D(1, (1, 1), padding="valid", strides=2)(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (9, 9), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (9, 9), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (12, 12), padding="valid")(x)
x = Conv2D(1, (1, 1), padding="valid", strides=4)(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (9, 9), padding="valid")(x)
x = Conv2D(1, (18, 18), padding="valid", strides=2)(x)
x = Conv2D(1, (24, 24), padding="valid", strides=2)(x)
x = Conv2D(1, (1, 1), padding="valid", strides=4)(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (9, 9), padding="valid")(x)
x = Conv2D(1, (17, 17), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (24, 24), padding="valid")(x)
x = Conv2D(1, (1, 1), padding="valid", strides=8)(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (9, 9), padding="valid")(x)
x = Conv2D(1, (17, 17), padding="valid")(x)
x = Conv2D(1, (48, 48), padding="valid", strides=2)(x)
x = Conv2D(1, (1, 1), padding="valid", strides=8)(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (9, 9), padding="valid")(x)
x = Conv2D(1, (17, 17), padding="valid")(x)
x = Conv2D(1, (33, 33), padding="valid")(x)
x = Conv2D(1, (16, 16), padding="valid", strides=16)(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (63, 63), padding="valid")(x)
x = AveragePooling2D((16,16))(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (78, 78), padding="valid", strides=16)(x)

It can be seen that the size of the convolution of each layer increases as ** (3 → 5 → 9 → 17 → 33) ** each time it passes through Average Pooling 2D. In short, the size of the convolution of each layer doubles.

--When Conv2D + strides = 2

python


x = Input(shape=(224, 224, 1))
x = Conv2D(1, (3, 3), padding="valid", strides=2)(x)
x = Conv2D(1, (3, 3), padding="valid", strides=2)(x)
x = Conv2D(1, (3, 3), padding="valid", strides=2)(x)
x = Conv2D(1, (3, 3), padding="valid", strides=2)(x)
x = Conv2D(1, (3, 3), padding="valid")(x)

If the deconvolution of the smoothing filter allows the convolution to be disassembled,

python


x = Input(shape=(224, 224, 1))
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (4, 4), padding="valid", strides=2)(x)
x = Conv2D(1, (4, 4), padding="valid", strides=2)(x)
x = Conv2D(1, (4, 4), padding="valid", strides=2)(x)
x = Conv2D(1, (6, 6), padding="valid", strides=2)(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (6, 6), padding="valid")(x)
x = Conv2D(1, (1, 1), padding="valid", strides=2)(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (6, 6), padding="valid", strides=2)(x)
x = Conv2D(1, (6, 6), padding="valid", strides=2)(x)
x = Conv2D(1, (12, 12), padding="valid", strides=2)(x)
x = Conv2D(1, (1, 1), padding="valid", strides=2)(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (12, 12), padding="valid")(x)
x = Conv2D(1, (1, 1), padding="valid", strides=4)(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (10, 10), padding="valid", strides=2)(x)
x = Conv2D(1, (24, 24), padding="valid", strides=2)(x)
x = Conv2D(1, (1, 1), padding="valid", strides=4)(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (9, 9), padding="valid")(x)
x = Conv2D(1, (2, 2), padding="valid", strides=2)(x)
x = Conv2D(1, (24, 24), padding="valid")(x)
x = Conv2D(1, (1, 1), padding="valid", strides=8)(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (9, 9), padding="valid")(x)
x = Conv2D(1, (48, 48), padding="valid", strides=2)(x)
x = Conv2D(1, (1, 1), padding="valid", strides=8)(x)
-----------------------------------------
x = Input(shape=(224, 224, 1))
x = Conv2D(1, (2, 2), padding="valid")(x)
x = Conv2D(1, (3, 3), padding="valid")(x)
x = Conv2D(1, (5, 5), padding="valid")(x)
x = Conv2D(1, (9, 9), padding="valid")(x)
x = Conv2D(1, (33, 33), padding="valid")(x)
x = Conv2D(1, (16, 16), padding="valid", strides=16)(x)
-----------------------------------------

It can be seen that the size of the convolution of each layer increases as ** (2 → 3 → 5 → 9 → 33) ** each time stripes = 2. Compared to AveragePooling, only the convolution size of the final layer seems unbalanced.

python


x = Input(shape=(224, 224, 1))
x = Conv2D(1, (3, 3), padding="valid", strides=2)(x)
x = Conv2D(1, (3, 3), padding="valid", strides=2)(x)
x = Conv2D(1, (3, 3), padding="valid", strides=2)(x)
x = Conv2D(1, (3, 3), padding="valid", strides=2)(x)
x = Conv2D(1, (2, 2), padding="valid")(x)

If the size of the final convolution is 2x2, the size of the convolution will be ** (2 → 3 → 5 → 9 → 17) **. However, 2x2 convolution does not contribute to speeding up, so there is no point in using it.

python


x = Input(shape=(224, 224, 1))
x = Conv2D(1, (4, 4), padding="valid", strides=2)(x)
x = Conv2D(1, (4, 4), padding="valid", strides=2)(x)
x = Conv2D(1, (4, 4), padding="valid", strides=2)(x)
x = Conv2D(1, (4, 4), padding="valid", strides=2)(x)
x = Conv2D(1, (3, 3), padding="valid")(x)

Also, if you use 4x4 convolution as described above, you can reproduce ** (3 → 5 → 9 → 17 → 33) ** similar to Average Pooling. This suggests that the size of the filter is reduced by 1 at Conv2D + strides = 2 because the deconvolution is calculated for the 2x2 smoothing filter of Average Pooling. Considering a 4x4 filter with Conv2D + strides = 2, there is a disadvantage that the number of parameters is larger than Average Pooling. (Conversely, it may be a merit) Considering whether it is possible to generate a convolution filter that extracts features of arbitrary size using convolution filters of various sizes reminds us of the problem of measuring continuous weights with a finite number of weights.

Summary:

We considered the size of the convolution filter when using Conv2D strides and Average Pooling. Both are processes that halve the image size, but it was suggested that the size of the convolution filter is smaller than Average Pooling when using Conv2D strides. However, such arrangement is possible only when there is no activation function relu and the convolution can be decomposed into two by deconvolution, so it is necessary to note that such arrangement is not possible in a general model. is.

Recommended Posts

Convolution strides and Average Pooling
Introduction to Deep Learning ~ Convolution and Pooling ~
Grad-CAM and dilated convolution
Candlestick chart and moving average plot