1. Introduction

I'm reading the excellent book **"Deep Learning from Scratch"**. This post is a memo on Chapter 7. To run the code, download the whole repository from GitHub and use Jupyter Notebook inside the ch07 folder.

2. Sample implementation

I will implement a sample that actually runs the **Convolution and Pooling** layers I studied in the textbook. The data is MNIST, and the Convolution filter weights are the already-trained ones (params.pkl, saved in the ch07 folder). The Convolution and Pooling code is imported from common/layers.py.

First, let's run it.

import sys, os
sys.path.append(os.pardir)  #Settings for importing files in the parent directory
import numpy as np
import matplotlib.pyplot as plt
from simple_convnet import SimpleConvNet
from common.layers import *  #Layer import
from dataset.mnist import load_mnist

# Display function for a 2-D array of shape (FH, FW)
def show(filters):
    FH, FW = filters.shape
    fig = plt.figure(figsize=(FW*0.1, FH*0.1))  # Figure size as (width, height)
    plt.imshow(((filters)), cmap='gray')
    plt.tick_params(left=False, labelleft=False, bottom=False, labelbottom=False)  #Erase the axis scale / label
    plt.show()

#Read MNIST data
(x_train, t_train), (x_test, t_test) = load_mnist(flatten=False)
x_train = x_train[5:6]  # Select the sample at index 5 (slicing keeps the batch dimension)

#Loading learned parameters
network = SimpleConvNet()  #Instantiate SimpleConvNet
network.load_params("params.pkl")  #Read the entire parameter
W1, b1 = network.params['W1'][:1], network.params['b1'][:1]  #Only the first one

#Layer generation
conv = Convolution(W1, b1, stride=1, pad=0)  
pool = Pooling(pool_h=2, pool_w=2, stride=2, pad=0)

#Forward propagation
out1 = conv.forward(x_train)  #Convolution
out2 = pool.forward(out1)  #Pooling

#display
print('input.shape = ',x_train.shape)
show(x_train.reshape(28, 28))
print('filter.shape = ', W1.shape)
show(W1.reshape(5, 5))
print('convolution.shape = ', out1.shape)
show(out1.reshape(24, 24))
print('pooling.shape = ', out2.shape)
show(out2.reshape(12, 12))

[Figure: output of the code above, showing the input image, the filter, the convolution output, and the pooling output]

The MNIST image of shape (1, 1, 28, 28) is convolved with a 5 × 5 filter (padding = 0, stride = 1), which gives data of shape (1, 1, 24, 24). That data is then pooled with a 2 × 2 window (padding = 0, stride = 2), which gives data of shape (1, 1, 12, 12).
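These shapes follow from the usual output-size formula, out = (in + 2 × pad − filter) // stride + 1. Here is a minimal sketch to check the numbers; the helper name `output_size` is my own choice for illustration and is not taken from the book's code.

```python
def output_size(in_size, filter_size, stride=1, pad=0):
    """Output height/width of a convolution or pooling step (hypothetical helper)."""
    return (in_size + 2 * pad - filter_size) // stride + 1

# Convolution: 28x28 input, 5x5 filter, stride 1, no padding
conv_out = output_size(28, 5, stride=1, pad=0)

# Pooling: 2x2 window, stride 2, applied to the convolution output
pool_out = output_size(conv_out, 2, stride=2, pad=0)

print(conv_out, pool_out)  # 24 12
```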

Now let's look at the key points of the code.

3. Convolution

# ------------- from common/layers.py -------------
    def forward(self, x):
        FN, C, FH, FW = self.W.shape
        N, C, H, W = x.shape
        out_h = 1 + int((H + 2*self.pad - FH) / self.stride)
        out_w = 1 + int((W + 2*self.pad - FW) / self.stride)

        #① Convert image data to matrix data with im2col
        col = im2col(x, FH, FW, self.stride, self.pad)

        #② Reshape the filter and expand it into a two-dimensional array
        col_W = self.W.reshape(FN, -1).T

        #③ Calculate the output by matrix operation
        out = np.dot(col, col_W) + self.b

        #④ Adjust the shape of the output
        out = out.reshape(N, out_h, out_w, -1).transpose(0, 3, 1, 2)

        self.x = x
        self.col = col
        self.col_W = col_W
        return out

The diagram below shows how a 4D image (batch size, number of channels, image height, image width) is processed by the convolution operation.

[Figure: how the convolution operation processes a 4D image]

Let's take a look at the most important part, the **① im2col** function.
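To convince myself that steps ① to ④ really compute a convolution, here is a small self-contained sketch that compares the im2col route with a plain quadruple loop. The `im2col` body is the book's function; `conv_naive`, the random inputs, and the sizes (stride 2, pad 1) are my own assumptions for the check.

```python
import numpy as np

def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    N, C, H, W = input_data.shape
    out_h = (H + 2*pad - filter_h)//stride + 1
    out_w = (W + 2*pad - filter_w)//stride + 1
    img = np.pad(input_data, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))
    for y in range(filter_h):
        y_max = y + stride*out_h
        for x in range(filter_w):
            x_max = x + stride*out_w
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)

def conv_naive(x, W, b, stride=1, pad=0):
    """Reference convolution with explicit loops (hypothetical helper, for checking only)."""
    N, C, H, W_in = x.shape
    FN, _, FH, FW = W.shape
    out_h = (H + 2*pad - FH)//stride + 1
    out_w = (W_in + 2*pad - FW)//stride + 1
    img = np.pad(x, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    out = np.zeros((N, FN, out_h, out_w))
    for n in range(N):
        for f in range(FN):
            for i in range(out_h):
                for j in range(out_w):
                    patch = img[n, :, i*stride:i*stride+FH, j*stride:j*stride+FW]
                    out[n, f, i, j] = np.sum(patch * W[f]) + b[f]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 7, 7))   # (N, C, H, W)
W = rng.standard_normal((4, 3, 3, 3))   # (FN, C, FH, FW)
b = rng.standard_normal(4)

col = im2col(x, 3, 3, stride=2, pad=1)       # (1) image -> matrix, shape (32, 27)
col_W = W.reshape(4, -1).T                   # (2) filters -> matrix, shape (27, 4)
out = col @ col_W + b                        # (3) one matrix product
out = out.reshape(2, 4, 4, -1).transpose(0, 3, 1, 2)  # (4) back to (N, FN, out_h, out_w)

print(out.shape)
print(np.allclose(out, conv_naive(x, W, b, stride=2, pad=1)))
```

Each row of `col` is one flattened receptive field, and each column of `col_W` is one flattened filter, so the single matrix product evaluates every filter at every output position at once.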

4. im2col

# ------------- from common/util.py -------------
def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    N, C, H, W = input_data.shape
    out_h = (H + 2*pad - filter_h)//stride + 1
    out_w = (W + 2*pad - filter_w)//stride + 1
    img = np.pad(input_data, [(0,0), (0,0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))

    # Take 5*5 slices, each 24*24 in size (when stride=1)
    for y in range(filter_h):  # 5 iterations
        y_max = y + stride*out_h  # y_max = y + 24
        for x in range(filter_w):  # 5 iterations
            x_max = x + stride*out_w  # x_max = x + 24

            # Slice rows y to y+24 and columns x to x+24
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]

    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)
    return col

`y:y_max:stride` and `x:x_max:stride` mean that the range from y up to (but not including) y_max, and from x up to x_max, is sampled every `stride` elements.
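As a quick illustration of this start:stop:step slicing, here is a small sketch on toy arrays. The array contents and the values of y, x, stride, out_h, and out_w are made up for the example.

```python
import numpy as np

a = np.arange(10)      # 0, 1, ..., 9
print(a[1:9:1])        # every element from index 1 up to (not including) 9
print(a[1:9:2])        # every 2nd element in the same range

# The same idea in 2-D, as used inside im2col:
img = np.arange(49).reshape(7, 7)
y, x, stride, out_h, out_w = 1, 0, 2, 3, 3
window = img[y:y + stride*out_h:stride, x:x + stride*out_w:stride]
print(window.shape)    # out_h x out_w elements are picked
print(window)
```

The slice always yields out_h × out_w elements: the stop value `y + stride*out_h` grows with the stride, and the step thins the range back down.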

With stride = 1 we have y_max = y + 24 and x_max = x + 24, and the double for loop runs 5 × 5 times, so you end up taking **5 × 5 slices, each 24 × 24 in size**.

On the other hand, going by what I learned in the textbook, I would have expected **24 × 24 slices, each 5 × 5 in size (the filter size)**. Written as code, that looks like this.

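    # Take 24*24 slices, each 5*5 in size (when stride=1)
    # Here col has shape (N, C, out_h, out_w, filter_h, filter_w)
    for y in range(out_h):
        for x in range(out_w):
            ys, xs = y*stride, x*stride  # Top-left corner of the window
            col[:, :, y, x, :, :] = img[:, :, ys:ys+filter_h, xs:xs+filter_w]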

Indeed, the total number of elements processed is the same whether you take **5 × 5 slices of size 24 × 24** or **24 × 24 slices of size 5 × 5**. If the results are the same, which is better? **The former**, of course. The slow part is the Python for loop, and the former needs only 25 iterations instead of 576.

The two methods can be illustrated as follows.

[Figure: comparison of the two slicing methods]

5. Comparison of two im2cols

Let's check whether the two really give the same result. Run the following code to visualize both the original `im2col` and a function `my_im2col` that takes **24 × 24 slices of size 5 × 5**, including the intermediate data.

import sys, os
sys.path.append(os.pardir)  #Settings for importing files in the parent directory
import numpy as np
from dataset.mnist import load_mnist
import matplotlib.pyplot as plt

# Data display function (x = figure width, y = figure height, nx = number of columns)
def show(filters, x, y, nx, margin=1, scale=10):
    FN, C, FH, FW = filters.shape
    ny = int(np.ceil(FN / nx))    
    fig = plt.figure(figsize=(x, y))
    fig.subplots_adjust(left=0, right=1.3, bottom=0, top=1.3, hspace=0.05, wspace=0.05)
    
    for i in range(FN):
        ax = fig.add_subplot(ny, nx, i+1, xticks=[], yticks=[])
        ax.imshow(filters[i, 0], cmap='gray', interpolation='nearest') 
    plt.show()

def my_im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    N, C, H, W = input_data.shape
    out_h = (H + 2*pad - filter_h)//stride + 1  #Output height
    out_w = (W + 2*pad - filter_w)//stride + 1  #Output width
    img = np.pad(input_data, [(0,0), (0,0), (pad, pad), (pad, pad)], 'constant')  #Image padding
    col = np.zeros((N, C, out_h, out_w, filter_h, filter_w))  # Buffer for the col matrix

    # Take 24*24 slices, each 5*5 in size (when stride=1)
    for y in range(out_h):
        for x in range(out_w):
            ys, xs = y*stride, x*stride  # Top-left corner of the window
            col[:, :, y, x, :, :] = img[:, :, ys:ys+filter_h, xs:xs+filter_w]
            
    # check1
    print('col.shape after slicing = ', col.shape)
    show(col.reshape(576, 1, 5, 5), x = 3.5, y = 3.5, nx = 24)
    
    #Transpose & Reshape
    col = col.transpose(0, 2, 3, 1, 4, 5).reshape(N*out_h*out_w, -1)
    
    # check2
    print('col.shape after transpose & reshape = ', col.shape)
    show(col.reshape(1, 1, 25, 576), x = 18, y =3, nx=1)
    
    return col

def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    N, C, H, W = input_data.shape
    out_h = (H + 2*pad - filter_h)//stride + 1  #Output height
    out_w = (W + 2*pad - filter_w)//stride + 1  #Output width
    img = np.pad(input_data, [(0,0), (0,0), (pad, pad), (pad, pad)], 'constant')  #Image padding
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))  # Buffer for the col matrix

    # Take 5*5 slices, each 24*24 in size (when stride=1)
    for y in range(filter_h):
        y_max = y + stride*out_h
        for x in range(filter_w):
            x_max = x + stride*out_w
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]
            
    # check1
    print('col.shape after slicing = ', col.shape)
    show(col.reshape(25, 1, 24, 24), x = 3.5, y = 3.5, nx = 5)
            
    #Transpose & Reshape
    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)
    
    # check2
    print('col.shape after transpose & reshape = ', col.shape)
    show(col.reshape(1, 1, 25, 576), x = 18, y =3, nx=1)
    
    return col

#Read MNIST data
(x_train, t_train), (x_test, t_test) = load_mnist(flatten=False)
x_train = x_train[5:6]  # Select the sample at index 5 (slicing keeps the batch dimension)

out1 = my_im2col(x_train, 5, 5)
out2 = im2col(x_train, 5, 5)
print('all elements are same = ', (out1 == out2).all()) #Whether all elements are equal

[Figure: output of my_im2col]

[Figure: output of im2col]

The first half is the output of `my_im2col` and the second half is the output of `im2col`. In both cases, col.shape after transpose & reshape is actually 576 rows × 25 columns, which is tall and narrow and hard to look at, so for display it is reshaped into a wide 25 rows × 576 columns (the code uses reshape here, not a true transpose). Looking at the two final images, they do show the same pattern.

The last line of the output is **True**, the result of comparing every element of the two arrays, so `my_im2col` and `im2col` give exactly the same result. With stride = 1 you only need **filter_h × filter_w slices**, and because the slice in `im2col` steps by stride, the same loop count holds for other strides too. It's a great technique, isn't it?

Here's a simple, intuitive example of why you can do this.

[Figure: intuitive example of why the two slicing orders give the same result]
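The same intuition can be checked on a toy case. The sketch below uses a 4 × 4 array and a 2 × 2 filter with stride 1 (sizes chosen by me for illustration), and builds the col matrix both ways without the batch and channel dimensions.

```python
import numpy as np

img = np.arange(16).reshape(4, 4)   # toy 4x4 "image"
FH = FW = 2                         # filter size
out_h = out_w = 3                   # output size for stride 1, no padding

# Method A (im2col style): FH*FW = 4 slices, each out_h x out_w
col_a = np.zeros((FH, FW, out_h, out_w), dtype=int)
for y in range(FH):
    for x in range(FW):
        col_a[y, x] = img[y:y + out_h, x:x + out_w]
col_a = col_a.transpose(2, 3, 0, 1).reshape(out_h * out_w, -1)

# Method B (textbook style): out_h*out_w = 9 slices, each FH x FW
col_b = np.zeros((out_h, out_w, FH, FW), dtype=int)
for y in range(out_h):
    for x in range(out_w):
        col_b[y, x] = img[y:y + FH, x:x + FW]
col_b = col_b.reshape(out_h * out_w, -1)

print(col_a[0])                     # first window, flattened: [0 1 4 5]
print((col_a == col_b).all())       # True
```

Either way, the entry for filter offset (y, x) at output position (i, j) is `img[i + y, j + x]`. Method A fixes (y, x) and grabs all (i, j) at once, while Method B fixes (i, j) and grabs all (y, x) at once; the transpose just puts the axes back in the same order.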

6. Pooling

# ------------- from common/layers.py -------------
    def forward(self, x):
        N, C, H, W = x.shape
        out_h = int(1 + (H - self.pool_h) / self.stride)
        out_w = int(1 + (W - self.pool_w) / self.stride)

        col = im2col(x, self.pool_h, self.pool_w, self.stride, self.pad)
        col = col.reshape(-1, self.pool_h*self.pool_w)

        arg_max = np.argmax(col, axis=1)
        out = np.max(col, axis=1)
        out = out.reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)

        self.x = x
        self.arg_max = arg_max
        return out

The diagram below shows how a 4D image (batch size, number of channels, image height, image width) is processed by Pooling.

[Figure: how Pooling processes a 4D image]

As with convolution, the im2col function is called with pool_h, pool_w, stride, and pad as arguments to unfold the input into a matrix. That matrix is then reshaped so that each row holds one pooling window of one channel, and the maximum of each row is taken.
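A minimal sketch of that idea, checked against a direct block-wise maximum. The `im2col` body is the book's function; the wrapper `max_pool_im2col`, the toy input, and the reshape-based reference (valid only for non-overlapping windows) are my own assumptions.

```python
import numpy as np

def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    N, C, H, W = input_data.shape
    out_h = (H + 2*pad - filter_h)//stride + 1
    out_w = (W + 2*pad - filter_w)//stride + 1
    img = np.pad(input_data, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))
    for y in range(filter_h):
        y_max = y + stride*out_h
        for x in range(filter_w):
            x_max = x + stride*out_w
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)

def max_pool_im2col(x, pool_h, pool_w, stride):
    """Max pooling via im2col, without padding (hypothetical wrapper)."""
    N, C, H, W = x.shape
    out_h = (H - pool_h)//stride + 1
    out_w = (W - pool_w)//stride + 1
    col = im2col(x, pool_h, pool_w, stride, 0)   # (N*out_h*out_w, C*pool_h*pool_w)
    col = col.reshape(-1, pool_h*pool_w)         # one row per window per channel
    out = col.max(axis=1)                        # max of each window
    return out.reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)

x = np.arange(2*3*4*4, dtype=float).reshape(2, 3, 4, 4)
out = max_pool_im2col(x, 2, 2, 2)

# Reference: split H and W into 2x2 blocks and take the max of each block
ref = x.reshape(2, 3, 2, 2, 2, 2).max(axis=(3, 5))

print(out.shape)
print(out[0, 0])
print(np.array_equal(out, ref))
```

The reshape to `(-1, pool_h*pool_w)` works because the columns of the im2col output are ordered as (channel, window row, window column), so each group of pool_h × pool_w consecutive values belongs to a single channel.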

Deep learning / Deep learning made from scratch Chapter 7 Memo
