I'm reading the masterpiece **"Deep Learning from Scratch"**. This memo covers Chapter 7. To run the code, download the whole repository from GitHub and open a Jupyter notebook in the ch07 folder.
This time I implement a sample that actually runs the **Convolution and Pooling** layers I studied in the textbook. The data is MNIST, and the Convolution filter weights are the already-trained ones (params.pkl, saved in the ch07 folder). The Convolution and Pooling code is imported from common/layers.py.
First, let's run it.
import sys, os
sys.path.append(os.pardir) #Settings for importing files in the parent directory
import numpy as np
import matplotlib.pyplot as plt
from simple_convnet import SimpleConvNet
from common.layers import * #Layer import
from dataset.mnist import load_mnist
# Display a 2D array of shape (FH, FW) as a grayscale image
def show(filters):
    FH, FW = filters.shape
    fig = plt.figure(figsize=(FH*0.1, FW*0.1))  # Display size
    plt.imshow(filters, cmap='gray')
    plt.tick_params(left=False, labelleft=False, bottom=False, labelbottom=False)  # Hide axis ticks and labels
    plt.show()
#Read MNIST data
(x_train, t_train), (x_test, t_test) = load_mnist(flatten=False)
x_train = x_train[5:6] # Select the sample at index 5 (the slice keeps the batch dimension)
#Loading learned parameters
network = SimpleConvNet() #Instantiate SimpleConvNet
network.load_params("params.pkl") #Read the entire parameter
W1, b1 = network.params['W1'][:1], network.params['b1'][:1] #Only the first one
#Layer generation
conv = Convolution(W1, b1, stride=1, pad=0)
pool = Pooling(pool_h=2, pool_w=2, stride=2, pad=0)
#Forward propagation
out1 = conv.forward(x_train) #Convolution
out2 = pool.forward(out1) #Pooling
#display
print('input.shape = ',x_train.shape)
show(x_train.reshape(28, 28))
print('filter.shape = ', W1.shape)
show(W1.reshape(5, 5))
print('convolution.shape = ', out1.shape)
show(out1.reshape(24, 24))
print('pooling.shape = ', out2.shape)
show(out2.reshape(12, 12))
The MNIST image of shape (1, 1, 28, 28) is convolved with a 5 * 5 filter (padding = 0, stride = 1), which gives data of shape (1, 1, 24, 24). That data is then pooled with a 2 * 2 filter (padding = 0, stride = 2), which gives data of shape (1, 1, 12, 12).
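These shapes follow from the output-size formula that the code later in this memo also uses: out = (in + 2*pad - filter) // stride + 1. Here is a minimal sketch to check the numbers; the helper name `conv_output_size` is my own and does not come from the book's code.

```python
def conv_output_size(input_size, filter_size, stride=1, pad=0):
    """Output height or width of a convolution/pooling layer (hypothetical helper)."""
    return (input_size + 2 * pad - filter_size) // stride + 1

conv_out = conv_output_size(28, 5, stride=1, pad=0)        # convolution: 28 -> 24
pool_out = conv_output_size(conv_out, 2, stride=2, pad=0)  # pooling: 24 -> 12
print(conv_out, pool_out)  # 24 12
```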
Now let's look at the key points of the code.
3. Convolution
# ------------- from common/layers.py -------------
def forward(self, x):
    FN, C, FH, FW = self.W.shape
    N, C, H, W = x.shape
    out_h = 1 + int((H + 2*self.pad - FH) / self.stride)
    out_w = 1 + int((W + 2*self.pad - FW) / self.stride)
    # ① Convert the image data to a matrix with im2col
    col = im2col(x, FH, FW, self.stride, self.pad)
    # ② Reshape the filters and expand them into a two-dimensional array
    col_W = self.W.reshape(FN, -1).T
    # ③ Compute the output with a matrix product
    out = np.dot(col, col_W) + self.b
    # ④ Restore the output to shape (N, FN, out_h, out_w)
    out = out.reshape(N, out_h, out_w, -1).transpose(0, 3, 1, 2)
    # Keep the intermediate data (used in backward)
    self.x = x
    self.col = col
    self.col_W = col_W
    return out
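Steps ① to ④ can be traced with concrete shapes. The sketch below is not the book's code: it handles only stride = 1 and pad = 0, it builds the col matrix with NumPy's `sliding_window_view` instead of `im2col`, and the helper name `conv_forward_trace` and the random input are my own stand-ins.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_forward_trace(x, W, b):
    """Hypothetical helper: steps 1-4 of Convolution.forward for stride=1, pad=0 only."""
    FN, C, FH, FW = W.shape
    N, _, H, W_in = x.shape
    out_h, out_w = H - FH + 1, W_in - FW + 1
    # (1) Unroll the image: one row per output position
    windows = sliding_window_view(x, (FH, FW), axis=(2, 3))  # (N, C, out_h, out_w, FH, FW)
    col = windows.transpose(0, 2, 3, 1, 4, 5).reshape(N * out_h * out_w, -1)
    # (2) Unroll the filters: one column per filter
    col_W = W.reshape(FN, -1).T
    # (3) One matrix product computes every output value
    out = np.dot(col, col_W) + b
    # (4) Restore the (N, FN, out_h, out_w) layout
    out = out.reshape(N, out_h, out_w, -1).transpose(0, 3, 1, 2)
    return col, col_W, out

rng = np.random.default_rng(0)
x = rng.random((1, 1, 28, 28))  # stand-in for one MNIST image
W = rng.random((1, 1, 5, 5))    # stand-in for the first trained filter
b = np.zeros(1)
col, col_W, out = conv_forward_trace(x, W, b)
print(col.shape, col_W.shape, out.shape)  # (576, 25) (25, 1) (1, 1, 24, 24)
```

Each of the 576 rows of `col` is one flattened 5 * 5 patch, so the single `np.dot` replaces 576 separate patch-times-filter sums.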
Drawn as a diagram, the way a 4D image (batch size, number of channels, image height, image width) is processed by the convolution operation looks like this. Let's take a look at the most important part, the **① im2col** function.
4. im2col
# ------------- from common/layers.py -------------
def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    N, C, H, W = input_data.shape
    out_h = (H + 2*pad - filter_h)//stride + 1
    out_w = (W + 2*pad - filter_w)//stride + 1
    img = np.pad(input_data, [(0,0), (0,0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))
    # Slice 5*5 times with a 24*24 filter (when stride=1)
    for y in range(filter_h):  # 5 loops
        y_max = y + stride*out_h  # y_max = y + 24
        for x in range(filter_w):  # 5 loops
            x_max = x + stride*out_w  # x_max = x + 24
            # Slice from y up to y+24 and from x up to x+24
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]
    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)
    return col
The slice `y:y_max:stride, x:x_max:stride` selects every `stride`-th element from y up to (but not including) y_max, and likewise every `stride`-th element from x up to x_max.
With y_max = y + 24, x_max = x + 24, and stride = 1, the two nested for loops each run 5 times, so you end up slicing 5 * 5 times with a **24 * 24 filter**.
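The slice notation is easier to see on a small array. This is a toy sketch of my own, not the MNIST data: a 6 * 6 array with stride = 2 and an assumed 2 * 2 output size.

```python
import numpy as np

img = np.arange(36).reshape(6, 6)  # toy 6x6 "image" holding 0..35
y, x, stride = 1, 0, 2
out_h = out_w = 2                  # assumed output size for this toy case
y_max = y + stride * out_h         # 1 + 2*2 = 5
x_max = x + stride * out_w         # 0 + 2*2 = 4
patch = img[y:y_max:stride, x:x_max:stride]  # rows 1 and 3, columns 0 and 2
print(patch)
# [[ 6  8]
#  [18 20]]
```

The result always has shape (out_h, out_w), whatever the stride, which is why a single slice can fill `col[:, :, y, x, :, :]`.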
On the other hand, going by what I learned in the textbook, it should be **slicing 24 * 24 times with a 5 * 5 filter**. Written as code, it looks like this.
# Slice 24*24 times with a 5*5 filter (when stride=1)
# Here col has shape (N, C, out_h, out_w, filter_h, filter_w)
for y in range(out_h):
    ys = y*stride  # Top edge of the window for output row y
    for x in range(out_w):
        xs = x*stride  # Left edge of the window for output column x
        col[:, :, y, x, :, :] = img[:, :, ys:ys+filter_h, xs:xs+filter_w]
Indeed, the total number of elements processed is the same whether you **slice 5 * 5 times with a 24 * 24 filter** or **slice 24 * 24 times with a 5 * 5 filter**. If the results are the same, which is better? The **former**, of course. The slow part is the Python for loop, and the former runs only 25 iterations where the latter runs 576.
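To get a rough feel for the difference, both loop orders can be timed on the same input. This is only a sketch under my own assumptions: the function names `col_by_filter_offset` and `col_by_output_position` are mine, stride = 1 and pad = 0 are fixed, and random data stands in for MNIST. The measured times depend on the machine, so no numbers are quoted here.

```python
import time
import numpy as np

def col_by_filter_offset(img, fh, fw):
    """fh*fw slices, each of size (out_h, out_w). stride=1, pad=0 only."""
    N, C, H, W = img.shape
    out_h, out_w = H - fh + 1, W - fw + 1
    col = np.zeros((N, C, fh, fw, out_h, out_w))
    for y in range(fh):
        for x in range(fw):
            col[:, :, y, x, :, :] = img[:, :, y:y + out_h, x:x + out_w]
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N * out_h * out_w, -1)

def col_by_output_position(img, fh, fw):
    """out_h*out_w slices, each of size (fh, fw). stride=1, pad=0 only."""
    N, C, H, W = img.shape
    out_h, out_w = H - fh + 1, W - fw + 1
    col = np.zeros((N, C, out_h, out_w, fh, fw))
    for y in range(out_h):
        for x in range(out_w):
            col[:, :, y, x, :, :] = img[:, :, y:y + fh, x:x + fw]
    return col.transpose(0, 2, 3, 1, 4, 5).reshape(N * out_h * out_w, -1)

img = np.random.rand(10, 1, 28, 28)  # random stand-in for 10 MNIST images
for fn in (col_by_filter_offset, col_by_output_position):
    start = time.perf_counter()
    result = fn(img, 5, 5)
    elapsed = time.perf_counter() - start
    print(fn.__name__, result.shape, f"{elapsed:.4f} s")
```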
If you show the two methods in the figure, it looks like this
Let's check whether the two really give the same result. The following code runs the original `im2col` and a function `my_im2col` that **slices 24 * 24 times with a 5 * 5 filter**, and also visualizes the intermediate data of each.
import sys, os
sys.path.append(os.pardir) #Settings for importing files in the parent directory
import numpy as np
from dataset.mnist import load_mnist
import matplotlib.pyplot as plt
# Data display function (x = display width, y = display height, nx = number of columns)
def show(filters, x, y, nx, margin=1, scale=10):
    FN, C, FH, FW = filters.shape
    ny = int(np.ceil(FN / nx))
    fig = plt.figure(figsize=(x, y))
    fig.subplots_adjust(left=0, right=1.3, bottom=0, top=1.3, hspace=0.05, wspace=0.05)
    for i in range(FN):
        ax = fig.add_subplot(ny, nx, i+1, xticks=[], yticks=[])
        ax.imshow(filters[i, 0], cmap='gray', interpolation='nearest')
    plt.show()
def my_im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    N, C, H, W = input_data.shape
    out_h = (H + 2*pad - filter_h)//stride + 1  # Output height
    out_w = (W + 2*pad - filter_w)//stride + 1  # Output width
    img = np.pad(input_data, [(0,0), (0,0), (pad, pad), (pad, pad)], 'constant')  # Pad the image
    col = np.zeros((N, C, out_h, out_w, filter_h, filter_w))  # Buffer for the col matrix
    # Slice 24*24 times with a 5*5 filter (when stride=1)
    for y in range(out_h):
        ys = y*stride  # Top edge of the window for output row y
        for x in range(out_w):
            xs = x*stride  # Left edge of the window for output column x
            col[:, :, y, x, :, :] = img[:, :, ys:ys+filter_h, xs:xs+filter_w]
    # check1 (the reshape is hard-coded for one 28*28 image and a 5*5 filter)
    print('col.shape after slicing = ', col.shape)
    show(col.reshape(576, 1, 5, 5), x = 3.5, y = 3.5, nx = 24)
    # Transpose & reshape
    col = col.transpose(0, 2, 3, 1, 4, 5).reshape(N*out_h*out_w, -1)
    # check2 (col.T swaps rows and columns so the 576 x 25 matrix is shown as 25 x 576)
    print('col.shape after transpose & reshape = ', col.shape)
    show(col.T.reshape(1, 1, 25, 576), x = 18, y =3, nx=1)
    return col
def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    N, C, H, W = input_data.shape
    out_h = (H + 2*pad - filter_h)//stride + 1  # Output height
    out_w = (W + 2*pad - filter_w)//stride + 1  # Output width
    img = np.pad(input_data, [(0,0), (0,0), (pad, pad), (pad, pad)], 'constant')  # Pad the image
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))  # Buffer for the col matrix
    # Slice 5*5 times with a 24*24 filter (when stride=1)
    for y in range(filter_h):
        y_max = y + stride*out_h
        for x in range(filter_w):
            x_max = x + stride*out_w
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]
    # check1 (the reshape is hard-coded for one 28*28 image and a 5*5 filter)
    print('col.shape after slicing = ', col.shape)
    show(col.reshape(25, 1, 24, 24), x = 3.5, y = 3.5, nx = 5)
    # Transpose & reshape
    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)
    # check2 (col.T swaps rows and columns so the 576 x 25 matrix is shown as 25 x 576)
    print('col.shape after transpose & reshape = ', col.shape)
    show(col.T.reshape(1, 1, 25, 576), x = 18, y =3, nx=1)
    return col
#Read MNIST data
(x_train, t_train), (x_test, t_test) = load_mnist(flatten=False)
x_train = x_train[5:6] # Select the sample at index 5 (the slice keeps the batch dimension)
out1 = my_im2col(x_train, 5, 5)
out2 = im2col(x_train, 5, 5)
print('all elements are same = ', (out1 == out2).all()) #Whether all elements are equal
The first half of the output is the result of `my_im2col` and the second half is the result of `im2col`. In both cases, col.shape after transpose & reshape is actually 576 rows x 25 columns, which is tall and hard to read, so rows and columns are swapped and it is displayed wide, as 25 rows x 576 columns. Looking at the two images, they certainly show the same pattern.
The last line of the output is **True** for the comparison of all elements of the two arrays, so `my_im2col` and `im2col` give exactly the same result. In general, `im2col` needs only **filter_h * filter_w slices**, and because each slice is itself strided this holds for stride > 1 as well, not only for stride = 1. It's a great technique, isn't it?
Here's a simple, intuitive example of why you can do this.
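The same idea can be checked on an array small enough to follow by hand. This is a sketch of my own with a toy 4 * 4 image and a 2 * 2 filter (stride = 1, no padding, no batch or channel axes): read row by row, the col matrix holds one patch per output position, and read column by column, it holds one shifted sub-image per filter offset.

```python
import numpy as np

img = np.arange(16).reshape(4, 4)  # toy 4x4 image holding 0..15
fh = fw = 2                        # filter size
out_h = out_w = 3                  # output size for stride=1, pad=0

# Row view: one 2x2 patch per output position (out_h*out_w = 9 slices)
rows = np.array([img[y:y + fh, x:x + fw].ravel()
                 for y in range(out_h) for x in range(out_w)])

# Column view: one shifted 3x3 sub-image per filter offset (fh*fw = 4 slices)
cols = np.stack([img[y:y + out_h, x:x + out_w].ravel()
                 for y in range(fh) for x in range(fw)], axis=1)

print(rows.shape, cols.shape)  # (9, 4) (9, 4)
print((rows == cols).all())    # True
print(rows[:, 0].reshape(out_h, out_w))
# [[ 0  1  2]
#  [ 4  5  6]
#  [ 8  9 10]]
```

Element (output position, filter offset) of the matrix is `img[oy + fy, ox + fx]` in both views, so filling it column by column needs only 4 slices instead of 9.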
6. Pooling
# ------------- from common/layers.py -------------
def forward(self, x):
    N, C, H, W = x.shape
    out_h = int(1 + (H - self.pool_h) / self.stride)
    out_w = int(1 + (W - self.pool_w) / self.stride)
    # Unroll the input, then reshape so that each row is one pooling window
    col = im2col(x, self.pool_h, self.pool_w, self.stride, self.pad)
    col = col.reshape(-1, self.pool_h*self.pool_w)
    # Take the maximum of each row (arg_max is kept for backward)
    arg_max = np.argmax(col, axis=1)
    out = np.max(col, axis=1)
    # Restore the output to shape (N, C, out_h, out_w)
    out = out.reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)
    self.x = x
    self.arg_max = arg_max
    return out
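Drawn as a diagram, the way a 4D image (batch size, number of channels, image height, image width) is processed by Pooling looks like this.
As with convolution, the im2col function is called with pool_h, pool_w, stride, and pad as arguments to unroll the input into a matrix. The matrix is then reshaped so that each row holds one pooling window, and the maximum of each row becomes one output value.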
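The row-per-window idea can be followed on a tiny input. The sketch below is my own and does not call the book's `im2col`: it builds the same kind of matrix with a list comprehension for a hand-made 4 * 4 input, using a 2 * 2 pool with stride = 2 and no padding.

```python
import numpy as np

x = np.array([[1, 2, 0, 1],
              [3, 0, 2, 4],
              [5, 1, 0, 2],
              [2, 6, 3, 1]], dtype=float).reshape(1, 1, 4, 4)
pool_h = pool_w = stride = 2
N, C, H, W = x.shape
out_h = (H - pool_h) // stride + 1
out_w = (W - pool_w) // stride + 1

# One row per pooling window, ordered by (sample, output row, output column, channel),
# which is the row order that im2col followed by reshape(-1, pool_h*pool_w) produces
col = np.array([x[n, c,
                  oy * stride:oy * stride + pool_h,
                  ox * stride:ox * stride + pool_w].ravel()
                for n in range(N)
                for oy in range(out_h)
                for ox in range(out_w)
                for c in range(C)])

# Maximum of each row, then back to the (N, C, out_h, out_w) layout
out = np.max(col, axis=1).reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)
print(col.shape)  # (4, 4)
print(out[0, 0])
# [[3. 4.]
#  [6. 3.]]
```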