The previous article is here
The DNN (Deep Neural Network) was completed last time.
(I plan to play with DNN in another article, including how to use the layer manager)
Here, we will create a CNN (Convolutional Neural Network) for image recognition.
The `im2col` and `col2im` functions used here are introduced here and here.
The next article is here
- [Convolution layer](#convolution-layer)
  - [Convolution layer forward propagation](#convolution-layer-forward-propagation)
  - [Convolution layer backpropagation](#convolution-layer-backpropagation)
  - [Convolution layer learning](#convolution-layer-learning)
  - [Convolution layer implementation](#convolution-layer-implementation)
- [Pooling layer](#pooling-layer)
  - [Pooling layer forward propagation](#pooling-layer-forward-propagation)
  - [Pooling layer backpropagation](#pooling-layer-backpropagation)
  - [Pooling layer learning](#pooling-layer-learning)
  - [Pooling layer implementation](#pooling-layer-implementation)
- Conclusion
A process called **convolution** brings a great benefit to image recognition. As an introduction: for data such as images, where positional relationships matter, simply flattening the input to one dimension and feeding it through a neural network throws away that positional information, which is a waste. The role of the convolution layer is to pass data through the network while keeping the dimensions of the input, that is, while preserving important information such as positional relationships. In a convolution layer, the filter plays the role that the weights play in an ordinary layer.

You could write code that works exactly as in this gif, but if you implement it literally, the code is too heavy to be practical. To see why, simplify the gif part and implement it directly:
```python
# Image:  array of shape (I_h, I_w)
# Filter: array of shape (F_h, F_w)
# Output: array of shape (O_h, O_w)
for h in range(O_h):
    h_lim = h + F_h
    for w in range(O_w):
        w_lim = w + F_w
        Output[h, w] = np.sum(Image[h:h_lim, w:w_lim] * Filter)
```
In this form we access the numpy array in a double loop, take the element-wise product of the filter with the corresponding part of the input, and store the summed result in the output.
Moreover, while this is written as a double loop here, the actual input is four-dimensional, so it becomes a quadruple loop; it is easy to imagine how quickly the number of iterations grows.
Since accessing numpy arrays with Python `for` loops is slow, we want to avoid looping over elements as much as possible. That is where the `im2col` function comes into play.
The computation in the previous gif is
a = 1W + 2X + 5Y + 6Z \\
b = 2W + 3X + 6Y + 7Z \\
c = 3W + 4X + 7Y + 8Z \\
d = 5W + 6X + 9Y + 10Z \\
e = 6W + 7X + 10Y + 11Z \\
f = 7W + 8X + 11Y + 12Z \\
g = 9W + 10X + 13Y + 14Z \\
h = 10W + 11X + 14Y + 15Z \\
i = 11W + 12X + 15Y + 16Z
Expressing this as a matrix product gives
\left(
\begin{array}{c}
a \\
b \\
c \\
d \\
e \\
f \\
g \\
h \\
i
\end{array}
\right)^{\top}
=
\left(
\begin{array}{cccc}
W & X & Y & Z
\end{array}
\right)
\left(
\begin{array}{ccccccccc}
1 & 2 & 3 & 5 & 6 & 7 & 9 & 10 & 11 \\
2 & 3 & 4 & 6 & 7 & 8 & 10 & 11 & 12 \\
5 & 6 & 7 & 9 & 10 & 11 & 13 & 14 & 15 \\
6 & 7 & 8 & 10 & 11 & 12 & 14 & 15 & 16
\end{array}
\right)
The `im2col` function converts an input image or filter into a matrix like this. For details, see [here](https://qiita.com/kuroitu/items/35d7b5a4bde470f69570). By using `im2col`, the problem described above is largely resolved.
However, since `im2col` changes the shape of the original input, learning by the error backpropagation method cannot proceed as is; during backpropagation you need to insert the `col2im` function, which performs the reverse transformation. For details, see here.
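The full, batched `im2col` is covered in the linked articles; as a rough sketch of the idea (my own minimal version, assuming a single-channel image, stride 1 and no padding, with the filter values 1-4 standing in for $W, X, Y, Z$), it could look like this:

```python
import numpy as np

def im2col_2d(image, F_h, F_w):
    """Minimal im2col sketch: one 2D image, stride 1, no padding.
    Each column holds one flattened (F_h, F_w) patch, so the whole
    convolution becomes a single matrix product."""
    I_h, I_w = image.shape
    O_h, O_w = I_h - F_h + 1, I_w - F_w + 1
    col = np.empty((F_h * F_w, O_h * O_w))
    for h in range(F_h):              # only F_h * F_w iterations, independent of the image size
        for w in range(F_w):
            col[h * F_w + w] = image[h:h + O_h, w:w + O_w].ravel()
    return col

image = np.arange(1, 17).reshape(4, 4)       # the 4x4 input from the example above
filt = np.array([[1., 2.], [3., 4.]])        # stand-in values for W, X, Y, Z
out = filt.ravel() @ im2col_2d(image, 2, 2)  # (4,) @ (4, 9) -> (9,)
print(out.reshape(3, 3))                     # same result as the naive double loop
```

The matrix built by `im2col_2d` is exactly the 4×9 matrix shown above, so all nine outputs come out of one matrix product.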
Now that I have briefly outlined the convolution layer, here is the blueprint.
Let's start with forward propagation. The relevant part is the colored part in the figure below. The basic operation is the same as the forward propagation of a normal neural network; the only difference is that the `im2col` function is applied beforehand. Let's take a closer look.

First, the convolution operation itself is as shown in the figure below.
![conv_filtering.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/640911/6d06ad5e-550c-2e59-7906-04989b5269a5.png)
Bias is omitted here. The input is a tensor with batch size $B$, number of channels $C$, and image size $(I_h, I_w)$. There are $M$ filters, each a tensor with the same number of channels as the input and filter size $(F_h, F_w)$. Each input channel is filtered by the corresponding filter channel, for all the batch data, resulting in a tensor of shape $(B, M, O_h, O_w)$.

Let's see how to carry out this process concretely. The input and the filters are processed as shown in the figures below.
![input_im2col.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/640911/f22aa967-6879-1cc1-41e2-d83ceeb956f3.png)
![filter_reshape.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/640911/d550d4be-8531-1986-aaa9-b3352e1008bb.png)
This drops the 4D tensors down to 2D, which allows us to perform matrix multiplication.
![convolution.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/640911/f763ca02-5d32-440a-44ec-5ca51817db9e.png)
The bias (a two-dimensional matrix of shape $(M, 1)$) is then added to this output. At this point, numpy's broadcasting adds the same value to every column.
After that, this output is transformed and the dimensions are exchanged to obtain the output tensor.
Throw this output tensor into the activation function to complete the forward propagation of the convolution layer.
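As a shape-only sketch of this flow (hypothetical sizes, stride 1, no padding; the random matrix stands in for the real `im2col` output, and the activation function is omitted):

```python
import numpy as np

B, C, M = 2, 3, 4                                      # batch size, channels, number of filters
I_h = I_w = 5
F_h = F_w = 3
O_h, O_w = I_h - F_h + 1, I_w - F_w + 1                # 3, 3 with stride 1 and no padding

x_col = np.random.randn(C * F_h * F_w, B * O_h * O_w)  # stand-in for im2col(input)
w = np.random.randn(M, C, F_h, F_w).reshape(M, -1)     # filters flattened to (M, C*F_h*F_w)
b = np.random.randn(M, 1)                              # bias, broadcast over all columns

u = w @ x_col + b                                      # (M, B*O_h*O_w)
y = u.reshape(M, B, O_h, O_w).transpose(1, 0, 2, 3)    # output tensor (B, M, O_h, O_w)
print(y.shape)                                         # (2, 4, 3, 3)
```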
Next is backpropagation. The relevant part is the colored part in the figure below. Operationally, it follows the forward steps in reverse, ending with the `col2im` function.
Let's take a closer look.
The propagated gradient is a $ (B, M, O_h, O_w) $ tensor. First, this gradient is transformed in the reverse order of forward propagation.
The gradient with respect to the filter is calculated as the matrix product of the gradient and the input.
Since the obtained result is a two-dimensional matrix, it is transformed into a four-dimensional tensor with the same shape as the filter.
The key to the gradient with respect to the bias is that the same value is added to all columns during forward propagation. Adding the same value to several elements is equivalent to a network shaped as in the figure below (the numbers are arbitrary).
Therefore, backpropagation flows from each column back into the single bias, so the gradient with respect to the bias is the sum along $axis=1$, that is, over the columns.
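Here is a quick numeric check of that claim with a toy squared-error loss and made-up sizes (just an illustration, not part of the layer code):

```python
import numpy as np

M, cols = 3, 5                                 # rows = filters, columns = B*O_h*O_w
wx = np.random.randn(M, cols)                  # stand-in for w @ x
b = np.random.randn(M, 1)
target = np.random.randn(M, cols)

def loss(bias):
    return 0.5 * np.sum((wx + bias - target) ** 2)

grad_u = wx + b - target                       # upstream gradient d(loss)/d(u), u = wx + b
grad_b = np.sum(grad_u, axis=1, keepdims=True) # column-wise sum: one value per bias element

eps = 1e-6
numeric = np.zeros((M, 1))
for i in range(M):                             # central finite differences, one bias at a time
    d = np.zeros((M, 1))
    d[i] = eps
    numeric[i] = (loss(b + d) - loss(b - d)) / (2 * eps)
print(np.allclose(grad_b, numeric))            # True
```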
The gradient with respect to the input is calculated as the matrix product of the filter and the gradient.
As you can see from the shape of the result, it has the same shape as the output of the `im2col` function applied to the input tensor during forward propagation. So passing it through the `col2im` function, which does the opposite, produces the gradient tensor with respect to the input.
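Putting the backward shapes together in a shape-only sketch (hypothetical sizes, activation derivative omitted; the random matrices stand in for the values saved during forward propagation):

```python
import numpy as np

B, C, M = 2, 3, 4
F_h = F_w = 3
O_h = O_w = 3

w = np.random.randn(M, C * F_h * F_w)                  # flattened filters
x_col = np.random.randn(C * F_h * F_w, B * O_h * O_w)  # stand-in for im2col(input)
grad = np.random.randn(B, M, O_h, O_w)                 # gradient arriving from the next layer

dact = grad.transpose(1, 0, 2, 3).reshape(M, -1)       # back to the 2D shape (M, B*O_h*O_w)
grad_w = dact @ x_col.T                                # (M, C*F_h*F_w): same 2D shape as w
grad_b = np.sum(dact, axis=1).reshape(M, 1)            # one value per filter
grad_x_col = w.T @ dact                                # (C*F_h*F_w, B*O_h*O_w): goes to col2im
print(grad_w.shape, grad_b.shape, grad_x_col.shape)
```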
This completes the backpropagation of the convolution layer.
Actually, you don't have to reshape the filter every time; you only need to do it once at the beginning. The reason is that the filter is reshaped in exactly the same way every time, so there is no need to repeat it. Since the filter then stays in its reshaped form, the gradient with respect to the filter computed by backpropagation does not need to be reshaped back either. As a result, learning in the convolution layer works the same way as in a normal layer.
So let's implement it. However, a little ingenuity is required to inherit from `BaseLayer`.
```python:conv.py
import numpy as np


class ConvLayer(BaseLayer):
    def __init__(self, *, I_shape=None, F_shape=None,
                 stride=1, pad="same",
                 name="", wb_width=5e-2,
                 act="ReLU", opt="Adam",
                 act_dic={}, opt_dic={}, **kwds):
        self.name = name

        if I_shape is None:
            raise KeyError("Input shape is None.")
        if F_shape is None:
            raise KeyError("Filter shape is None.")

        if len(I_shape) == 2:
            C, I_h, I_w = 1, *I_shape
        else:
            C, I_h, I_w = I_shape
        self.I_shape = (C, I_h, I_w)

        if len(F_shape) == 2:
            M, F_h, F_w = 1, *F_shape
        else:
            M, F_h, F_w = F_shape
        self.F_shape = (M, C, F_h, F_w)

        if isinstance(stride, tuple):
            stride_ud, stride_lr = stride
        else:
            stride_ud = stride
            stride_lr = stride
        self.stride = (stride_ud, stride_lr)

        if isinstance(pad, tuple):
            pad_ud, pad_lr = pad
        elif isinstance(pad, int):
            pad_ud = pad
            pad_lr = pad
        elif pad == "same":
            pad_ud = 0.5*((I_h - 1)*stride_ud - I_h + F_h)
            pad_lr = 0.5*((I_w - 1)*stride_lr - I_w + F_w)
        self.pad = (pad_ud, pad_lr)

        O_h = get_O_shape(I_h, F_h, stride_ud, pad_ud)
        O_w = get_O_shape(I_w, F_w, stride_lr, pad_lr)
        self.O_shape = (M, O_h, O_w)
        self.n = np.prod(self.O_shape)

        # Set filters and bias
        self.w = wb_width*np.random.randn(*self.F_shape).reshape(M, -1).T
        self.b = wb_width*np.random.randn(M)

        # Get the activation function (class)
        self.act = get_act(act, **act_dic)

        # Get the optimizer (class)
        self.opt = get_opt(opt, **opt_dic)

    def forward(self, x):
        B = x.shape[0]
        M, O_h, O_w = self.O_shape

        x, _, self.pad_state = im2col(x, self.F_shape,
                                      stride=self.stride,
                                      pad=self.pad)
        super().forward(x.T)
        return self.y.reshape(B, O_h, O_w, M).transpose(0, 3, 1, 2)

    def backward(self, grad):
        B = grad.shape[0]
        I_shape = B, *self.I_shape
        M, O_h, O_w = self.O_shape

        grad = grad.transpose(0, 2, 3, 1).reshape(-1, M)
        super().backward(grad)
        self.grad_x = col2im(self.grad_x.T, I_shape, self.O_shape,
                             stride=self.stride, pad=self.pad_state)
        return self.grad_x
```
Let me explain where the ingenuity comes in. If you implement the layer exactly as explained above, without any tricks, it looks like this:
```python:conv.py
import numpy as np


class ConvLayer(BaseLayer):
    def __init__(self, *, I_shape=None, F_shape=None,
                 stride=1, pad="same",
                 name="", wb_width=5e-2,
                 act="ReLU", opt="Adam",
                 act_dic={}, opt_dic={}, **kwds):
        self.name = name

        if I_shape is None:
            raise KeyError("Input shape is None.")
        if F_shape is None:
            raise KeyError("Filter shape is None.")

        if len(I_shape) == 2:
            C, I_h, I_w = 1, *I_shape
        else:
            C, I_h, I_w = I_shape
        self.I_shape = (C, I_h, I_w)

        if len(F_shape) == 2:
            M, F_h, F_w = 1, *F_shape
        else:
            M, F_h, F_w = F_shape
        self.F_shape = (M, C, F_h, F_w)

        _, O_shape, self.pad_state = im2col(np.zeros((1, *self.I_shape)), self.F_shape,
                                            stride=stride, pad=pad)
        self.O_shape = (M, *O_shape)
        self.stride = stride
        self.pad = pad      # keep the padding spec so forward() can reuse it
        self.n = np.prod(self.O_shape)

        # Set filters and bias
        self.w = wb_width*np.random.randn(*self.F_shape).reshape(M, -1)
        self.b = wb_width*np.random.randn(M, 1)

        # Get the activation function (class)
        self.act = get_act(act, **act_dic)

        # Get the optimizer (class)
        self.opt = get_opt(opt, **opt_dic)

    def forward(self, x):
        B = x.shape[0]
        M, O_h, O_w = self.O_shape

        self.x, _, self.pad_state = im2col(x, self.F_shape,
                                           stride=self.stride,
                                           pad=self.pad)
        self.u = self.w@self.x + self.b   # (M, C*F_h*F_w) @ (C*F_h*F_w, B*O_h*O_w) + (M, 1)
        self.u = self.u.reshape(M, B, O_h, O_w).transpose(1, 0, 2, 3)
        self.y = self.act.forward(self.u)
        return self.y

    def backward(self, grad):
        B = grad.shape[0]
        I_shape = B, *self.I_shape
        M, O_h, O_w = self.O_shape

        dact = grad*self.act.backward(self.u, self.y)
        dact = dact.transpose(1, 0, 2, 3).reshape(M, -1)
        self.grad_w = [email protected]
        self.grad_b = np.sum(dact, axis=1).reshape(M, 1)
        self.grad_x = self.w.T@dact
        self.grad_x = col2im(self.grad_x, I_shape, self.O_shape,
                             stride=self.stride, pad=self.pad_state)
        return self.grad_x
```
Let's take a closer look at the differences from `BaseLayer`, omitting the unchanged code.
| Attention part | BaseLayer | ConvLayer |
|---|---|---|
| `w` | `randn(prev, n)` | `randn(*F_shape).reshape(M, -1)` |
| `b` | `randn(n)` | `randn(M, 1)` |
| `x` | - | `im2col(x)` |
| `u` | `x@w + b` | `w@x + b` |
| `u` | - | `u.reshape(M, B, O_h, O_w).transpose(1, 0, 2, 3)` |
| `y` | `act.forward(u)` | `act.forward(u)` |
| `grad` | - | - |
| `dact` | `grad*act.backward(u, y)` | `grad*act.backward(u, y)` |
| `dact` | - | `dact.transpose(1, 0, 2, 3).reshape(M, -1)` |
| `grad_w` | `x.T@dact` | `[email protected]` |
| `grad_b` | `sum(dact, axis=0)` | `sum(dact, axis=1).reshape(M, 1)` |
| `grad_x` | `[email protected]` | `w.T@dact` |
| `grad_x` | - | `col2im(grad_x)` |
First, let's align the forward propagation. The biggest difference in forward propagation is the calculation of `u`:
\boldsymbol{x}@\boldsymbol{w} + \boldsymbol{b} \quad \Leftrightarrow \quad \boldsymbol{w}@\boldsymbol{x} + \boldsymbol{b}
Since $(\boldsymbol{w}@\boldsymbol{x})^{\top} = \boldsymbol{x}^{\top}@\boldsymbol{w}^{\top}$, the order of the matrix product can be swapped by transposing. So, by setting
\begin{align}
\boldsymbol{x} &\leftarrow \textrm{im2col}(\boldsymbol{x})^{\top} = (BO_hO_w, CF_hF_w) \\
\boldsymbol{w} &\leftarrow \boldsymbol{w}^{\top} = (CF_hF_w, M) \\
\boldsymbol{b} & \leftarrow (M, )
\end{align}
we can match the forward propagation formula of `BaseLayer`. As for the bias, to make numpy's broadcasting work, we use a one-dimensional array of shape $(M,)$ instead of a two-dimensional matrix of shape $(M, 1)$.
If you change the forward propagation like this
\boldsymbol{x}@\boldsymbol{w} + \boldsymbol{b} = (BO_hO_w, CF_hF_w)@(CF_hF_w, M) + (M) = (BO_hO_w, M)
the shapes line up. After computing with `BaseLayer`'s `forward`, the value propagated to the next layer can be transformed into $(B, M, O_h, O_w)$ with `self.y.reshape(B, O_h, O_w, M).transpose(0, 3, 1, 2)`.
Also, if you look at the devised code, the `return` statement performs this transformation before passing the result on, which leaves the shapes of `u` and `y` as $(BO_hO_w, M)$. That is fine as it is.
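A quick numeric check of this rearrangement (hypothetical sizes; `x` stands in for the `im2col` output):

```python
import numpy as np

M, CFhFw, BOhOw = 4, 12, 18                 # hypothetical sizes
w = np.random.randn(M, CFhFw)               # flattened filters for the "w @ x" form
x = np.random.randn(CFhFw, BOhOw)           # stand-in for im2col(x)
b = np.random.randn(M)

naive = w @ x + b.reshape(M, 1)             # the w @ x + b form of the naive layer
aligned = x.T @ w.T + b                     # the x @ w + b form that matches BaseLayer
print(np.allclose(naive, aligned.T))        # True: same values, just a transposed layout
```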
Next is backpropagation. The incoming gradient `grad` has shape $(B, M, O_h, O_w)$, so the element-wise product `grad * act.backward(u, y)` cannot be computed as is:
\boldsymbol{grad} \otimes \textrm{act.backward}(\boldsymbol{u}, \boldsymbol{y}) = (B, M, O_h, O_w) \otimes (BO_hO_w, M)
So let's reshape `grad` to match. This can be done with `grad.transpose(0, 2, 3, 1).reshape(-1, M)`.
After this, passing it to `BaseLayer`'s `backward` gives
\begin{array}{llll}
\boldsymbol{dact} &= \boldsymbol{grad} \otimes \textrm{act.backward}(\boldsymbol{u}, \boldsymbol{y}) &= (BO_hO_w, M) & \\
\boldsymbol{grad_w} &= \boldsymbol{x}^{\top}@\boldsymbol{dact} &= (CF_hF_w, BO_hO_w)@(BO_hO_w, M) &= (CF_hF_w, M)\\
\boldsymbol{grad_b} &= \textrm{sum}(\boldsymbol{dact}, \textrm{axis}=0) &= (M, ) & \\
\boldsymbol{grad_x} &= \boldsymbol{dact}@\boldsymbol{w}^{\top} &= (BO_hO_w, M)@(M, CF_hF_w) &= (BO_hO_w, CF_hF_w)
\end{array}
Since the shapes work out as above, all that remains is
\boldsymbol{grad_x} \leftarrow \textrm{col2im}(\boldsymbol{grad_x}^{\top}) = (B, C, I_h, I_w)
and the backward pass is complete.
As described above, you do not need to change `BaseLayer`'s `update` function.
Therefore, this completes the convolution layer.
Next is the pooling layer. First, the pooling layer is a layer that reduces the data size by extracting only the information that seems to be important from the input image. The important information in this case is usually the maximum or average.
Also, as with the convolution layer, the implementation becomes faster and more efficient by using the `im2col` and `col2im` functions.
The blueprint for the pooling layer looks like this:
Let's look at forward propagation. The relevant part is the colored part. There are some values that must be kept for backpropagation. Let's take a closer look.

The target operation is as shown in the figure below. First, throw the input tensor into the `im2col` function to convert it into a two-dimensional matrix, and then reshape that matrix further. After reshaping it into such a tall, narrow matrix, take the maximum of each row (each row holds one pooling window), and finally reshape and swap the dimensions to complete the output. You also need to record the index of each maximum before taking it, for use in backpropagation.
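As a minimal sketch of this max-pooling operation (one 4×4 image, pool size 2, stride 2, no padding, without `im2col`; the actual layer below does the same thing for a whole batch via `im2col`):

```python
import numpy as np

image = np.arange(16, dtype=float).reshape(4, 4)
pool = 2
O_h, O_w = image.shape[0] // pool, image.shape[1] // pool

# Gather each pooling window into one row of a tall, narrow matrix,
# then take the row-wise maximum (and remember where it came from).
windows = (image.reshape(O_h, pool, O_w, pool)
                .transpose(0, 2, 1, 3)
                .reshape(-1, pool * pool))
max_index = np.argmax(windows, axis=1)       # kept for backpropagation
output = np.max(windows, axis=1).reshape(O_h, O_w)
print(output)                                # [[ 5.  7.] [13. 15.]]
```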
Next is backpropagation. The relevant part is the colored part in the figure. Operationally, the flow ends with the `col2im` function. It is hard to describe with words alone, so it looks like the figure below.
As you can see from the blueprint, the pooling layer has no parameters to learn, so no learning takes place.
The explanation of the pooling layer was much easier than that of the convolution layer. The implementation is not that complicated either.
```python:pool.py
import numpy as np


class PoolingLayer(BaseLayer):
    def __init__(self, *, I_shape=None,
                 pool=1, pad=0,
                 name="", **kwds):
        self.name = name

        if I_shape is None:
            raise KeyError("Input shape is None.")

        if len(I_shape) == 2:
            C, I_h, I_w = 1, *I_shape
        else:
            C, I_h, I_w = I_shape
        self.I_shape = (C, I_h, I_w)

        _, O_shape, self.pad_state = im2col(np.zeros((1, *self.I_shape)), (pool, pool),
                                            stride=pool, pad=pad)
        self.O_shape = (C, *O_shape)
        self.n = np.prod(self.O_shape)
        self.pool = pool
        self.F_shape = (pool, pool)

    def forward(self, x):
        B = x.shape[0]
        C, O_h, O_w = self.O_shape

        self.x, _, self.pad_state = im2col(x, self.F_shape,
                                           stride=self.pool,
                                           pad=self.pad_state)
        self.x = self.x.T.reshape(B*O_h*O_w*C, -1)
        self.max_index = np.argmax(self.x, axis=1)
        self.y = np.max(self.x, axis=1).reshape(B, O_h, O_w, C).transpose(0, 3, 1, 2)
        return self.y

    def backward(self, grad):
        B = grad.shape[0]
        I_shape = B, *self.I_shape
        C, O_h, O_w = self.O_shape

        grad = grad.transpose(0, 2, 3, 1).reshape(-1, 1)
        self.grad_x = np.zeros((grad.size, self.pool*self.pool))
        # Route each gradient value only to the position that was the maximum
        # of its pooling window during forward propagation.
        self.grad_x[np.arange(grad.size), self.max_index] = grad.ravel()
        self.grad_x = self.grad_x.reshape(B*O_h*O_w, C*self.pool*self.pool).T
        self.grad_x = col2im(self.grad_x, I_shape, self.O_shape,
                             stride=self.pool, pad=self.pad_state)
        return self.grad_x

    def update(self, **kwds):
        pass
```
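As a minimal sketch of this backpropagation, continuing the 4×4 example from the forward sketch with a hypothetical upstream gradient: the gradient is routed only to the position that produced each window's maximum, and everything else stays zero.

```python
import numpy as np

# The four 2x2 windows of the 4x4 example, one per row (same layout as the forward sketch).
windows = np.array([[ 0.,  1.,  4.,  5.],
                    [ 2.,  3.,  6.,  7.],
                    [ 8.,  9., 12., 13.],
                    [10., 11., 14., 15.]])
max_index = np.argmax(windows, axis=1)           # saved during forward propagation
grad = np.array([0.1, 0.2, 0.3, 0.4])            # hypothetical upstream gradient, one per window

grad_windows = np.zeros_like(windows)
grad_windows[np.arange(grad.size), max_index] = grad   # scatter into the argmax positions
print(grad_windows)
# A col2im-style rearrangement would then map these values back to their image positions.
```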
When I tried to build the experimental code for the CNN, it did not work well and I spent a long time investigating... In the end, there was no problem with the convolution layer or the pooling layer; the activation functions were the problem.
The implementation in the activation function list has also been changed.
I will post the experimental code in the next article. I have also changed the `LayerManager` class and so on.
- Introduction to Deep Learning ~ Basics ~
- Introduction to Deep Learning ~ Coding Preparation ~
- Introduction to Deep Learning ~ Forward Propagation ~
- Introduction to Deep Learning ~ Backpropagation ~
- Introduction to Deep Learning ~ Learning Rules ~
- Introduction to Deep Learning ~ Localization and Loss Functions ~
- Introduction to Deep Learning ~ Function Approximation ~
- Introduction to Deep Learning ~ Convolution and Pooling ~
- Introduction to Deep Learning ~ CNN Experiment ~
- List of activation functions (2020)
- Gradient descent method list (2020)
- See and understand! Comparison of optimization methods (2020)
- Thorough understanding of im2col
- Thorough understanding of col2im
- Complete understanding of the numpy.pad function