This is an article in the [Deep Learning Series](#Deep Learning Series). The previous article is here. In this article we first explain the theory of forward propagation for scalars, and then extend it to matrices. We will add to and modify the code introduced in the previous article, so please grab that code first.
-[Forward propagation with scalar](#Forward propagation with scalar)
-[Forward propagation theory in scalar](#Forward propagation theory in scalar)
-[Scalar forward propagation implementation](#Scalar forward propagation implementation)
-[Forward propagation in matrix](#Forward propagation in matrix)
-[Forward propagation theory in matrix](#Forward propagation theory in matrix)
-[Forward propagation implementation in matrix](#Forward propagation implementation in matrix)
-[Implementation of __init__ method](#Implementation of __init__ method)
-[About matrix operation](#About matrix operation)
-[Matrix sum](#Matrix sum)
-[Matrix element product](#Matrix element product)
-[Matrix product](#Matrix product)
-[Transpose](#Transpose)
This section describes the theory and implementation of forward propagation for scalars (real numbers). Most of it, however, has already been covered in the Basics article.
First, the theory. Let's start with this neuron model. Formulating it gives $f(x) = \sigma(wx + b)$, as described [here](https://qiita.com/kuroitu/items/221e8c477ffdd0774b6b#activation function). Passing the result through the activation function $\sigma(\cdot)$ makes it non-linear, which is what gives stacking layers its meaning. So what does this operation look like as a computational graph?
It looks like this. Until now only the input $x$ was shown and the other elements were omitted, but in the computational graph the **weight $w$** and the **bias (threshold) $b$** are written out explicitly, and the result is output through the activation function. The variables $x, w, b, y$ are exactly the variables a neuron object should hold. As for the activation function, Python can store functions and classes as objects in variables, so the implementation stays tidy if the neuron object holds the activation function as well. That is all for the theory of forward propagation with scalars. Simple and nice.
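As a concrete check (a minimal worked example, assuming the sigmoid $\sigma(x) = \frac{1}{1 + e^{-x}}$ as the activation function): with $w = 2$, $x = 1$ and $b = -1$, we get $y = \sigma(2 \cdot 1 - 1) = \sigma(1) \approx 0.731$.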
Let's implement it. The file to edit is [baselayer.py](https://qiita.com/kuroitu/items/884c62c48c2daa3def08#layer module code preparation).
baselayer.py
def forward(self, x):
    """
    Implementation of forward propagation
    """
    # Remember the input
    self.x = x.copy()
    # Forward propagation
    y = self.w * x + self.b
    self.y = self.act.forward(y)
    return self.y
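For reference, here is how this scalar forward pass might be exercised on its own (a minimal, self-contained sketch: the `ScalarLayer` and `Sigmoid` classes below are stand-ins for illustration, not the series' actual classes):

```python
import numpy as np

class Sigmoid:
    """Minimal activation object exposing the forward interface used above."""
    def forward(self, x):
        return 1 / (1 + np.exp(-x))

class ScalarLayer:
    """Hypothetical one-neuron layer holding x, w, b, y and an activation object."""
    def __init__(self, w, b, act):
        self.w, self.b, self.act = w, b, act

    def forward(self, x):
        # Same logic as the scalar forward() above
        self.x = x.copy()
        y = self.w * x + self.b
        self.y = self.act.forward(y)
        return self.y

layer = ScalarLayer(w=2.0, b=-1.0, act=Sigmoid())
print(layer.forward(np.array(1.0)))  # about 0.731, matching the hand calculation above
```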
Forward propagation itself is extremely simple, isn't it? Just as in the formula. That's all for the scalar implementation. Next, let's consider the implementation with matrices (or rather, vectors).
Now let's think about forward propagation with matrices. Matrices are hard going if you don't know the concept of matrix multiplication from linear algebra, so if you don't, a brief explanation is given [here](#About matrix operation).
First, consider a layer object that looks like two neuron objects stacked on top of each other. I couldn't come up with a good way to draw the figure, so let me explain it briefly. The black arrows can be understood as the neuron objects themselves. What matters are the arrows in the other colors. The light blue arrow represents the synapse connecting the upper neuron to the lower neuron: it is multiplied by the light blue weight $w_{1, 2}$ as it passes through the middle multiplication node and joins the lower addition node. The same goes for the red arrow: it is multiplied by the red weight $w_{2, 1}$ as it passes through the middle multiplication node and joins the upper addition node. The addition nodes now have three inputs, but since this can be decomposed into multiple two-input addition nodes, please consider that detail omitted. Let's follow this with formulas. In what follows, $\sigma_i(\cdot)$ is the identity function for simplicity. First, writing it out directly:
y_1 = w_{1, 1}x_1 + w_{2, 1}x_2 + b_1 \\
y_2 = w_{1, 2}x_1 + w_{2, 2}x_2 + b_2
I think this itself is obvious if you look at the figure. Let's express this in a matrix representation.
\left(
\begin{array}{c}
y_1 \\
y_2
\end{array}
\right)
=
\left(
\begin{array}{cc}
w_{1, 1} & w_{2, 1} \\
w_{1, 2} & w_{2, 2}
\end{array}
\right)
\left(
\begin{array}{c}
x_1 \\
x_2
\end{array}
\right)
+
\left(
\begin{array}{c}
b_1 \\
b_2
\end{array}
\right)
It looks like this. If you understand matrix multiplication, you can see that this is an equivalent expression. By the way, pay attention to the subscripts of $w_{i, j}$. Subscripts are usually read as **row $i$, column $j$**, right? In the formula above, however, they appear as **row $j$, column $i$**. Since this is awkward to handle both theoretically and in the implementation, let's transpose it:
\left(
\begin{array}{c}
y_1 \\
y_2
\end{array}
\right)
=
\left(
\begin{array}{cc}
w_{1, 1} & w_{1, 2} \\
w_{2, 1} & w_{2, 2}
\end{array}
\right)^{\top}
\left(
\begin{array}{c}
x_1 \\
x_2
\end{array}
\right)
+
\left(
\begin{array}{c}
b_1 \\
b_2
\end{array}
\right) \\
\Leftrightarrow
\boldsymbol{Y} = \boldsymbol{W}^{\top}\boldsymbol{X} + \boldsymbol{B}
Transpose is also covered [here](#Transpose). For now, this completes the mathematical expression of a layer object with 2 inputs and 2 outputs. Easy. Now let's generalize it. Nothing really changes; mathematically it is still
\boldsymbol{Y} = \boldsymbol{W}^{\top}\boldsymbol{X} + \boldsymbol{B}
as before. Let's take a closer look at the shapes. For a layer with $M$ inputs and $N$ outputs,
\underbrace{\boldsymbol{Y}}_{N \times 1} = \underbrace{\boldsymbol{W}^{\top}}_{N \times M}\underbrace{\boldsymbol{X}}_{M \times 1} + \underbrace{\boldsymbol{B}}_{N \times 1}
it looks like this. $\boldsymbol{W}^{\top}$ indicates the shape after transposition; before transposing, the weight matrix itself is $\underbrace{\boldsymbol{W}}_{M \times N}$. That is all for the theory.
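Before moving on to the implementation, here is a quick shape check in NumPy (a minimal sketch; the concrete sizes $M = 3$, $N = 2$ and the random values are just for illustration):

```python
import numpy as np

M, N = 3, 2                 # M inputs, N outputs
W = np.random.randn(M, N)   # weights, shape M x N (transposed in the forward pass)
X = np.random.randn(M, 1)   # input column vector, shape M x 1
B = np.random.randn(N, 1)   # bias column vector, shape N x 1

Y = W.T @ X + B             # (N x M) @ (M x 1) + (N x 1) -> N x 1
print(Y.shape)              # (2, 1)
```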
Now for the implementation. As with the scalar version, the place to edit is [baselayer.py](https://qiita.com/kuroitu/items/884c62c48c2daa3def08#layer module code preparation). Rewrite the scalar implementation as follows.
baselayer.py
def forward(self, x):
    """
    Implementation of forward propagation
    """
    # Remember the input
    self.x = x.copy()
    # Forward propagation
    y = self.w.T @ x + self.b
    self.y = self.act.forward(y)
    return self.y
Compared with the scalar version, the only change is that

baselayer.py
y = self.w * x + self.b

has become

baselayer.py
y = self.w.T @ x + self.b
For NumPy arrays, transposition is available as `ndarray.T`. The `@` operator may be unfamiliar to some, but here it does the same thing as `np.dot`. **It is available in NumPy 1.10 and later (and Python 3.5 and later), so be careful if you are using an older version.**
test_at.py
import numpy as np

x = np.array([1, 2])
w = np.array([[1, 0], [0, 1]])
b = np.array([1, 1])
y = w.T @ x + b
print(y)
print(y == np.dot(w.T, x) + b)
#----------
# The output is:
# [2 3]
# [ True  True]
In fact, this implicitly relies on how the `@` operator handles matrix-vector products. If you want to treat everything strictly as matrices, you may need to use `np.matrix` instead of `np.array` and `reshape` the vectors into column vectors.
test_at.py
import numpy as np

x = np.matrix([1, 2]).reshape(2, -1)
w = np.matrix([[1, 0], [0, 1]])
b = np.matrix([1, 1]).reshape(2, -1)
y = w.T @ x + b
print(y)
print(y == np.dot(w.T, x) + b)
#----------
# The output is:
# [[2]
#  [3]]
# [[ True]
#  [ True]]
It's a bit of a hassle, isn't it? Plain `np.array` is fine here.
Next, the `__init__` method. There are a few members that the layer object should hold, so let's implement that, adding a few extra members along the way.
baselayer.py
def __init__(self, *, prev=1, n=1,
             name="", wb_width=1,
             act="ReLU",
             **kwds):
    self.prev = prev  # Number of outputs of the previous layer = number of inputs to this layer
    self.n = n        # Number of outputs of this layer = number of inputs to the next layer
    self.name = name  # Name of this layer

    # Set the weights and biases
    self.w = wb_width*np.random.randn(prev, n)
    self.b = wb_width*np.random.randn(n)

    # Get the activation function (class)
    self.act = get_act(act)
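As a quick sanity check of the shapes this `__init__` produces (a minimal sketch; `prev = 3`, `n = 2` and the standalone variables below are chosen purely for illustration and mirror the two initialization lines above):

```python
import numpy as np

prev, n, wb_width = 3, 2, 1

# Mirror the weight/bias initialization above
w = wb_width * np.random.randn(prev, n)   # shape (prev, n) = (3, 2)
b = wb_width * np.random.randn(n)         # shape (n,) = (2,)

x = np.random.randn(prev)                 # one input with prev features
y = w.T @ x + b                           # (n, prev) @ (prev,) + (n,) -> (n,)
print(w.shape, b.shape, y.shape)          # (3, 2) (2,) (2,)
```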
Here, matrix operations are introduced briefly. Note that nothing will be justified mathematically (I couldn't anyway); I will only show how each operation is computed.
First is the matrix sum.
\left(
\begin{array}{cc}
a & b \\
c & d
\end{array}
\right)
+
\left(
\begin{array}{cc}
A & B \\
C & D
\end{array}
\right)
=
\left(
\begin{array}{cc}
a + A & b + B \\
c + C & d + D
\end{array}
\right)
Well, it's a natural result: you add element by element. It goes without saying that **the shapes of the two matrices must match exactly** for addition.
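In NumPy this is simply the `+` operator (a tiny sketch with made-up values):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])
print(A + B)   # element-wise sum: [[11 22], [33 44]]
```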
Let me also touch on the element-wise product, even though it does not appear in this article. The element-wise product is also called the Hadamard product.
\left(
\begin{array}{cc}
a & b \\
c & d
\end{array}
\right)
\otimes
\left(
\begin{array}{cc}
A & B \\
C & D
\end{array}
\right)
=
\left(
\begin{array}{cc}
aA & bB \\
cC & dD
\end{array}
\right)
By the way, the Hadamard product symbol used above can be written as `\otimes`. There are many more symbol-related tips here.
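In NumPy, the element-wise product of two arrays is the plain `*` operator (again with made-up values):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])
print(A * B)   # Hadamard product: [[10 40], [90 160]]
```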
Next is the matrix product, not to be confused with the element-wise product. The big differences from the element-wise product are that there are restrictions on the shapes and that it is not commutative.
\left(
\begin{array}{cc}
a & b \\
c & d
\end{array}
\right)
\left(
\begin{array}{cc}
A & B \\
C & D
\end{array}
\right)
=
\left(
\begin{array}{cc}
aA + bC & aB + bD \\
cA + dC & cB + dD
\end{array}
\right)
As for how to compute it: sweep a row of the first matrix across a column of the second, multiplying the paired elements and summing them up. For this to work, the number of elements in a row of the first matrix must match the number of elements in a column of the second; in other words, the number of **columns** of the first matrix must equal the number of **rows** of the second matrix. Generalizing, **the matrix product of an $L \times M$ matrix and an $M \times N$ matrix is an $L \times N$ matrix.**
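In NumPy the matrix product is the `@` operator (or `np.dot`); a small sketch of the shape rule and the lack of commutativity, with arbitrary values:

```python
import numpy as np

A = np.random.randn(2, 3)   # L x M = 2 x 3
B = np.random.randn(3, 4)   # M x N = 3 x 4
print((A @ B).shape)        # L x N = (2, 4)

C = np.array([[1, 2], [3, 4]])
D = np.array([[0, 1], [1, 0]])
print(np.array_equal(C @ D, D @ C))   # False: the matrix product is not commutative
```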
Transpose is the operation of exchanging rows and columns of a matrix.
\left(
\begin{array}{ccc}
a & b & c \\
d & e & f
\end{array}
\right)^{\top}
=
\left(
\begin{array}{cc}
a & d \\
b & e \\
c & f
\end{array}
\right)
The transpose symbol is written as `\top`. That is all the explanation this operation needs.
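In NumPy, transposition is `ndarray.T` (or `np.transpose`); a tiny example:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
print(A.T)                  # rows and columns exchanged
print(A.T.shape)            # (3, 2)
```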
This completes the forward propagation implementation. In particular, there is no processing that differs between the middle layer and the output layer, so there is no need to override this in [middlelayer.py and outputlayer.py](https://qiita.com/kuroitu/items/884c62c48c2daa3def08#layer module code preparation). Forward propagation is easy and pleasant.
-How to color with Qiita markdown [140 colors]
-Vector notation of mathematical formula description in Qiita
-How to write mathematical formulas that often appear in books such as machine learning in Qiita
-Introduction to Deep Learning ~ Basics ~
-Introduction to Deep Learning ~ Coding Preparation ~
-Thorough understanding of im2col
-List of activation functions (2020)