It's been about two weeks since I started learning deep learning. What I've learned is about to start spilling out of my head, so I'd like to write it down in an organized way before it does. Starting with this post, I'll build a DNN over several installments. This one is the forward propagation edition.
We will build a network that determines whether an image is a cat (1) or not (0).
209 images are used as training data and 50 images as test data. Each image is 64 * 64 pixels with 3 RGB channels.
Number of training examples : 209
Number of testing examples : 50
Each image is of size : 64 * 64 * 3
In addition, the ratio of positive (cat) to negative (non-cat) examples is as follows.
Number of cat-images of training data : 72 / 209
Number of cat-images of test data : 33 / 50
We will build a four-layer network with 12288 input nodes (64 * 64 pixels * 3 RGB channels) and one output node. I intended to draw the connections between all the nodes, but gave up because it was too painful to do in PowerPoint.
Dimensions of each layer : [12288, 20, 7, 5, 1]
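As an aside on where 12288 comes from: each 64 * 64 image has 3 RGB channels, and 64 * 64 * 3 = 12288. Below is a minimal sketch of the flattening step, assuming the training images are stored in a NumPy array of shape (209, 64, 64, 3) (the variable names here are hypothetical).

import numpy as np

# Hypothetical array of training images: (209 images, 64 px, 64 px, 3 RGB channels)
images = np.zeros((209, 64, 64, 3))

# Flatten each image into a column vector and stack them, then scale pixels to [0, 1]
X = images.reshape(images.shape[0], -1).T / 255.

print(X.shape)  # (12288, 209)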
This time, we will use the ReLU function for the hidden layers and the sigmoid function for the output layer.
The ReLU function outputs 0 if the input is 0 or less, and outputs the input as is otherwise. It can be written as follows.
y = np.maximum(0, x)
The sigmoid function squashes the input into a value between 0 and 1. It can be written as follows.
y = 1 / (1 + np.exp(-x))
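As a quick sanity check (this snippet is my own illustration, not part of the network code), applying both functions to a small array makes the behavior concrete.

import numpy as np

x = np.array([-2.0, 0.0, 3.0])
print(np.maximum(0, x))      # ReLU: [0. 0. 3.]
print(1 / (1 + np.exp(-x)))  # sigmoid: approx. [0.119 0.5 0.953]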
Now that the overall design of the DNN is clear, let's review how a DNN learns. Learning proceeds according to the following procedure, and the parameters are gradually optimized by repeating steps 2 to 5.
1. Initialize the parameters
2. Forward propagation
3. Compute the cost
4. Backward propagation
5. Update the parameters
This time, all layers were initialized with Xavier initialization. Apparently you should use He initialization when using the ReLU function, but honestly I don't yet have a good grasp of the difference.
Xavier initialization draws the initial parameters randomly from a normal distribution with mean $0$ and standard deviation $\frac{1}{\sqrt{n}}$, where $n$ is the number of nodes in the previous layer. Compared with drawing from the standard normal distribution, drawing from this narrower range keeps the activations of each layer from piling up around 0 and 1, which makes vanishing gradients less likely. See page 182 of "Deep Learning from Scratch" for details. Since there are 4 layers, we need to initialize the parameters of all 4 of them. Let's create a function that takes a list of the number of nodes in each layer as an argument and returns the parameters.
import numpy as np

def initialize_parameters(layers_dims):
    np.random.seed(1)
    parameters = {}
    L = len(layers_dims)
    for l in range(1, L):
        # Xavier-style initialization: scale by 1 / sqrt(number of nodes in the previous layer)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) / np.sqrt(layers_dims[l-1])
        # Biases start at zero
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters
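As a quick check (this call is my own addition, not from the original post), initializing with the layer dimensions above produces parameters with the following shapes.

parameters = initialize_parameters([12288, 20, 7, 5, 1])

for l in range(1, 5):
    print('W' + str(l), parameters['W' + str(l)].shape, 'b' + str(l), parameters['b' + str(l)].shape)
# W1 (20, 12288) b1 (20, 1)
# W2 (7, 20)     b2 (7, 1)
# W3 (5, 7)      b3 (5, 1)
# W4 (1, 5)      b4 (1, 1)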
We then predict labels using the prepared parameters by performing the following computation for all layers. The figure is reused from another post, so the cost function $L(a, y)$ is also drawn in it, but forward propagation does not need to be aware of it, so please ignore that part.
The functions needed to do this are listed below. The cache and caches variables that appear here and there are kept for use when implementing backpropagation.
def linear_forward(A, W, b):
    # Linear part of a layer's forward step: Z = WA + b
    Z = np.dot(W, A) + b
    cache = (A, W, b)
    return Z, cache
def sigmoid(Z):
    # Sigmoid activation; Z is cached for backpropagation
    A = 1 / (1 + np.exp(-Z))
    cache = Z
    return A, cache
def relu(Z):
    # ReLU activation; Z is cached for backpropagation
    A = np.maximum(0, Z)
    cache = Z
    return A, cache
def linear_activation_forward(A_prev, W, b, activation):
    # One layer's forward step: linear transform followed by the chosen activation
    if activation == 'relu':
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
    elif activation == 'sigmoid':
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
    cache = (linear_cache, activation_cache)
    return A, cache
def L_model_forward(X, parameters):
    caches = []
    A = X
    L = len(parameters) // 2  # number of layers (each layer has a W and a b)
    # Hidden layers 1 .. L-1 use ReLU
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters['W'+str(l)], parameters['b'+str(l)], activation='relu')
        caches.append(cache)
    # The output layer L uses sigmoid
    AL, cache = linear_activation_forward(A, parameters['W'+str(L)], parameters['b'+str(L)], activation='sigmoid')
    caches.append(cache)
    return AL, caches
Prediction with initial parameters can be performed by executing L_model_forward.
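For example, here is a minimal sketch of running the forward pass, using dummy random input in place of the real training data X (which would have shape (12288, 209)); thresholding the sigmoid output at 0.5 to get a 0/1 prediction is the usual convention.

np.random.seed(2)
X_dummy = np.random.randn(12288, 5)  # 5 fake examples, purely for illustration

parameters = initialize_parameters([12288, 20, 7, 5, 1])
AL, caches = L_model_forward(X_dummy, parameters)

print(AL.shape)  # (1, 5): one sigmoid output per example
predictions = (AL > 0.5).astype(int)  # 1 = cat, 0 = not cat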
This time we implemented everything up through forward propagation. Next time I'd like to compute the cost.