Deep learning from scratch (forward propagation edition)

Introduction

It's been about two weeks since I started learning deep learning. What I've learned is starting to leak out of my head, so I'd like to write it down in an organized way. Over the next several posts we will build a DNN step by step. This post covers forward propagation.

About the DNN to be created

We build a network that determines whether an image is a cat (1) or not (0).

Data to use

209 images will be used as training data and 50 images will be used as test data. The size of each image is 64 * 64.

Number of training examples : 209
Number of testing examples : 50
Each image is of size : 64 * 64

In addition, the split between positive (cat) and negative (non-cat) examples is as follows.

Number of cat-images of training data : 72 / 209
Number of cat-images of test data : 33 / 50

DNN to build

Number of layers

We build a four-layer network with 12288 (64 * 64 * 3, since the images are RGB) input nodes and one output node. I intended to draw a line connecting every pair of nodes, but I gave up because that was too painful to make in PowerPoint.

[Figure: DNN architecture]

Dimensions of each layer : [12288, 20, 7, 5, 1]
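
For reference, here is a minimal sketch of how the 64 * 64 RGB images could be flattened into 12288-dimensional column vectors before being fed to the network. The array name train_x_orig and its contents are assumptions for illustration, not code from the original dataset loading step.

import numpy as np

# Assumed raw shape: (number of examples, 64, 64, 3)
train_x_orig = np.random.randint(0, 256, size=(209, 64, 64, 3))

# Flatten each image into a column vector -> shape (12288, 209)
train_x_flat = train_x_orig.reshape(train_x_orig.shape[0], -1).T

# Scale pixel values to the range [0, 1]
train_x = train_x_flat / 255.0

print(train_x.shape)  # (12288, 209)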

Activation function

This time, we will use the ReLU function for the hidden layers and the sigmoid function for the output layer.

ReLU function

The ReLU function outputs 0 if the input is 0 or less, and outputs the input unchanged otherwise. In code, it can be written as follows.

y = np.maximum(0, x)

[Figure: ReLU function]

Sigmoid function

The sigmoid function maps the input to a value between 0 and 1. The expression is $ y = \frac{1}{1+e^{-x}} $. Written in Python, it looks like this:

y = 1 / (1 + np.exp(-x))

[Figure: sigmoid function]

The big picture of learning

Now that the overall design of the DNN is clear, let's review how a DNN learns. Learning proceeds according to the following procedure, and the parameters are gradually optimized by repeating steps 2 to 5. A sketch of this loop follows the figure below.

  1. Parameter initialization
  2. Forward Propagation
  3. Calculate the error
  4. Back Propagation of error
  5. Parameter update
  6. Return to 2

[Figure: DNN learning cycle]
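
As a rough sketch, the whole loop could look like the code below. Only initialize_parameters and L_model_forward are implemented in this post; compute_cost, L_model_backward, and update_parameters are placeholder names for functions that will be covered in the following posts, and the function name L_layer_model is just for illustration.

def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=2500):
    # 1. Parameter initialization
    parameters = initialize_parameters(layers_dims)

    for i in range(num_iterations):
        # 2. Forward propagation
        AL, caches = L_model_forward(X, parameters)
        # 3. Calculate the error (next post)
        cost = compute_cost(AL, Y)
        # 4. Back propagation of the error (later post)
        grads = L_model_backward(AL, Y, caches)
        # 5. Parameter update (later post)
        parameters = update_parameters(parameters, grads, learning_rate)
    return parameters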

Parameter initialization

This time, all layers were initialized with Xavier initialization. It seems you should use He initialization when using the ReLU function, but honestly, I don't fully understand the difference yet.

Xavier initialization

Xavier initialization randomly draws the initial parameters from a normal distribution with mean $0$ and standard deviation $\frac{1}{\sqrt{n}}$, where $n$ is the number of nodes in the previous layer. Compared to drawing from the standard normal distribution, drawing from this narrower range keeps the activations of each layer from clustering around 0 and 1, which makes vanishing gradients less likely. See page 182 of "Deep Learning from scratch" for details. Since there are 4 layers, we need to initialize parameters for all 4 layers. We create a function that takes the list of layer dimensions as an argument and returns the parameters.

import numpy as np

def initialize_parameters(layers_dims):
    np.random.seed(1)
    parameters = {}
    L = len(layers_dims)
    
    for l in range(1, L):
        # Xavier initialization: scale by 1 / sqrt(number of nodes in the previous layer)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) / np.sqrt(layers_dims[l-1])
        # Biases are initialized to zero
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters
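
For example, calling it with the layer dimensions listed above returns a weight matrix and a bias vector for each of the four layers:

parameters = initialize_parameters([12288, 20, 7, 5, 1])
print(parameters['W1'].shape)  # (20, 12288)
print(parameters['b1'].shape)  # (20, 1)
print(parameters['W4'].shape)  # (1, 5)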

Forward Propagation

Using the prepared parameters, we predict the label by performing the computation in the figure below for every layer. Since the figure is reused from another post, it also shows the cost function $L(a, y)$, but forward propagation does not need to be aware of it, so ignore that part.

[Figure: forward propagation computation graph]

The functions needed to do this are listed below. The cache and caches variables that appear here and there store values for later use, when implementing backpropagation.

  1. A function that computes the dot product of the parameters W and the input A, plus the bias term b
  2. The activation functions (sigmoid(), relu())
  3. A function that combines 1 and 2
  4. A function that repeats 3 once per layer

A function that computes the dot product of the parameters W and the input A, plus the bias term b

def linear_forward(A, W, b):
    # Linear part of a layer: Z = W.A + b
    Z = np.dot(W, A) + b
    # Cache the inputs for use during backpropagation
    cache = (A, W, b)
    return Z, cache
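
As a quick check of the shapes involved, using the first layer of this network and a batch of 209 examples (the variable names here are just for illustration):

A0 = np.random.randn(12288, 209)  # input: one column per example
W1 = np.random.randn(20, 12288)
b1 = np.zeros((20, 1))
Z1, cache = linear_forward(A0, W1, b1)
print(Z1.shape)  # (20, 209)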

Activation functions

def sigmoid(Z):
    A = 1 / (1+np.exp(-Z))
    cache = Z
    return A, cache


def relu(Z):
    A = np.maximum(0, Z)
    cache = Z
    return A, cache

A function that combines 1 and 2

def linear_activation_forward(A_prev, W, b, activation):
    if activation == 'relu':
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
    elif activation == 'sigmoid':
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
    cache = (linear_cache, activation_cache)
    
    return A, cache

A function that repeats 3 once per layer

def L_model_forward(X, parameters):
    caches = []
    A = X
    # Each layer has a W and a b, so the number of layers is len(parameters) // 2
    L = len(parameters) // 2
    
    # Hidden layers: linear -> ReLU
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters['W'+str(l)], parameters['b'+str(l)], activation='relu')
        caches.append(cache)
    # Output layer: linear -> sigmoid
    AL, cache = linear_activation_forward(A, parameters['W'+str(L)], parameters['b'+str(L)], activation='sigmoid')
    caches.append(cache)
    return AL, caches

Prediction with the initial parameters can now be performed by executing L_model_forward, for example as follows.
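
Here is a minimal end-to-end sketch using dummy input data (the random X below stands in for the flattened, normalized cat images), thresholding the output at 0.5 to turn probabilities into cat (1) / not-cat (0) predictions:

layers_dims = [12288, 20, 7, 5, 1]
parameters = initialize_parameters(layers_dims)

X = np.random.randn(12288, 50)        # dummy input: 50 examples as column vectors
AL, caches = L_model_forward(X, parameters)

predictions = (AL > 0.5).astype(int)  # 1 = cat, 0 = not cat
print(AL.shape)            # (1, 50)
print(predictions[0, :5])  # predicted labels for the first 5 examples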

Summary

This time, we implemented up to forward propagation. Next time I would like to calculate the cost.
