On a whim, I started studying "Deep Learning from Scratch: The Theory and Implementation of Deep Learning Learned with Python". This memo records where I stumbled in Chapter 8.
The execution environment is macOS Mojave + Anaconda 2019.10, and the Python version is 3.7.4. For details, refer to Chapter 1 of this memo.
(To other chapters of this memo: Chapter 1 / Chapter 2 / Chapter 3 / Chapter 4 / [Chapter 5](https://qiita.com/segavvy/items/8707e4e65aa7fa357d8a) / Chapter 6 / Chapter 7 / Summary)
This chapter covers deep neural networks, that is, networks with many layers.
Using what we have learned so far, we take on MNIST handwritten digit recognition with a deep network. Unfortunately, the book does not walk through the source code for this chapter at all, which makes it hard going.
I skipped implementing Dropout and Adam when I studied them earlier, but I need them this time, so I will take care of them first.
The Dropout layer is explained in "6.4.3 Dropout" of the book, so I implemented it while referring to that section.
dropout.py
# coding: utf-8
import numpy as np


class Dropout:

    def __init__(self, dropout_ratio=0.5):
        """Dropout layer

        Args:
            dropout_ratio (float): Fraction of neurons dropped during training, default 0.5.
        """
        self.dropout_ratio = dropout_ratio  # Fraction of neurons dropped during training
        self.valid_ratio = 1.0 - self.dropout_ratio  # Fraction of neurons kept during training
        self.mask = None  # Boolean array: True where a neuron is kept, False where it is dropped

    def forward(self, x, train_flg=True):
        """Forward propagation

        Args:
            x (numpy.ndarray): Input
            train_flg (bool, optional): True during training, default is True.

        Returns:
            numpy.ndarray: Output
        """
        if train_flg:
            # During training, generate a mask that decides which neurons to drop
            self.mask = np.random.rand(*x.shape) > self.dropout_ratio
            # Compute the output
            return x * self.mask
        else:
            # At inference no neurons are dropped; instead the output is
            # scaled by the fraction that was kept during training
            return x * self.valid_ratio

    def backward(self, dout):
        """Backpropagation

        Args:
            dout (numpy.ndarray): Derivative passed back from the layer on the right

        Returns:
            numpy.ndarray: Derivative (gradient)
        """
        # Pass the derivative back only through neurons that were not dropped
        assert self.mask is not None, 'Backpropagation was called without forward propagation'
        return dout * self.mask
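To see why the inference branch multiplies by `valid_ratio`, here is a minimal sketch in plain Python (no NumPy), separate from the class above. The helper names `dropout_train` and `dropout_infer` and the input values are made up for illustration. It averages the training-mode output over many random masks and compares that average with the scaled inference output.

```python
import random


def dropout_train(x, dropout_ratio, rng):
    """Training mode: zero each element with probability dropout_ratio."""
    return [v if rng.random() > dropout_ratio else 0.0 for v in x]


def dropout_infer(x, dropout_ratio):
    """Inference mode: keep every element but scale by the keep ratio."""
    return [v * (1.0 - dropout_ratio) for v in x]


rng = random.Random(0)  # fixed seed so the run is repeatable
x = [1.0, 2.0, 3.0, 4.0]  # hypothetical activations
ratio = 0.5
trials = 20000

# Average the training-mode output over many random masks
sums = [0.0] * len(x)
for _ in range(trials):
    for i, v in enumerate(dropout_train(x, ratio, rng)):
        sums[i] += v
train_mean = [s / trials for s in sums]

infer_out = dropout_infer(x, ratio)
print(train_mean)  # close to infer_out, up to sampling noise
print(infer_out)
```

The two printed lists should agree closely, which suggests the scaling keeps the expected activation seen by the next layer the same in both modes.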
The book gives a brief explanation of Adam in "6.1.6 Adam", but it is too brief to implement from, and I could not follow the algorithm well just by reading the book's source code. So I first got a rough idea of the mechanism from @omiita's article "[2020 definitive edition] super easy-to-understand optimization algorithm-Adam and Newton's method from loss function-". I then implemented it while reading "Algorithm 1" on page 2 of the original paper introduced in the book (reference [8], "Adam: A Method for Stochastic Optimization"; the PDF can be downloaded from the upper right of https://arxiv.org/abs/1412.6980). The paper is in English, but the explanation is only about 20 lines of pseudocode, so even I, who am not good at English, managed fine. I also followed the values recommended in the paper for the default hyperparameters.
adam.py
# coding: utf-8
import numpy as np


class Adam:

    def __init__(self, alpha=0.001, beta1=0.9, beta2=0.999):
        """Parameter optimization by Adam

        Args:
            alpha (float, optional): Learning rate, default is 0.001.
            beta1 (float, optional): Blend ratio of past and current values for the
                Momentum-like velocity, default is 0.9.
            beta2 (float, optional): Blend ratio of past and current values for the
                AdaGrad-like learning-rate scaling, default is 0.999.
        """
        self.alpha = alpha
        self.beta1 = beta1
        self.beta2 = beta2

        self.m = None  # Corresponds to the velocity in Momentum
        self.v = None  # Corresponds to the learning-rate scaling in AdaGrad
        self.t = 0  # Time step

    def update(self, params, grads):
        """Parameter update

        Args:
            params (dict): Dictionary of parameters to update, with keys such as 'W1' and 'b1'.
            grads (dict): Dictionary of gradients corresponding to params
        """
        # Initialize m and v
        if self.m is None:
            self.m = {}
            self.v = {}
            for key, val in params.items():
                self.m[key] = np.zeros_like(val)
                self.v[key] = np.zeros_like(val)

        # Update
        self.t += 1  # Advance the time step
        for key in params.keys():

            # Update m, which corresponds to the velocity update in Momentum:
            # blend the past value and the current gradient in the ratio beta1 : 1 - beta1
            self.m[key] = \
                self.beta1 * self.m[key] + (1 - self.beta1) * grads[key]

            # Update v, which corresponds to the learning-rate update in AdaGrad:
            # blend the past value and the current squared gradient in the ratio beta2 : 1 - beta2
            self.v[key] = \
                self.beta2 * self.v[key] + (1 - self.beta2) * (grads[key] ** 2)

            # Bias-corrected values of m and v used for the parameter update
            hat_m = self.m[key] / (1.0 - self.beta1 ** self.t)
            hat_v = self.v[key] / (1.0 - self.beta2 ** self.t)

            # Update the parameters in place; the trailing 1e-7 avoids division by zero
            params[key] -= self.alpha * hat_m / (np.sqrt(hat_v) + 1e-7)
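As a way to watch the update rule at work, the following sketch applies the same steps to a single scalar in plain Python. The function name `adam_minimize` and the objective f(x) = x^2 are assumptions chosen for illustration, and `eps` is set to the paper's 1e-8 rather than the 1e-7 used in the class above.

```python
import math


def adam_minimize(grad_fn, x0, steps, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam on a single scalar parameter and return the visited values."""
    x, m, v = x0, 0.0, 0.0
    history = [x]
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # first moment (moving average of g)
        v = beta2 * v + (1 - beta2) * g * g    # second moment (moving average of g^2)
        hat_m = m / (1 - beta1 ** t)           # bias correction for the zero start
        hat_v = v / (1 - beta2 ** t)
        x -= alpha * hat_m / (math.sqrt(hat_v) + eps)
        history.append(x)
    return history


# Hypothetical objective f(x) = x^2, whose gradient is 2x
history = adam_minimize(lambda x: 2.0 * x, x0=1.0, steps=3000)
first_step = history[0] - history[1]
print(first_step)   # roughly alpha, regardless of the gradient's magnitude
print(history[-1])  # has moved from 1.0 toward the minimum at 0
```

On the first step the bias correction turns m and v into g and g^2, so the ratio is close to the sign of the gradient and the step size is close to alpha. Without the correction, the zero-initialized averages would make the early steps much smaller.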
This network has many layers, so the output sizes of the convolution and pooling layers have to be calculated many times. I therefore added the functions conv_output_size and pool_output_size to functions.py. The other functions are unchanged from the previous chapter.
functions.py
# coding: utf-8
import numpy as np


def softmax(x):
    """Softmax function

    Args:
        x (numpy.ndarray): Input

    Returns:
        numpy.ndarray: Output
    """
    # For batch processing, x is a 2-D array of shape (batch size, 10).
    # In that case the calculation must be done per image using broadcasting.
    # So that the same code works for both 1-D and 2-D input, np.max() and
    # np.sum() are computed with axis=-1, and keepdims=True keeps the
    # dimension so that the result can be broadcast as is.
    c = np.max(x, axis=-1, keepdims=True)
    exp_a = np.exp(x - c)  # Overflow countermeasure
    sum_exp_a = np.sum(exp_a, axis=-1, keepdims=True)
    y = exp_a / sum_exp_a
    return y


def cross_entropy_error(y, t):
    """Calculation of cross entropy error

    Args:
        y (numpy.ndarray): Neural network output
        t (numpy.ndarray): Correct label

    Returns:
        float: Cross entropy error
    """
    # If there is only one sample, reshape it into a single-row batch
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)

    # Calculate the error and normalize by the batch size
    batch_size = y.shape[0]
    return -np.sum(t * np.log(y + 1e-7)) / batch_size


def conv_output_size(input_size, filter_size, pad, stride):
    """Calculation of the output size of a convolution layer

    Args:
        input_size (int): Side length of the input (height and width are assumed equal)
        filter_size (int): Side length of the filter (height and width are assumed equal)
        pad (int): Padding size (height and width are assumed equal)
        stride (int): Stride (height and width are assumed equal)

    Returns:
        int: Side length of the output
    """
    assert (input_size + 2 * pad - filter_size) \
        % stride == 0, 'The output size of the convolution layer is not divisible!'
    return int((input_size + 2 * pad - filter_size) / stride + 1)


def pool_output_size(input_size, pool_size, stride):
    """Calculation of the output size of a pooling layer

    Args:
        input_size (int): Side length of the input (height and width are assumed equal)
        pool_size (int): Side length of the pooling window (height and width are assumed equal)
        stride (int): Stride (height and width are assumed equal)

    Returns:
        int: Side length of the output
    """
    assert (input_size - pool_size) % stride == 0, 'The output size of the pooling layer is not divisible!'
    return int((input_size - pool_size) / stride + 1)
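As a quick sanity check of the two size formulas, the sketch below restates them in compact form (under the shortened, hypothetical names `conv_out` and `pool_out` so that the snippet runs on its own) and traces a 28 x 28 image through the convolution and pooling layers, using the default hyperparameters that appear later in this memo.

```python
def conv_out(size, filter_size, pad, stride):
    """Output side length of a convolution: (size + 2*pad - filter) / stride + 1."""
    assert (size + 2 * pad - filter_size) % stride == 0
    return (size + 2 * pad - filter_size) // stride + 1


def pool_out(size, pool_size, stride):
    """Output side length of a pooling layer: (size - pool) / stride + 1."""
    assert (size - pool_size) % stride == 0
    return (size - pool_size) // stride + 1


# (kind, filter or pool size, pad, stride) in network order
layers = [
    ('conv', 3, 1, 1), ('conv', 3, 1, 1), ('pool', 2, 0, 2),
    ('conv', 3, 1, 1), ('conv', 3, 2, 1), ('pool', 2, 0, 2),
    ('conv', 3, 1, 1), ('conv', 3, 1, 1), ('pool', 2, 0, 2),
]

size = 28  # MNIST images are 28 x 28
sizes = [size]
for kind, k, pad, stride in layers:
    size = conv_out(size, k, pad, stride) if kind == 'conv' else pool_out(size, k, stride)
    sizes.append(size)

print(sizes)  # → [28, 28, 28, 14, 14, 16, 8, 8, 8, 4]
```

If the fourth convolution used pad 1 instead of pad 2, the size would stay at 14, pool down to 7, and the last pooling step would then fail the divisibility check, which is presumably why pad 2 is used there.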
Now that we have implemented the necessary parts, it is time to implement the network.
First, I will organize the input and output in this network.
| Layer | Input / output shape | Shape in this implementation |
|---|---|---|
| | $ (Batch size N, Number of channels CH, Image height H, Width W) $ | $ (100, 1, 28, 28) $ |
| [1] Convolution #1 | ↓ | |
| | $ (Batch size N, Number of filters FN, Output height OH, Width OW) $ | $ (100, 16, 28, 28) $ |
| [2] ReLU #1 | ↓ | |
| | $ (Batch size N, Number of filters FN, Output height OH, Width OW) $ | $ (100, 16, 28, 28) $ |
| [3] Convolution #2 | ↓ | |
| | $ (Batch size N, Number of filters FN, Output height OH, Width OW) $ | $ (100, 16, 28, 28) $ |
| [4] ReLU #2 | ↓ | |
| | $ (Batch size N, Number of filters FN, Output height OH, Width OW) $ | $ (100, 16, 28, 28) $ |
| [5] Pooling #1 | ↓ | |
| | $ (Batch size N, Number of filters FN, Output height OH, Width OW) $ | $ (100, 16, 14, 14) $ |
| [6] Convolution #3 | ↓ | |
| | $ (Batch size N, Number of filters FN, Output height OH, Width OW) $ | $ (100, 32, 14, 14) $ |
| [7] ReLU #3 | ↓ | |
| | $ (Batch size N, Number of filters FN, Output height OH, Width OW) $ | $ (100, 32, 14, 14) $ |
| [8] Convolution #4 | ↓ | |
| | $ (Batch size N, Number of filters FN, Output height OH, Width OW) $ | $ (100, 32, 16, 16) $ |
| [9] ReLU #4 | ↓ | |
| | $ (Batch size N, Number of filters FN, Output height OH, Width OW) $ | $ (100, 32, 16, 16) $ |
| [10] Pooling #2 | ↓ | |
| | $ (Batch size N, Number of filters FN, Output height OH, Width OW) $ | $ (100, 32, 8, 8) $ |
| [11] Convolution #5 | ↓ | |
| | $ (Batch size N, Number of filters FN, Output height OH, Width OW) $ | $ (100, 64, 8, 8) $ |
| [12] ReLU #5 | ↓ | |
| | $ (Batch size N, Number of filters FN, Output height OH, Width OW) $ | $ (100, 64, 8, 8) $ |
| [13] Convolution #6 | ↓ | |
| | $ (Batch size N, Number of filters FN, Output height OH, Width OW) $ | $ (100, 64, 8, 8) $ |
| [14] ReLU #6 | ↓ | |
| | $ (Batch size N, Number of filters FN, Output height OH, Width OW) $ | $ (100, 64, 8, 8) $ |
| [15] Pooling #3 | ↓ | |
| | $ (Batch size N, Number of filters FN, Output height OH, Width OW) $ | $ (100, 64, 4, 4) $ |
| [16] Affine #1 | ↓ | |
| | $ (Batch size N, Hidden layer size) $ | $ (100, 50) $ |
| [17] ReLU #7 | ↓ | |
| | $ (Batch size N, Hidden layer size) $ | $ (100, 50) $ |
| [18] Dropout #1 | ↓ | |
| | $ (Batch size N, Hidden layer size) $ | $ (100, 50) $ |
| [19] Affine #2 | ↓ | |
| | $ (Batch size N, Final output size) $ | $ (100, 10) $ |
| [20] Dropout #2 | ↓ | |
| | $ (Batch size N, Final output size) $ | $ (100, 10) $ |
| [21] Softmax | ↓ | |
| | $ (Batch size N, Final output size) $ | $ (100, 10) $ |
It is a long table, but I will implement it one layer at a time.
The code in the book is organized compactly using loops, but calculating the input and output sizes of each layer can get confusing, so I wrote out the parameter initialization and layer generation one layer at a time. The result is rather brute-force code. The parameters are initialized with He initialization.
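For reference, the standard deviations that He initialization gives for this network can be worked out by hand. The sketch below is only an illustration in plain Python. It assumes the default hyperparameters of this memo, and the list name `fan_ins` is made up. The fan-in of a convolution is the number of input channels times the filter area, and the fan-in of an affine layer is its number of inputs.

```python
import math

# (weight name, fan-in): input channels * filter_size^2 for convolutions,
# number of input neurons for affine layers
fan_ins = [
    ('W1', 1 * 3 * 3),
    ('W2', 16 * 3 * 3),
    ('W3', 16 * 3 * 3),
    ('W4', 32 * 3 * 3),
    ('W5', 32 * 3 * 3),
    ('W6', 64 * 3 * 3),
    ('W7', 64 * 4 * 4),  # flattened output of the last pooling layer
    ('W8', 50),          # hidden layer size
]

# He initialization draws weights from a normal distribution with std sqrt(2 / fan-in)
he_std = {name: math.sqrt(2.0 / n) for name, n in fan_ins}
for name, n in fan_ins:
    print(f'{name}: fan-in {n:>4}, std {he_std[name]:.4f}')
```

Layers with more incoming connections get a smaller standard deviation, so the spread of the weighted sums stays at a comparable scale from layer to layer.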
deep_conv_net.py
# coding: utf-8
import numpy as np
from affine import Affine
from convolution import Convolution
from dropout import Dropout
from functions import conv_output_size, pool_output_size
from pooling import Pooling
from relu import ReLU
from softmax_with_loss import SoftmaxWithLoss


class DeepConvNet:

    def __init__(
        self, input_dim=(1, 28, 28),
        conv_param_1={
            'filter_num': 16, 'filter_size': 3, 'pad': 1, 'stride': 1
        },
        conv_param_2={
            'filter_num': 16, 'filter_size': 3, 'pad': 1, 'stride': 1
        },
        conv_param_3={
            'filter_num': 32, 'filter_size': 3, 'pad': 1, 'stride': 1
        },
        conv_param_4={
            'filter_num': 32, 'filter_size': 3, 'pad': 2, 'stride': 1
        },
        conv_param_5={
            'filter_num': 64, 'filter_size': 3, 'pad': 1, 'stride': 1
        },
        conv_param_6={
            'filter_num': 64, 'filter_size': 3, 'pad': 1, 'stride': 1
        },
        hidden_size=50, output_size=10
    ):
        """Deep convolutional neural network

        Args:
            input_dim (tuple, optional): Shape of the input data, default is (1, 28, 28).
            conv_param_1 (dict, optional): Hyperparameters of convolution layer 1,
                default is {'filter_num':16, 'filter_size':3, 'pad':1, 'stride':1}.
            conv_param_2 (dict, optional): Hyperparameters of convolution layer 2,
                default is {'filter_num':16, 'filter_size':3, 'pad':1, 'stride':1}.
            conv_param_3 (dict, optional): Hyperparameters of convolution layer 3,
                default is {'filter_num':32, 'filter_size':3, 'pad':1, 'stride':1}.
            conv_param_4 (dict, optional): Hyperparameters of convolution layer 4,
                default is {'filter_num':32, 'filter_size':3, 'pad':2, 'stride':1}.
            conv_param_5 (dict, optional): Hyperparameters of convolution layer 5,
                default is {'filter_num':64, 'filter_size':3, 'pad':1, 'stride':1}.
            conv_param_6 (dict, optional): Hyperparameters of convolution layer 6,
                default is {'filter_num':64, 'filter_size':3, 'pad':1, 'stride':1}.
            hidden_size (int, optional): Number of neurons in the hidden layer, default is 50.
            output_size (int, optional): Number of neurons in the output layer, default is 10.
        """
        assert input_dim[1] == input_dim[2], 'Input data is assumed to have the same height and width!'

        # Parameter initialization and layer generation
        self.params = {}  # Parameters
        self.layers = {}  # Layers (OrderedDict is not needed because dicts keep insertion order from Python 3.7)

        # Input size
        channel_num = input_dim[0]  # Number of input channels
        input_size = input_dim[1]  # Side length of the input
        # [1] Convolution layer #1: parameter initialization and layer generation
        # Note: unpacking values() relies on the key order
        # filter_num, filter_size, pad, stride in each conv_param dict
        filter_num, filter_size, pad, stride = list(conv_param_1.values())
        pre_node_num = channel_num * (filter_size ** 2)  # Number of previous-layer nodes connected to one node
        key_w, key_b = 'W1', 'b1'  # Keys used to store the parameters
        self.params[key_w] = np.random.normal(
            scale=np.sqrt(2.0 / pre_node_num),  # Standard deviation for He initialization
            size=(filter_num, channel_num, filter_size, filter_size)
        )
        self.params[key_b] = np.zeros(filter_num)
        self.layers['Conv1'] = Convolution(
            self.params[key_w], self.params[key_b], stride, pad
        )
        # Input size for the next layer
        channel_num = filter_num
        input_size = conv_output_size(input_size, filter_size, pad, stride)

        # [2] ReLU layer #1: layer generation
        self.layers['ReLU1'] = ReLU()

        # [3] Convolution layer #2: parameter initialization and layer generation
        filter_num, filter_size, pad, stride = list(conv_param_2.values())
        pre_node_num = channel_num * (filter_size ** 2)  # Number of previous-layer nodes connected to one node
        key_w, key_b = 'W2', 'b2'  # Keys used to store the parameters
        self.params[key_w] = np.random.normal(
            scale=np.sqrt(2.0 / pre_node_num),  # Standard deviation for He initialization
            size=(filter_num, channel_num, filter_size, filter_size)
        )
        self.params[key_b] = np.zeros(filter_num)
        self.layers['Conv2'] = Convolution(
            self.params[key_w], self.params[key_b], stride, pad
        )
        # Input size for the next layer
        channel_num = filter_num
        input_size = conv_output_size(input_size, filter_size, pad, stride)

        # [4] ReLU layer #2: layer generation
        self.layers['ReLU2'] = ReLU()

        # [5] Pooling layer #1: layer generation
        self.layers['Pool1'] = Pooling(pool_h=2, pool_w=2, stride=2)
        # Input size for the next layer
        input_size = pool_output_size(input_size, pool_size=2, stride=2)

        # [6] Convolution layer #3: parameter initialization and layer generation
        filter_num, filter_size, pad, stride = list(conv_param_3.values())
        pre_node_num = channel_num * (filter_size ** 2)  # Number of previous-layer nodes connected to one node
        key_w, key_b = 'W3', 'b3'  # Keys used to store the parameters
        self.params[key_w] = np.random.normal(
            scale=np.sqrt(2.0 / pre_node_num),  # Standard deviation for He initialization
            size=(filter_num, channel_num, filter_size, filter_size)
        )
        self.params[key_b] = np.zeros(filter_num)
        self.layers['Conv3'] = Convolution(
            self.params[key_w], self.params[key_b], stride, pad
        )
        # Input size for the next layer
        channel_num = filter_num
        input_size = conv_output_size(input_size, filter_size, pad, stride)

        # [7] ReLU layer #3: layer generation
        self.layers['ReLU3'] = ReLU()

        # [8] Convolution layer #4: parameter initialization and layer generation
        filter_num, filter_size, pad, stride = list(conv_param_4.values())
        pre_node_num = channel_num * (filter_size ** 2)  # Number of previous-layer nodes connected to one node
        key_w, key_b = 'W4', 'b4'  # Keys used to store the parameters
        self.params[key_w] = np.random.normal(
            scale=np.sqrt(2.0 / pre_node_num),  # Standard deviation for He initialization
            size=(filter_num, channel_num, filter_size, filter_size)
        )
        self.params[key_b] = np.zeros(filter_num)
        self.layers['Conv4'] = Convolution(
            self.params[key_w], self.params[key_b], stride, pad
        )
        # Input size for the next layer
        channel_num = filter_num
        input_size = conv_output_size(input_size, filter_size, pad, stride)

        # [9] ReLU layer #4: layer generation
        self.layers['ReLU4'] = ReLU()

        # [10] Pooling layer #2: layer generation
        self.layers['Pool2'] = Pooling(pool_h=2, pool_w=2, stride=2)
        # Input size for the next layer
        input_size = pool_output_size(input_size, pool_size=2, stride=2)

        # [11] Convolution layer #5: parameter initialization and layer generation
        filter_num, filter_size, pad, stride = list(conv_param_5.values())
        pre_node_num = channel_num * (filter_size ** 2)  # Number of previous-layer nodes connected to one node
        key_w, key_b = 'W5', 'b5'  # Keys used to store the parameters
        self.params[key_w] = np.random.normal(
            scale=np.sqrt(2.0 / pre_node_num),  # Standard deviation for He initialization
            size=(filter_num, channel_num, filter_size, filter_size)
        )
        self.params[key_b] = np.zeros(filter_num)
        self.layers['Conv5'] = Convolution(
            self.params[key_w], self.params[key_b], stride, pad
        )
        # Input size for the next layer
        channel_num = filter_num
        input_size = conv_output_size(input_size, filter_size, pad, stride)

        # [12] ReLU layer #5: layer generation
        self.layers['ReLU5'] = ReLU()

        # [13] Convolution layer #6: parameter initialization and layer generation
        filter_num, filter_size, pad, stride = list(conv_param_6.values())
        pre_node_num = channel_num * (filter_size ** 2)  # Number of previous-layer nodes connected to one node
        key_w, key_b = 'W6', 'b6'  # Keys used to store the parameters
        self.params[key_w] = np.random.normal(
            scale=np.sqrt(2.0 / pre_node_num),  # Standard deviation for He initialization
            size=(filter_num, channel_num, filter_size, filter_size)
        )
        self.params[key_b] = np.zeros(filter_num)
        self.layers['Conv6'] = Convolution(
            self.params[key_w], self.params[key_b], stride, pad
        )
        # Input size for the next layer
        channel_num = filter_num
        input_size = conv_output_size(input_size, filter_size, pad, stride)

        # [14] ReLU layer #6: layer generation
        self.layers['ReLU6'] = ReLU()

        # [15] Pooling layer #3: layer generation
        self.layers['Pool3'] = Pooling(pool_h=2, pool_w=2, stride=2)
        # Input size for the next layer
        input_size = pool_output_size(input_size, pool_size=2, stride=2)

        # [16] Affine layer #1: parameter initialization and layer generation
        pre_node_num = channel_num * (input_size ** 2)  # Number of previous-layer nodes connected to one node
        key_w, key_b = 'W7', 'b7'  # Keys used to store the parameters
        self.params[key_w] = np.random.normal(
            scale=np.sqrt(2.0 / pre_node_num),  # Standard deviation for He initialization
            size=(channel_num * (input_size ** 2), hidden_size)
        )
        self.params[key_b] = np.zeros(hidden_size)
        self.layers['Affine1'] = Affine(self.params[key_w], self.params[key_b])
        # Input size for the next layer
        input_size = hidden_size

        # [17] ReLU layer #7: layer generation
        self.layers['ReLU7'] = ReLU()

        # [18] Dropout layer #1: layer generation
        self.layers['Drop1'] = Dropout(dropout_ratio=0.5)

        # [19] Affine layer #2: parameter initialization and layer generation
        pre_node_num = input_size  # Number of previous-layer nodes connected to one node
        key_w, key_b = 'W8', 'b8'  # Keys used to store the parameters
        self.params[key_w] = np.random.normal(
            scale=np.sqrt(2.0 / pre_node_num),  # Standard deviation for He initialization
            size=(input_size, output_size)
        )
        self.params[key_b] = np.zeros(output_size)
        self.layers['Affine2'] = Affine(self.params[key_w], self.params[key_b])

        # [20] Dropout layer #2: layer generation
        self.layers['Drop2'] = Dropout(dropout_ratio=0.5)

        # [21] Softmax layer: layer generation
        self.lastLayer = SoftmaxWithLoss()
    def predict(self, x, train_flg=False):
        """Inference by the neural network

        Args:
            x (numpy.ndarray): Input to the neural network
            train_flg (bool): True during training (the Dropout layers drop neurons)

        Returns:
            numpy.ndarray: Neural network output
        """
        # Forward-propagate through the layers
        for layer in self.layers.values():
            if isinstance(layer, Dropout):
                x = layer.forward(x, train_flg)  # Dropout layers need to know whether we are training
            else:
                x = layer.forward(x)
        return x

    def loss(self, x, t):
        """Loss function value calculation

        Args:
            x (numpy.ndarray): Input to the neural network
            t (numpy.ndarray): Correct label

        Returns:
            float: Loss function value
        """
        # Inference
        y = self.predict(x, True)  # Always True because the loss is only computed during training

        # Computed by forward propagation of the Softmax-with-Loss layer
        loss = self.lastLayer.forward(y, t)
        return loss

    def accuracy(self, x, t, batch_size=100):
        """Recognition accuracy calculation

        batch_size is the batch size used for this calculation. Processing a
        large amount of data at once makes im2col consume too much memory,
        which causes thrashing and stalls the run, so the data is split up
        to avoid that.

        Args:
            x (numpy.ndarray): Input to the neural network
            t (numpy.ndarray): Correct label (one-hot)
            batch_size (int, optional): Batch size used for the calculation, default is 100.

        Returns:
            float: Recognition accuracy
        """
        # Number of chunks
        batch_num = max(int(x.shape[0] / batch_size), 1)

        # Split
        x_list = np.array_split(x, batch_num, 0)
        t_list = np.array_split(t, batch_num, 0)

        # Process chunk by chunk
        correct_num = 0  # Total number of correct answers
        for (sub_x, sub_t) in zip(x_list, t_list):
            assert sub_x.shape[0] == sub_t.shape[0], 'Did the division boundary shift?'
            y = self.predict(sub_x, False)  # Always False because accuracy is not measured in training mode
            y = np.argmax(y, axis=1)
            t = np.argmax(sub_t, axis=1)
            correct_num += np.sum(y == t)

        # Recognition accuracy
        return correct_num / x.shape[0]

    def gradient(self, x, t):
        """Gradients of the weight parameters, computed by error backpropagation

        Args:
            x (numpy.ndarray): Input to the neural network
            t (numpy.ndarray): Correct label

        Returns:
            dict: Dictionary containing the gradients
        """
        # Forward propagation
        self.loss(x, t)  # Propagate forward to compute the loss value

        # Backpropagation
        dout = self.lastLayer.backward()
        for layer in reversed(list(self.layers.values())):
            dout = layer.backward(dout)

        # Collect the derivatives from each layer
        grads = {}
        layer = self.layers['Conv1']
        grads['W1'], grads['b1'] = layer.dW, layer.db
        layer = self.layers['Conv2']
        grads['W2'], grads['b2'] = layer.dW, layer.db
        layer = self.layers['Conv3']
        grads['W3'], grads['b3'] = layer.dW, layer.db
        layer = self.layers['Conv4']
        grads['W4'], grads['b4'] = layer.dW, layer.db
        layer = self.layers['Conv5']
        grads['W5'], grads['b5'] = layer.dW, layer.db
        layer = self.layers['Conv6']
        grads['W6'], grads['b6'] = layer.dW, layer.db
        layer = self.layers['Affine1']
        grads['W7'], grads['b7'] = layer.dW, layer.db
        layer = self.layers['Affine2']
        grads['W8'], grads['b8'] = layer.dW, layer.db

        return grads
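The chunking done in the accuracy method can be checked without NumPy. The sketch below uses a hypothetical helper `split_sizes` to reproduce the chunk sizes that np.array_split yields: with k sections over n items, the first n % k chunks hold n // k + 1 items and the rest hold n // k.

```python
def split_sizes(n, batch_size):
    """Chunk sizes produced by splitting n items the way accuracy() does."""
    k = max(n // batch_size, 1)   # number of chunks, at least one
    base, extra = divmod(n, k)    # the first `extra` chunks get one more item
    return [base + 1] * extra + [base] * (k - extra)


print(split_sizes(10000, 100)[:3], len(split_sizes(10000, 100)))  # → [100, 100, 100] 100
print(split_sizes(250, 100))  # → [125, 125]
print(split_sizes(30, 100))   # → [30]
```

When the data size is not a multiple of batch_size, a chunk can be larger than batch_size (125 in the second case) but stays below twice that size, so memory use remains bounded.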
The training code is almost the same as in the previous chapter. I had considered implementing the Trainer class as in the book's code, but since this is the last chapter and the final implementation, I left it as is.
I set the number of updates to 12,000 (20 epochs).
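The relationship between the number of updates and the number of epochs is simple arithmetic, spelled out here as a small sketch. The value 60,000 is the size of the MNIST training set.

```python
train_size = 60000   # number of MNIST training images
batch_size = 100
iters_num = 12000

iter_per_epoch = max(train_size // batch_size, 1)
epochs = iters_num / iter_per_epoch
print(iter_per_epoch, epochs)  # → 600 20.0
```

Because each mini-batch in the training script is drawn independently at random, an epoch here means 600 updates rather than a pass that visits every image exactly once.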
mnist.py
# coding: utf-8
import os
import sys
import matplotlib.pyplot as plt
import numpy as np
from adam import Adam
from deep_conv_net import DeepConvNet
sys.path.append(os.pardir)  # Add the parent directory to the path
from dataset.mnist import load_mnist

# Load the MNIST training data and test data
(x_train, t_train), (x_test, t_test) = \
    load_mnist(normalize=True, flatten=False, one_hot_label=True)

# Hyperparameter settings
iters_num = 12000           # Number of updates
batch_size = 100            # Batch size
adam_param_alpha = 0.001    # Adam parameter
adam_param_beta1 = 0.9      # Adam parameter
adam_param_beta2 = 0.999    # Adam parameter

train_size = x_train.shape[0]  # Training data size
iter_per_epoch = max(int(train_size / batch_size), 1)  # Number of iterations per epoch

# Create the deep convolutional neural network
network = DeepConvNet()

# Create the optimizer (Adam)
optimizer = Adam(adam_param_alpha, adam_param_beta1, adam_param_beta2)

# Check the recognition accuracy before training
train_acc = network.accuracy(x_train, t_train)
test_acc = network.accuracy(x_test, t_test)
train_loss_list = []            # History of the loss function value
train_acc_list = [train_acc]    # History of recognition accuracy on the training data
test_acc_list = [test_acc]      # History of recognition accuracy on the test data
print(f'Before learning[Training data recognition accuracy]{train_acc:.4f} [Test data recognition accuracy]{test_acc:.4f}')

# Start training
for i in range(iters_num):

    # Generate a mini-batch
    batch_mask = np.random.choice(train_size, batch_size, replace=False)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    # Compute the gradients
    grads = network.gradient(x_batch, t_batch)

    # Update the weight parameters
    optimizer.update(network.params, grads)

    # Compute the loss function value
    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)

    # Compute the recognition accuracy once per epoch
    if (i + 1) % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)

        # Show progress
        print(
            f'[epoch]{(i + 1) // iter_per_epoch:>2} '
            f'[Number of updates]{i + 1:>5} [Loss function value]{loss:.4f} '
            f'[Training data recognition accuracy]{train_acc:.4f} [Test data recognition accuracy]{test_acc:.4f}'
        )

# Plot the history of the loss function value
x = np.arange(len(train_loss_list))
plt.plot(x, train_loss_list, label='loss')
plt.xlabel('iteration')
plt.ylabel('loss')
plt.xlim(left=0)
plt.ylim(0, 2.5)
plt.show()

# Plot the history of recognition accuracy on the training and test data
x2 = np.arange(len(train_acc_list))
plt.plot(x2, train_acc_list, label='train acc')
plt.plot(x2, test_acc_list, label='test acc', linestyle='--')
plt.xlabel('epochs')
plt.ylabel('accuracy')
plt.xlim(left=0)
plt.ylim(0, 1.0)
plt.legend(loc='lower right')
plt.show()
Below are the execution results. It took about half a day in my environment.
Before learning[Training data recognition accuracy]0.0975 [Test data recognition accuracy]0.0974
[epoch] 1 [Number of updates] 600 [Loss function value]1.0798 [Training data recognition accuracy]0.9798 [Test data recognition accuracy]0.9811
[epoch] 2 [Number of updates] 1200 [Loss function value]0.8792 [Training data recognition accuracy]0.9881 [Test data recognition accuracy]0.9872
[epoch] 3 [Number of updates] 1800 [Loss function value]0.9032 [Training data recognition accuracy]0.9884 [Test data recognition accuracy]0.9890
[epoch] 4 [Number of updates] 2400 [Loss function value]0.8012 [Training data recognition accuracy]0.9914 [Test data recognition accuracy]0.9906
[epoch] 5 [Number of updates] 3000 [Loss function value]0.9475 [Training data recognition accuracy]0.9932 [Test data recognition accuracy]0.9907
[epoch] 6 [Number of updates] 3600 [Loss function value]0.8105 [Training data recognition accuracy]0.9939 [Test data recognition accuracy]0.9910
[epoch] 7 [Number of updates] 4200 [Loss function value]0.8369 [Training data recognition accuracy]0.9920 [Test data recognition accuracy]0.9915
[epoch] 8 [Number of updates] 4800 [Loss function value]0.8727 [Training data recognition accuracy]0.9954 [Test data recognition accuracy]0.9939
[epoch] 9 [Number of updates] 5400 [Loss function value]0.9640 [Training data recognition accuracy]0.9958 [Test data recognition accuracy]0.9935
[epoch]10 [Number of updates] 6000 [Loss function value]0.8375 [Training data recognition accuracy]0.9953 [Test data recognition accuracy]0.9925
[epoch]11 [Number of updates] 6600 [Loss function value]0.8500 [Training data recognition accuracy]0.9955 [Test data recognition accuracy]0.9915
[epoch]12 [Number of updates] 7200 [Loss function value]0.7959 [Training data recognition accuracy]0.9966 [Test data recognition accuracy]0.9932
[epoch]13 [Number of updates] 7800 [Loss function value]0.7778 [Training data recognition accuracy]0.9946 [Test data recognition accuracy]0.9919
[epoch]14 [Number of updates] 8400 [Loss function value]0.9212 [Training data recognition accuracy]0.9973 [Test data recognition accuracy]0.9929
[epoch]15 [Number of updates] 9000 [Loss function value]0.9046 [Training data recognition accuracy]0.9974 [Test data recognition accuracy]0.9934
[epoch]16 [Number of updates] 9600 [Loss function value]0.9806 [Training data recognition accuracy]0.9970 [Test data recognition accuracy]0.9924
[epoch]17 [Number of updates]10200 [Loss function value]0.7837 [Training data recognition accuracy]0.9975 [Test data recognition accuracy]0.9931
[epoch]18 [Number of updates]10800 [Loss function value]0.8948 [Training data recognition accuracy]0.9976 [Test data recognition accuracy]0.9928
[epoch]19 [Number of updates]11400 [Loss function value]0.7936 [Training data recognition accuracy]0.9980 [Test data recognition accuracy]0.9932
[epoch]20 [Number of updates]12000 [Loss function value]0.8072 [Training data recognition accuracy]0.9984 [Test data recognition accuracy]0.9939
The final recognition accuracy was 99.39%. The CNN in the previous chapter reached 98.60%, so this is an improvement of 0.79 points. The result gave me a sense of what deeper layers can do.
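To put the 0.79-point gain in terms of individual images, the sketch below converts both accuracies into error counts on the 10,000 MNIST test images. The helper name `errors` is made up for illustration.

```python
test_size = 10000  # number of MNIST test images


def errors(accuracy, n=test_size):
    """Number of misclassified images implied by an accuracy on n images."""
    return round(n * (1.0 - accuracy))


prev_errors = errors(0.9860)  # CNN from the previous chapter
deep_errors = errors(0.9939)  # deep network from this chapter
reduction = 1.0 - deep_errors / prev_errors
print(prev_errors, deep_errors, f'{reduction:.0%}')  # → 140 61 56%
```

Seen this way, the deeper network gets a bit more than half of the previously misclassified images right.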
Relative to the recognition accuracy, the value of the loss function is larger than in the previous chapter, and I think this is due to Dropout. Recognition accuracy is computed with all neurons active, whereas the loss is computed in training mode, with half of the neurons dropped (the Dropout ratio is 0.5).
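One way Dropout could inflate the loss can be illustrated with made-up numbers. This is a rough sketch, not an analysis of the actual run. The logits are hypothetical, and only the last Dropout layer, the one applied to the ten outputs, is modelled.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy loss of softmax(logits) against the correct class index."""
    c = max(logits)  # subtract the max for numerical stability
    log_sum = c + math.log(sum(math.exp(z - c) for z in logits))
    return log_sum - logits[label]


# Hypothetical, confidently correct outputs for a sample whose label is class 3
logits = [0.0] * 10
logits[3] = 10.0
label = 3

# Inference: nothing is dropped, outputs are scaled by the keep ratio 0.5
loss_infer = softmax_cross_entropy([z * 0.5 for z in logits], label)

# Training: suppose the random mask happens to drop the correct class's output
dropped = list(logits)
dropped[label] = 0.0
loss_dropped = softmax_cross_entropy(dropped, label)

print(f'{loss_infer:.4f}')    # small: the prediction is confident and right
print(f'{loss_dropped:.4f}')  # log(10): all ten outputs are now equal
```

With a ratio of 0.5 the correct output is dropped about half the time, so under these assumptions a sizeable average training loss is possible even when nearly every sample is classified correctly at inference.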
That is the end of the implementation in this book. The book goes on to introduce techniques such as ensemble learning and Data Augmentation for further improving recognition accuracy, and it also summarizes the benefits of making networks deeper.
Next, the book introduces the trends in deep learning. In every case, I understood that the CNN studied so far is the foundation.
Next comes an explanation of speeding up deep learning. What I found interesting is that half-precision floating point is attracting attention in deep learning because single precision is more precise than necessary. I had never come across a half-precision floating-point type in the programming languages I have used, but I learned that NumPy has a type called float16.
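Half precision can also be tried out with the Python standard library alone. In the sketch below, the struct format code 'e' is IEEE 754 half precision, the same 16-bit layout as NumPy's float16. The helper names `to_half` and `to_single` are made up for illustration.

```python
import struct


def to_half(x):
    """Round-trip a Python float through IEEE 754 half precision (16 bits)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]


def to_single(x):
    """Round-trip a Python float through IEEE 754 single precision (32 bits)."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


value = 0.1
print(struct.calcsize('<e'), struct.calcsize('<f'), struct.calcsize('<d'))  # → 2 4 8
print(abs(to_half(value) - value))    # half-precision rounding error
print(abs(to_single(value) - value))  # much smaller single-precision error
```

Half precision needs half the memory of single precision at the cost of a larger rounding error, which is the trade-off the book describes.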
The book shows that object detection, segmentation, image captioning, and other interesting applications have already been realized. However, at the level I have reached so far, I still cannot fully understand how they work.
Finally, there is an introduction to areas still under research, such as image generation, autonomous driving, and reinforcement learning. I felt the potential of deep learning.
I managed to finish the final implementation. I am relieved that I could reach the accuracy stated in the book. I was also able to learn about the possibilities of deep learning.
That's all for this chapter. If you find any mistakes, I would be grateful if you could point them out.