[Deep Learning: Day1 NN](https://qiita.com/matsukura04583/items/6317c57bc21de646da8e) [Deep Learning: Day2 CNN](https://qiita.com/matsukura04583/items/29f0dcc3ddeca4bf69a2) [Deep Learning: Day3 RNN](https://qiita.com/matsukura04583/items/9b77a238da4441e0f973) [Deep Learning: Day4 Reinforcement Learning / TensorFlow](https://qiita.com/matsukura04583/items/50806b750c8d77f2305d)
What you can do with neural networks (NN)

・Regression
  ・Result prediction
    ・Stock price forecast
    ・Sales forecast
  ・Ranking
    ・Horse racing ranking forecast
    ・Popularity ranking forecast
・Classification
  ・Identification of cat photos
  ・Handwriting recognition
  ・Flower type classification
Neural network: Regression (approximating a function that outputs continuous real values)
[Regression analysis] • Linear regression • Regression tree • Random forest • Neural network (NN)
Neural network: Classification (predicting discrete results such as gender (male or female) or animal type)
[Classification analysis] • Bayesian classification • Logistic regression • Random forest • Neural network (NN)
Formula (step function)
f(x) = \left\{
\begin{array}{ll}
1 & (x \geq 0) \\
0 & (x \lt 0)
\end{array}
\right.
python
def step_function(x):
    # Returns 1 for non-negative input, 0 otherwise (matches the formula above)
    if x >= 0:
        return 1
    else:
        return 0
Formula (sigmoid function)
f(u) = \frac{1}{1+e^{-u}}
python
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
The sigmoid is a function that varies smoothly between 0 and 1. Whereas the step function can only convey ON/OFF, the sigmoid can convey the strength of a signal, which helped trigger the spread of neural networks for prediction. Issue: for large input values the change in output is very small, which can cause the vanishing gradient problem.
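As a small illustrative sketch (my own addition, not from the handout): the derivative of the sigmoid, f'(u) = f(u)(1 - f(u)), is at most 0.25 and approaches 0 for large |u|, which is why gradients vanish when many sigmoid layers are stacked.

python
import numpy as np

def sigmoid(u):
    return 1 / (1 + np.exp(-u))

def d_sigmoid(u):
    # Derivative of the sigmoid: f'(u) = f(u) * (1 - f(u))
    s = sigmoid(u)
    return s * (1 - s)

for u in [0.0, 2.0, 5.0, 10.0]:
    # The gradient peaks at 0.25 (at u = 0) and shrinks rapidly for large |u|
    print(u, d_sigmoid(u))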
f(x) = \left\{
\begin{array}{ll}
x & (x \gt 0) \\
0 & (x \leq 0)
\end{array}
\right.
python
def relu(x):
    return np.maximum(0, x)
ReLU is the most widely used activation function today. It contributes to avoiding the vanishing gradient problem and to sparsification, and has produced good results.
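A minimal sketch (my own addition, not from the handout) of why ReLU helps: its derivative is 1 for positive inputs, so gradients pass through without shrinking, and exactly 0 for non-positive inputs, which zeroes out (sparsifies) part of the network.

python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def d_relu(x):
    # Gradient is 1 where x > 0 and 0 elsewhere (no shrinking, built-in sparsity)
    return np.where(x > 0, 1.0, 0.0)

u = np.array([-2.0, -0.5, 0.0, 0.5, 3.0])
print(relu(u))    # [0.  0.  0.  0.5 3. ]
print(d_relu(u))  # [0. 0. 0. 1. 1.]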
Error calculation. Error function: squared error

E_n(w)=\frac{1}{2}\sum_{j=1}^{I} (y_j-d_j)^2 = \frac{1}{2}\|y-d\|^2

Error function: cross-entropy error

E_n(w)=-\sum_{i=1}^{I} d_i \log y_i
python
#Cross entropy
def cross_entropy_error(d, y):
    if y.ndim == 1:
        d = d.reshape(1, d.size)
        y = y.reshape(1, y.size)

    # If the teacher data is a one-hot vector, convert it to the index of the correct label
    if d.size == y.size:
        d = d.argmax(axis=1)

    batch_size = y.shape[0]
    return -np.sum(np.log(y[np.arange(batch_size), d] + 1e-7)) / batch_size
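The squared-error formula above has no accompanying code in the handout; a minimal sketch of it (my own addition, mirroring the style of the cross-entropy helper) would be:

python
import numpy as np

def mean_squared_error(d, y):
    # E_n(w) = 1/2 * sum((y - d)^2), matching the squared-error formula above
    return 0.5 * np.sum((y - d) ** 2)

y = np.array([0.1, 0.8, 0.1])
d = np.array([0.0, 1.0, 0.0])
print(mean_squared_error(d, y))  # 0.5 * (0.01 + 0.04 + 0.01) = 0.03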
A one-hot vector is a vector such as (0, 1, 0, 0, 0, 0) in which one component is 1 and all remaining components are 0. (Reference) Site consulted: "What is a one-hot vector?"
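A quick illustrative example (my own addition, using the same np.identity trick that appears in the IRIS exercise later) of turning class indices into one-hot vectors:

python
import numpy as np

labels = np.array([2, 0, 1])      # class indices
one_hot = np.identity(3)[labels]  # row i of the identity matrix is the one-hot vector for class i
print(one_hot)
# [[0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]]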
(Reference) Gradient descent method commentary site
Stochastic gradient descent

$W^{(t+1)} = W^{(t)} - \varepsilon \nabla E$ ($\varepsilon$ is the learning rate) ・・・ gradient descent

Gradient descent uses the error averaged over all samples.

$W^{(t+1)} = W^{(t)} - \varepsilon \nabla E_n$ ($\varepsilon$ is the learning rate) ・・・ stochastic gradient descent (SGD)

Stochastic gradient descent uses the error of a single randomly sampled example (a minimal code sketch follows after the reference link below).

Advantages of stochastic gradient descent
・Reduced computation cost when the data is redundant
・Reduced risk of converging to an undesired local minimum
・Online learning is possible

[(Reference) Stochastic gradient descent explanation site](https://qiita.com/YudaiSadakuni/items/ece07b04c685a64eaa01#%E7%A2%BA%E7%8E%87%E7%9A%84%E5%8B%BE%E9%85%8D%E9%99%8D%E4%B8%8B%E6%B3%95)
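A minimal sketch (my own, with made-up data and a hypothetical squared-error loss) of the SGD update $W^{(t+1)} = W^{(t)} - \varepsilon \nabla E_n$, using one randomly chosen sample per step:

python
import numpy as np

# Toy data: d = 3*x + noise (assumed for illustration only)
np.random.seed(0)
X = np.random.rand(100, 1)
D = 3 * X[:, 0] + 0.01 * np.random.randn(100)

W = np.zeros(1)   # parameter to learn
epsilon = 0.1     # learning rate

for t in range(1000):
    i = np.random.randint(len(X))   # pick one sample at random
    y = X[i] @ W                    # prediction for that sample
    grad = (y - D[i]) * X[i]        # gradient of E_n = 1/2 * (y - d)^2 w.r.t. W
    W = W - epsilon * grad          # SGD update

print(W)  # should approach [3.]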
Mini-batch gradient descent

$W^{(t+1)} = W^{(t)} - \varepsilon \nabla E_n$ ($\varepsilon$ is the learning rate) ・・・ stochastic gradient descent

Stochastic gradient descent uses the error of a single randomly sampled example.

$W^{(t+1)} = W^{(t)} - \varepsilon \nabla E_t$ ($\varepsilon$ is the learning rate) ・・・ mini-batch gradient descent

Mini-batch gradient descent uses the average error over the samples belonging to a randomly extracted data set $D_t$ (a mini-batch).

Advantages of mini-batch gradient descent: effective use of computing resources without losing the advantages of stochastic gradient descent → thread parallelization on CPUs and SIMD parallelization on GPUs.
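A minimal sketch (my own continuation of the toy SGD example above) in which each update uses the average gradient over a randomly drawn mini-batch $D_t$:

python
import numpy as np

np.random.seed(0)
X = np.random.rand(100, 1)
D = 3 * X[:, 0] + 0.01 * np.random.randn(100)

W = np.zeros(1)
epsilon = 0.1
batch_size = 10

for t in range(1000):
    idx = np.random.choice(len(X), batch_size, replace=False)  # random mini-batch D_t
    y = X[idx] @ W                                             # predictions for the batch
    grad = X[idx].T @ (y - D[idx]) / batch_size                # average gradient over D_t
    W = W - epsilon * grad                                     # mini-batch update

print(W)  # should approach [3.]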
Error gradient calculation: the error backpropagation method. [Error backpropagation] The computed error is differentiated starting from the output layer and propagated backward to each preceding layer. This allows the derivative of every parameter to be obtained analytically with minimal calculation: by propagating derivatives backward from the computation result (= the error), unnecessary recursive calculations are avoided.
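As a brief sketch of the chain rule that the code below applies (my own notation, chosen to match the variables delta2, delta1, z1, W2 in the code): the output-layer delta is the derivative of the error with respect to that layer's total input, and every weight gradient reuses it instead of recomputing.

\delta_2 = \frac{\partial E}{\partial u_2}, \qquad
\frac{\partial E}{\partial W_2} = z_1^{T}\,\delta_2, \qquad
\delta_1 = \left(\delta_2 W_2^{T}\right) \odot f'(u_1), \qquad
\frac{\partial E}{\partial W_1} = x^{T}\,\delta_1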
python
#Error backpropagation
def backward(x, d, z1, y):
    print("\n##### Error backpropagation start #####")

    grad = {}

    W1, W2 = network['W1'], network['W2']
    b1, b2 = network['b1'], network['b2']
    # Delta at the output layer
    delta2 = functions.d_sigmoid_with_loss(d, y)
    # Gradient of b2
    grad['b2'] = np.sum(delta2, axis=0)
    # Gradient of W2
    grad['W2'] = np.dot(z1.T, delta2)
    # Delta at the middle layer
    delta1 = np.dot(delta2, W2.T) * functions.d_relu(z1)
    # Gradient of b1
    grad['b1'] = np.sum(delta1, axis=0)
    # Gradient of W1
    grad['W1'] = np.dot(x.T, delta1)

    print_vec("Partial derivative_dE/du2", delta2)
    print_vec("Partial derivative_dE/du1", delta1)

    print_vec("Partial derivative_Weight 1", grad["W1"])
    print_vec("Partial derivative_Weight 2", grad["W2"])
    print_vec("Partial derivative_Bias 1", grad["b1"])
    print_vec("Partial derivative_Bias 2", grad["b2"])

    return grad
[P10] In deep learning, describe what you are trying to do in two lines or less. Also, which of the following values is the ultimate goal of optimization? Choose all. ① Input value [X] ② Output value [Y] ③ Weight [W] ④ Bias [b] ⑤ Total input [u] ⑥ Intermediate layer input [z] ⑦ Learning rate [ρ]
⇒ [Discussion] Ultimately, deep learning aims to determine the parameters that minimize the error. The values that are the final goals of optimization are ③ the weight [W] and ④ the bias [b].
[P12] Put the following network on paper.
⇒ [Discussion] It's easy to understand if you write it yourself.
[P19] Confirmation test: Fill in an example of animal classification in this diagram. ![P19.gif](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/357717/6a0b680d-9466-598d-67e9-d9156a754193.gif)
⇒ [Discussion]
[P21] Confirmation test
Write this expression in python
u = w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4 + b = Wx + b \qquad (1.2)
⇒ [Discussion]
python
u1=np.dot(x,W1)+b1
[P23] Confirmation test Extract the code that represents the middle layer
⇒ [Discussion]
python
#Total input of hidden layers
u1 = np.dot(x, W1) + b1
#Total output of hidden layer
z1 = functions.relu(u1)
[P26] Confirmation test Explain the difference between linear and non-linear with a diagram.
[P34] Confirmation test: Fully connected NN, single layer with multiple nodes. Extract the relevant part from the distributed source code. ⇒ [Discussion] Since the activation function f(u) is the sigmoid function, this is the relevant part:
python
z1 = functions.sigmoid(u)
[P34] Confirmation test: Error calculation. Error function = squared error

E_n(w)=\frac{1}{2}\sum_{j=1}^{I} (y_j-d_j)^2 = \frac{1}{2}\|y-d\|^2

・Describe why the difference is squared rather than simply subtracted.
・Describe what the 1/2 in the formula above means.
⇒ [Discussion]
・Squaring makes every error term positive, so positive and negative differences do not cancel out.
・The 1/2 cancels the factor of 2 that appears when the squared error is differentiated, which simplifies the gradient used in learning.
(Reference) This site was easy to understand: "Meaning and calculation method of the least squares method – how to find the regression line".
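A one-line check (my own addition) of why the 1/2 is convenient when differentiating:

\frac{\partial}{\partial y_j}\left[\frac{1}{2}\sum_{j'}(y_{j'}-d_{j'})^2\right] = y_j - d_j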
[P51] Confirmation test (S3_2 output layer_activation function) Softmax function
①f(i,u)=\frac{e^{u_i}②}{\sum_{k=1}^{K}e^{u_k}③}
Show the source code corresponding to the formulas (1) to (3) and explain line by line.
python
def softmax(x):
    if x.ndim == 2:  # If the input is two-dimensional (a batch)
        x = x.T
        x = x - np.max(x, axis=0)
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T

    x = x - np.max(x)  # Overflow countermeasure
    return np.exp(x) / np.sum(np.exp(x))
・① … y (the value returned as a transposition by return y.T)
・② … the np.exp(x) part
・③ … the np.sum(np.exp(x), axis=0) part
(Learning reference) What does NumPy's axis and dimension mean?
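A short usage check (my own addition) showing that the softmax above turns a vector of scores into probabilities that sum to 1, and handles a 2-D batch row by row:

python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
p = softmax(u)        # uses the softmax defined above
print(p)              # [0.09003057 0.24472847 0.66524096]
print(p.sum())        # 1.0

batch = np.array([[1.0, 2.0, 3.0],
                  [0.0, 0.0, 0.0]])
print(softmax(batch)) # each row sums to 1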
[P53] Confirmation test(S3_2 Output layer_Activation function)
Cross entropy
Show the source code corresponding to parts ① and ② of the formula and explain the process line by line.
```math
E_n(w)=-\sum_{i=1}^{I} d_i \log y_i
```
⇒ [Discussion] ・return -np.sum(np.log(y[np.arange(batch_size), d] + 1e-7)) / batch_size ・Dividing by batch_size takes the average over the batch, and the 1e-7 prevents taking log(0).
python
# Cross entropy
def cross_entropy_error(d, y):
    if y.ndim == 1:
        d = d.reshape(1, d.size)
        y = y.reshape(1, y.size)

    # If the teacher data is a one-hot vector, convert it to the index of the correct label
    if d.size == y.size:
        d = d.argmax(axis=1)

    batch_size = y.shape[0]
    return -np.sum(np.log(y[np.arange(batch_size), d] + 1e-7)) / batch_size
[P56] Confirmation test(S4 gradient descent method) Find the appropriate source code for the gradient descent function.
⇒ [Discussion]
python
# Error
loss = functions.cross_entropy_error(d, y)

grad = backward(x, d, z1, y)  # Corresponds to part ②
for key in ('W1', 'W2', 'b1', 'b2'):
    network[key] -= learning_rate * grad[key]  # Corresponds to part ①
[P65] Confirmation test (S4 gradient descent method): Summarize what online learning is. ⇒ [Discussion] Online learning means the model is updated each time using only newly acquired data, without reusing the whole existing data set.
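A minimal sketch (my own, reusing the toy linear model from the SGD example above) of online learning: each newly arriving sample updates the parameters immediately and is then discarded rather than stored.

python
import numpy as np

np.random.seed(1)
W = np.zeros(1)
epsilon = 0.1

# Samples arrive one at a time (streamed), e.g. from a sensor or a log
for t in range(1000):
    x = np.random.rand(1)           # newly acquired input
    d = 3 * x[0]                    # its target value (toy rule, assumed for illustration)
    y = x @ W
    W = W - epsilon * (y - d) * x   # update immediately; the sample is not kept

print(W)  # approaches [3.]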
[P69] Confirmation test (S4 gradient descent method)
Explain the meaning of the formula $W^{(t+1)} = W^{(t)} - \varepsilon \nabla E_t$ with a diagram.
⇒ [Discussion]
(〇〇〇) (〇〇〇) (〇〇〇)
Set 1   Set 2   Set 3
In this case, take any one of the sets as the mini-batch $D_t$, add up the errors of the samples in it, and multiply by 1/3 (the average over the three samples) to get $E_t$; one update step then uses $\nabla E_t$. (See the sketch below.)
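A tiny sketch (my own addition) of the same idea in code: split the per-sample errors into three sets, pick one as the mini-batch $D_t$, and average its errors.

python
import numpy as np

errors = np.arange(9, dtype=float)    # per-sample errors E_n for 9 samples (toy values)
sets = np.array_split(errors, 3)      # Set 1, Set 2, Set 3 (three samples each)

t = np.random.randint(3)              # choose one set as the mini-batch D_t
E_t = np.sum(sets[t]) / len(sets[t])  # average error over D_t (the "1/3" in the diagram)
print(t, E_t)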
[P78] Confirmation test(S5 error back propagation method)
The error back propagation method can avoid unnecessary recursive processing. Extract the source code that holds the calculation results that have already been performed.
python
# Error backpropagation
def backward(x, d, z1, y):
    print("\n##### Error backpropagation start #####")

    grad = {}

    W1, W2 = network['W1'], network['W2']
    b1, b2 = network['b1'], network['b2']
    # Delta at the output layer  ## Here the derivative of the combined sigmoid + cross-entropy function is computed and assigned to "delta2"
    delta2 = functions.d_sigmoid_with_loss(d, y)
    # Gradient of b2  ## reuses "delta2"
    grad['b2'] = np.sum(delta2, axis=0)
    # Gradient of W2  ## reuses "delta2"
    grad['W2'] = np.dot(z1.T, delta2)
    # Delta at the middle layer  ## reuses "delta2"
    delta1 = np.dot(delta2, W2.T) * functions.d_relu(z1)
    # Gradient of b1
    grad['b1'] = np.sum(delta1, axis=0)
    # Gradient of W1
    grad['W1'] = np.dot(x.T, delta1)

    print_vec("Partial derivative_dE/du2", delta2)
    print_vec("Partial derivative_dE/du1", delta1)

    print_vec("Partial derivative_weight 1", grad["W1"])
    print_vec("Partial derivative_weight 2", grad["W2"])
    print_vec("Partial derivative_bias 1", grad["b1"])
    print_vec("Partial derivative_bias 2", grad["b2"])

    return grad
[P83] Find the source code that corresponds to the two blanks.(S5 error back propagation method)
python
# Delta at the output layer
delta2 = functions.d_mean_squared_error(d, y)
python
# Gradient of W2
grad['W2'] = np.dot(z1.T, delta2)
#Exercise
python
# Let's try #
# Forward propagation (single layer / single unit)

# Weight
W = np.array([[0.1], [0.2]])

## Let's try _ array initialization
# W = np.zeros(2)
W = np.ones(2)  # Selected here
# W = np.random.rand(2)
# W = np.random.randint(5, size=(2))

print_vec("weight", W)

# Bias
b = 0.5

## Let's try _ numerical initialization
b = np.random.rand()  # Random number from 0 to 1  # Selected this
# b = np.random.rand() * 10 - 5  # Random number from -5 to 5

print_vec("bias", b)

# Input value
x = np.array([2, 3])
print_vec("input", x)

# Total input
u = np.dot(x, W) + b
print_vec("total input", u)

# Intermediate layer output
z = functions.relu(u)
print_vec("intermediate layer output", z)
weight [1. 1.]
bias 0.15691869859919338
input [2 3]
Total input 5.156918698599194
Intermediate layer output 5.156918698599194
python
# Let's try #
# Forward propagation (single layer / multiple units)

# Weight
W = np.array([
    [0.1, 0.2, 0.3],
    [0.2, 0.3, 0.4],
    [0.3, 0.4, 0.5],
    [0.4, 0.5, 0.6]
])

## Let's try _ array initialization
# W = np.zeros((4, 3))
W = np.ones((4, 3))  # Selected here
# W = np.random.rand(4, 3)
# W = np.random.randint(5, size=(4, 3))

print_vec("weight", W)

# Bias
b = np.array([0.1, 0.2, 0.3])
print_vec("bias", b)

# Input value
x = np.array([1.0, 5.0, 2.0, -1.0])
print_vec("input", x)

# Total input
u = np.dot(x, W) + b
print_vec("total input", u)

# Intermediate layer output
z = functions.sigmoid(u)
print_vec("intermediate layer output", z)
weight [[1. 1. 1.] [1. 1. 1.] [1. 1. 1.] [1. 1. 1.]]
bias [0.1 0.2 0.3]
input [ 1. 5. 2. -1.]
Total input [7.1 7.2 7.3]
Intermediate layer output [0.99917558 0.99925397 0.99932492]
python
# Let's try #
# Multi-class classification
# 2-3-4 network

# !! Let's try _ change the node configuration to 3-5-4

# Set weights and biases
# Create the network
def init_network():
    print("##### Network initialization #####")

    # Let's try
    #_ Display the shape of each parameter
    #_ Randomly generate the network's initial values

    network = {}
    network['W1'] = np.array([
        [0.1, 0.4, 0.7, 0.1, 0.3],
        [0.2, 0.5, 0.8, 0.1, 0.4],
        [0.3, 0.6, 0.9, 0.2, 0.5]
    ])
    network['W2'] = np.array([
        [0.1, 0.6, 0.1, 0.6],
        [0.2, 0.7, 0.2, 0.7],
        [0.3, 0.8, 0.3, 0.8],
        [0.4, 0.9, 0.4, 0.9],
        [0.5, 0.1, 0.5, 0.1]
    ])
    network['b1'] = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
    network['b2'] = np.array([0.1, 0.2, 0.3, 0.4])

    print_vec("weight 1", network['W1'])
    print_vec("weight 2", network['W2'])
    print_vec("bias 1", network['b1'])
    print_vec("bias 2", network['b2'])

    return network
# Create the forward-propagation process
# x: input value
def forward(network, x):
    print("##### Start forward propagation #####")

    W1, W2 = network['W1'], network['W2']
    b1, b2 = network['b1'], network['b2']

    # Layer 1 total input
    u1 = np.dot(x, W1) + b1
    # Layer 1 total output
    z1 = functions.relu(u1)
    # Layer 2 total input
    u2 = np.dot(z1, W2) + b2
    # Output value
    y = functions.softmax(u2)

    print_vec("total input 1", u1)
    print_vec("intermediate layer output 1", z1)
    print_vec("total input 2", u2)
    print_vec("output 1", y)
    print("total output: " + str(np.sum(y)))

    return y, z1
## Preliminary data
# Input value
x = np.array([1., 2., 3.])
# Target output
d = np.array([0, 0, 0, 1])
# Network initialization
network = init_network()
# output
y, z1 = forward(network, x)
# error
loss = functions.cross_entropy_error(d, y)
## Display
print("\n##### Result display #####")
print_vec("output", y)
print_vec("training data", d)
print_vec("error", loss)
#####Network initialization##### Weight 1 [[0.1 0.4 0.7 0.1 0.3] [0.2 0.5 0.8 0.1 0.4] [0.3 0.6 0.9 0.2 0.5]]
Weight 2 [[0.1 0.6 0.1 0.6] [0.2 0.7 0.2 0.7] [0.3 0.8 0.3 0.8] [0.4 0.9 0.4 0.9] [0.5 0.1 0.5 0.1]]
Bias 1 [0.1 0.2 0.3 0.4 0.5]
Bias 2 [0.1 0.2 0.3 0.4]
#####Start forward propagation##### Total input 1 [1.5 3.4 5.3 1.3 3.1]
Intermediate layer output 1 [1.5 3.4 5.3 1.3 3.1]
Total input 2 [4.59 9.2 4.79 9.4 ]
Output 1 [0.00443583 0.44573018 0.00541793 0.54441607]
Output total: 1.0
#####Result display##### output [0.00443583 0.44573018 0.00541793 0.54441607]
Training data [0 0 0 1]
error 0.6080413107681358
python
# Let's try #
# Regression
# 2-3-2 Network
# !! Let's try _ Let's change the node configuration to 3-5-4
# Set weights and biases
# Create the network
def init_network():
    print("##### Network initialization #####")

    network = {}
    network['W1'] = np.array([
        [0.1, 0.4, 0.7, 0.1, 0.3],
        [0.2, 0.5, 0.8, 0.1, 0.4],
        [0.3, 0.6, 0.9, 0.2, 0.5]
    ])
    network['W2'] = np.array([
        [0.1, 0.6, 0.1, 0.6],
        [0.2, 0.7, 0.2, 0.7],
        [0.3, 0.8, 0.3, 0.8],
        [0.4, 0.9, 0.4, 0.9],
        [0.5, 0.1, 0.5, 0.1]
    ])
    network['b1'] = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
    network['b2'] = np.array([0.1, 0.2, 0.3, 0.4])

    print_vec("weight 1", network['W1'])
    print_vec("weight 2", network['W2'])
    print_vec("bias 1", network['b1'])
    print_vec("bias 2", network['b2'])

    return network
# Create the forward-propagation process
def forward(network, x):
    print("##### Start forward propagation #####")

    W1, W2 = network['W1'], network['W2']
    b1, b2 = network['b1'], network['b2']
    # Total input of the hidden layer
    u1 = np.dot(x, W1) + b1
    # Total output of the hidden layer
    z1 = functions.relu(u1)
    # Total input of the output layer
    u2 = np.dot(z1, W2) + b2
    # Total output of the output layer
    y = u2

    print_vec("total input 1", u1)
    print_vec("intermediate layer output 1", z1)
    print_vec("total input 2", u2)
    print_vec("output 1", y)
    print("total output: " + str(np.sum(z1)))

    return y, z1
# Input value
x = np.array([1., 2., 3.])
network = init_network()
y, z1 = forward(network, x)
# Target output
d = np.array([2., 3.,4.,5.])
# error
loss = functions.mean_squared_error(d, y)
## Display
print("\n##### Result display #####")
print_vec("intermediate layer output", z1)
print_vec("output", y)
print_vec("training data", d)
print_vec("error", loss)
#####Network initialization##### Weight 1 [[0.1 0.4 0.7 0.1 0.3] [0.2 0.5 0.8 0.1 0.4] [0.3 0.6 0.9 0.2 0.5]]
Weight 2 [[0.1 0.6 0.1 0.6] [0.2 0.7 0.2 0.7] [0.3 0.8 0.3 0.8] [0.4 0.9 0.4 0.9] [0.5 0.1 0.5 0.1]]
Bias 1 [0.1 0.2 0.3 0.4 0.5]
Bias 2 [0.1 0.2 0.3 0.4]
#####Start forward propagation##### Total input 1 [1.5 3.4 5.3 1.3 3.1]
Intermediate layer output 1 [1.5 3.4 5.3 1.3 3.1]
Total input 2 [4.59 9.2 4.79 9.4 ]
Output 1 [4.59 9.2 4.79 9.4 ]
Output total: 14.6
#####Result display##### Intermediate layer output [1.5 3.4 5.3 1.3 3.1]
output [4.59 9.2 4.79 9.4 ]
Training data [2. 3. 4. 5.]
error 8.141525
python
# Let's try #
# Binary classification
# 2-3-1 Network
# !! Let's try _ Let's change the node configuration to 5-10-1
# Set weights and biases
# Create the network
def init_network():
    print("##### Network initialization #####")

    network = {}
    network['W1'] = np.array([
        [0.1, 0.3, 0.5, 0.1, 0.3, 0.5, 0.1, 0.3, 0.5, 0.6],
        [0.2, 0.4, 0.6, 0.2, 0.4, 0.6, 0.2, 0.4, 0.6, 0.7],
        [0.2, 0.4, 0.6, 0.2, 0.4, 0.6, 0.2, 0.4, 0.6, 0.7],
        [0.2, 0.4, 0.6, 0.2, 0.4, 0.6, 0.2, 0.4, 0.6, 0.7],
        [0.2, 0.4, 0.6, 0.2, 0.4, 0.6, 0.2, 0.4, 0.6, 0.7]
    ])
    network['W2'] = np.array([
        [0.1],
        [0.1],
        [0.1],
        [0.1],
        [0.1],
        [0.1],
        [0.1],
        [0.1],
        [0.1],
        [0.1]
    ])
    network['b1'] = np.array([0.1, 0.3, 0.5, 0.1, 0.3, 0.5, 0.1, 0.3, 0.5, 0.6])
    network['b2'] = np.array([0.1])

    return network
# Create the forward-propagation process
def forward(network, x):
    print("##### Start forward propagation #####")

    W1, W2 = network['W1'], network['W2']
    b1, b2 = network['b1'], network['b2']

    # Total input of the hidden layer
    u1 = np.dot(x, W1) + b1
    # Total output of the hidden layer
    z1 = functions.relu(u1)
    # Total input of the output layer
    u2 = np.dot(z1, W2) + b2
    # Total output of the output layer
    y = functions.sigmoid(u2)

    print_vec("total input 1", u1)
    print_vec("intermediate layer output 1", z1)
    print_vec("total input 2", u2)
    print_vec("output 1", y)
    print("total output: " + str(np.sum(z1)))

    return y, z1
# Input value
x = np.array([1., 2., 3., 4., 5.])
# Target output
d = np.array([1])
network = init_network()
y, z1 = forward(network, x)
# error
loss = functions.cross_entropy_error(d, y)
## Display
print("\n##### Result display #####")
print_vec("intermediate layer output", z1)
print_vec("output", y)
print_vec("training data", d)
print_vec("error", loss)
#####Network initialization##### #####Start forward propagation##### Total input 1 [ 3. 6.2 9.4 3. 6.2 9.4 3. 6.2 9.4 11. ]
Intermediate layer output 1 [ 3. 6.2 9.4 3. 6.2 9.4 3. 6.2 9.4 11. ]
Total input 2 [6.78]
Output 1 [0.99886501]
Output total: 66.8
#####Result display##### Intermediate layer output [ 3. 6.2 9.4 3. 6.2 9.4 3. 6.2 9.4 11. ]
output [0.99886501]
Training data [1]
error 0.0011355297129812408
⇒ [Discussion] The output changes greatly depending on the size of the intermediate layer. How should the number of units be decided? I found it difficult to decide on the middle layer. The number of units in the input layer must match the dimensionality of the data, so there is no ambiguity there, and the number of units in the output layer simply equals the number of classes. I would like to find out whether the approximation still works if the number of middle-layer units is increased enormously. I also found it difficult to set the weights and biases.
python
# Let's try #
# Sample function
# AI that predicts the value of y
def f(x):
    y = 3 * x[0] + 2 * x[1]
    return y
# Initial settings
def init_network():
    # print("##### Network initialization #####")
    network = {}
    nodesNum = 10
    network['W1'] = np.random.randn(2, nodesNum)
    network['W2'] = np.random.randn(nodesNum)
    network['b1'] = np.random.randn(nodesNum)
    network['b2'] = np.random.randn()

    # print_vec("weight 1", network['W1'])
    # print_vec("weight 2", network['W2'])
    # print_vec("bias 1", network['b1'])
    # print_vec("bias 2", network['b2'])

    return network
# Forward propagation
def forward(network, x):
    # print("##### Forward propagation start #####")

    W1, W2 = network['W1'], network['W2']
    b1, b2 = network['b1'], network['b2']
    u1 = np.dot(x, W1) + b1
    # z1 = functions.relu(u1)

    ## Let's try
    z1 = functions.sigmoid(u1)  # sigmoid selected here instead of ReLU

    u2 = np.dot(z1, W2) + b2
    y = u2

    # print_vec("total input 1", u1)
    # print_vec("intermediate layer output 1", z1)
    # print_vec("total input 2", u2)
    # print_vec("output 1", y)
    # print("total output: " + str(np.sum(y)))

    return z1, y
# Error backpropagation
def backward(x, d, z1, y):
    # print("\n##### Error backpropagation start #####")

    grad = {}

    W1, W2 = network['W1'], network['W2']
    b1, b2 = network['b1'], network['b2']
    # Delta at the output layer
    delta2 = functions.d_mean_squared_error(d, y)
    # Gradient of b2
    grad['b2'] = np.sum(delta2, axis=0)
    # Gradient of W2
    grad['W2'] = np.dot(z1.T, delta2)
    # Delta at the middle layer
    # delta1 = np.dot(delta2, W2.T) * functions.d_relu(z1)

    ## Let's try: d_sigmoid selected here to match the sigmoid forward pass
    delta1 = np.dot(delta2, W2.T) * functions.d_sigmoid(z1)

    delta1 = delta1[np.newaxis, :]
    # Gradient of b1
    grad['b1'] = np.sum(delta1, axis=0)
    x = x[np.newaxis, :]
    # Gradient of W1
    grad['W1'] = np.dot(x.T, delta1)

    # print_vec("Partial derivative_weight 1", grad["W1"])
    # print_vec("Partial derivative_weight 2", grad["W2"])
    # print_vec("Partial derivative_bias 1", grad["b1"])
    # print_vec("Partial derivative_bias 2", grad["b2"])

    return grad
# Create sample data
data_sets_size = 100000
data_sets = [0 for i in range(data_sets_size)]

for i in range(data_sets_size):
    data_sets[i] = {}

    # Set a random value
    # data_sets[i]['x'] = np.random.rand(2)

    ## Let's try _ input value setting  # Selected this to try
    data_sets[i]['x'] = np.random.rand(2) * 10 - 5  # Random numbers from -5 to 5

    # Set the target output
    data_sets[i]['d'] = f(data_sets[i]['x'])

losses = []
# Learning rate
learning_rate = 0.07

# Number of extractions
epoch = 1000

# Parameter initialization
network = init_network()
# Random extraction of data
random_datasets = np.random.choice(data_sets, epoch)

# Repeated gradient descent
for dataset in random_datasets:
    x, d = dataset['x'], dataset['d']
    z1, y = forward(network, x)
    grad = backward(x, d, z1, y)
    # Apply the gradient to the parameters
    for key in ('W1', 'W2', 'b1', 'b2'):
        network[key] -= learning_rate * grad[key]

    # Error
    loss = functions.mean_squared_error(d, y)
    losses.append(loss)

print("##### Result display #####")
lists = range(epoch)
plt.plot(lists, losses, '.')
# Graph display
plt.show()
⇒ [Discussion]
By changing from the ReLU function to the sigmoid function, the spread of the points near 0 in the graph widened. In addition, the input values were set to random numbers from -5 to 5.

(Problem) Create deep learning using the IRIS data
### Design
Create a model that learns from the IRIS data by splitting the 150 samples into training data and test data (50 training / 100 test, as set by TRAIN_DATA_SIZE below) and then predicts the flower type.
python
import numpy as np
# Hyperparameters
INPUT_SIZE = 4 # number of input nodes
HIDDEN_SIZE = 6 # Number of neurons in the middle layer (hidden layer)
OUTPUT_SIZE = 3 # Number of neurons in the output layer
TRAIN_DATA_SIZE = 50 # Use TRAIN_DATA_SIZE samples out of the 150 as training data; the rest are used as test data.
LEARNING_RATE = 0.1 # Learning rate
EPOCH = 1000 # Number of repeated learnings (number of epochs)
# Read data
# The Iris dataset is obtained from the URL below. Since the original data is sorted by type, a CSV was prepared in which the 150 rows are shuffled so that the 3 types are mixed (10 at a time).
# https://gist.github.com/netj/8836201
x = np.loadtxt('/content/drive/My Drive/DNN_code/data/iris.csv', delimiter=',',skiprows=1, usecols=(0, 1, 2, 3))
raw_t = np.loadtxt('/content/drive/My Drive/DNN_code/data/iris.csv', delimiter=',',skiprows=1,dtype="unicode", usecols=(4,))
t = np.zeros([150])
for i in range(0, 150):
    vari = raw_t[i]
    # print(vari, raw_t[i], i)
    if "Setosa" in vari:
        t[i] = int(0)
    elif "Versicolor" in vari:
        t[i] = int(1)
    elif "Virginica" in vari:
        t[i] = int(2)
    else:
        print("error", i)
a = [3, 0, 8, 1, 9]
a = t.tolist()
a_int = [int(n) for n in a]
print(a_int)
a_one_hot = np.identity(10)[a_int]
a_one_hot = np.identity(len(np.unique(a)))[a_int]
print(a_one_hot)
train_x = x[:TRAIN_DATA_SIZE]
train_t = a_one_hot[:TRAIN_DATA_SIZE]
test_x = x[TRAIN_DATA_SIZE:]
test_t = a_one_hot[TRAIN_DATA_SIZE:]
print("train=",TRAIN_DATA_SIZE,train_x,train_t)
print("test=",test_x,test_t)
# Weight / bias initialization #He initial value (for using ReLU)
W1 = np.random.randn(INPUT_SIZE, HIDDEN_SIZE) / np.sqrt(INPUT_SIZE) * np.sqrt(2)
W2 = np.random.randn(HIDDEN_SIZE, OUTPUT_SIZE)/ np.sqrt(HIDDEN_SIZE) * np.sqrt(2)
# Adjust from initial value zero
b1 = np.zeros(HIDDEN_SIZE)
b2 = np.zeros(OUTPUT_SIZE)
# ReLU function
def relu(x):
    return np.maximum(x, 0)

# Softmax function
def softmax(x):
    if x.ndim == 2:
        x = x.T
        x = x - np.max(x, axis=0)
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T

    x = x - np.max(x)  # Overflow countermeasure
    return np.exp(x) / np.sum(np.exp(x))
# Cross-entropy error
def cross_entropy_error(y, t):
    if y.shape != t.shape:
        raise ValueError
    if y.ndim == 1:
        return -(t * np.log(y)).sum()
    elif y.ndim == 2:
        return -(t * np.log(y)).sum() / y.shape[0]
    else:
        raise ValueError

# Forward propagation
def forward(x):
    global W1, W2, b1, b2
    return softmax(np.dot(relu(np.dot(x, W1) + b1), W2) + b2)
# Test data results
test_y = forward(test_x)
print ("Before learning =", (test_y.argmax (axis = 1) == test_t.argmax (axis = 1)). Sum (),'/', 150 --TRAIN_DATA_SIZE)
# Learning loop
for i in range(EPOCH):
    # Forward propagation, keeping the intermediate values
    y1 = np.dot(train_x, W1) + b1
    y2 = relu(y1)
    train_y = softmax(np.dot(y2, W2) + b2)

    # Loss function calculation
    L = cross_entropy_error(train_y, train_t)
    if i % 100 == 0:  # every 100 epochs
        print("L=", L)

    # Gradient calculation
    a1 = (train_y - train_t) / TRAIN_DATA_SIZE
    b2_gradient = a1.sum(axis=0)
    W2_gradient = np.dot(y2.T, a1)
    a2 = np.dot(a1, W2.T)
    a2[y1 <= 0.0] = 0
    b1_gradient = a2.sum(axis=0)
    W1_gradient = np.dot(train_x.T, a2)

    # Parameter update
    W1 = W1 - LEARNING_RATE * W1_gradient
    W2 = W2 - LEARNING_RATE * W2_gradient
    b1 = b1 - LEARNING_RATE * b1_gradient
    b2 = b2 - LEARNING_RATE * b2_gradient
# Result display
# L value for the final training data
L = cross_entropy_error(forward(train_x), train_t)
print("L value of final training data =", L)

# Test data results
test_y = forward(test_x)
print("After learning =", (test_y.argmax(axis=1) == test_t.argmax(axis=1)).sum(), '/', 150 - TRAIN_DATA_SIZE)
(Result)
Before learning = 42 / 100
L= 4.550956552060894
L= 0.3239415165787326
L= 0.2170679838829666
L= 0.04933110713361697
L= 0.0273865499319481
L= 0.018217122389043848
L= 0.013351028977015358
L= 0.010399165844496665
L= 0.008444934117102367
L= 0.007068429052588092
L value of final training data = 0.0060528995955394386
After learning = 89 / 100
⇒ [Discussion]