Table of contents: [Deep Learning: Day1 NN](https://qiita.com/matsukura04583/items/6317c57bc21de646da8e) / [Deep Learning: Day2 CNN](https://qiita.com/matsukura04583/items/29f0dcc3ddeca4bf69a2) / [Deep Learning: Day3 RNN](https://qiita.com/matsukura04583/items/9b77a238da4441e0f973) / [Deep Learning: Day4 Reinforcement Learning / TensorFlow](https://qiita.com/matsukura04583/items/50806b750c8d77f2305d)
[try]
Let's change the value of noise / let's change the degree of d
$\Rightarrow$ [Discussion]
Optimizer name | Description |
---|---|
GradientDescentOptimizer | Gradient descent optimizer |
AdagradOptimizer | AdaGrad method optimizer |
MomentumOptimizer | Momentum optimizer |
AdamOptimizer | Adam optimizer |
FtrlOptimizer | Follow the Regularized Leader algorithm (I have not studied this one) |
RMSPropOptimizer | Algorithm that automates the adjustment of learning rate |
(Reference) Optimizer for tensorflow
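For reference, all of the optimizers in the table are available under `tf.train` in TensorFlow 1.x and can be swapped into the script below simply by replacing the `optimizer = ...` line. A minimal sketch (the learning rates and the momentum value shown here are illustrative assumptions, not tuned settings):

```python
import tensorflow as tf

# Any one of these can be assigned to `optimizer`; the arguments shown
# (learning rate, momentum) are illustrative values only.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
optimizer = tf.train.MomentumOptimizer(learning_rate=0.1, momentum=0.9)
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
optimizer = tf.train.FtrlOptimizer(learning_rate=0.1)
optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001)

# In every case training is driven the same way:
# train = optimizer.minimize(loss)
```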
[try] Let's change the value of noise / let's change the degree of d
```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Change the number of iterations here
iters_num = 10000
plot_interval = 100

# Generate data
n = 100
# np.random.rand(): generates random numbers in [0.0, 1.0)
x = np.random.rand(n).astype(np.float32) * 4 - 2
d = 30 * x ** 2 + 0.5 * x + 0.2

# Add noise
noise = 0.05
d = d + noise * np.random.randn(n)

# Model
# Note that no bias term b is used.
# Changed: the number of weights W went from 4 to 3, so the placeholder changes too
# xt = tf.placeholder(tf.float32, [None, 4])
xt = tf.placeholder(tf.float32, [None, 3])
dt = tf.placeholder(tf.float32, [None, 1])
# Changed: the number of weights W went from 4 to 3
# W = tf.Variable(tf.random_normal([4, 1], stddev=0.01))
W = tf.Variable(tf.random_normal([3, 1], stddev=0.01))
y = tf.matmul(xt, W)

# Error function: mean squared error
loss = tf.reduce_mean(tf.square(y - dt))
# Change the learning rate here
optimizer = tf.train.AdamOptimizer(0.001)
train = optimizer.minimize(loss)

# Initialization
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# Prepare the generated data as training data
d_train = d.reshape(-1, 1)
# x_train = np.zeros([n, 4])
x_train = np.zeros([n, 3])
for i in range(n):
    # Changed: the number of weights W went from 4 to 3, so the range changes too
    # for j in range(4):
    for j in range(3):
        x_train[i, j] = x[i] ** j

# Training
for i in range(iters_num):
    if (i + 1) % plot_interval == 0:
        loss_val = sess.run(loss, feed_dict={xt: x_train, dt: d_train})
        W_val = sess.run(W)
        print('Generation: ' + str(i + 1) + '. error = ' + str(loss_val))
    sess.run(train, feed_dict={xt: x_train, dt: d_train})

# Learned coefficients, highest order first
print(W_val[::-1])

# Prediction function
def predict(x):
    result = 0.
    # Changed: the number of weights W went from 4 to 3, so the range changes too
    # for i in range(0, 4):
    for i in range(0, 3):
        result += W_val[i, 0] * x ** i
    return result

fig = plt.figure()
subplot = fig.add_subplot(1, 1, 1)
plt.scatter(x, d)
linex = np.linspace(-2, 2, 100)
liney = predict(linex)
subplot.plot(linex, liney)
plt.show()
```
MNIST1(DN68)
[try] Let's resize the hidden layer / let's change the optimizer $\Rightarrow$ [Discussion] When the size of the hidden layer was halved, the accuracy dropped significantly. On the other hand, when the optimizer was changed from Adam to Momentum, the accuracy rose from 90% to 94%. I tried the others as well, and RMSProp was the best at 96%. I also tried doubling the size of the hidden layer, but the improvement in accuracy was only about 1%, so once the hidden layer is large enough, it seems preferable to tune with the optimizer from there.
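As a rough illustration of that comparison, here is a minimal tf.keras sketch (not the notebook's original code; the hidden size of 512, the number of epochs, and the optimizer choices are assumptions) in which only the hidden-layer size and the optimizer need to be edited:

```python
import tensorflow as tf

def build_and_train(hidden_size=512, optimizer='adam'):
    # Load MNIST and scale pixel values to [0, 1]
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Single hidden layer; halve or double hidden_size to reproduce the experiment
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(hidden_size, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer=optimizer,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
    return model

# Compare optimizers on the same architecture, e.g.:
# build_and_train(optimizer='adam')
# build_and_train(optimizer=tf.keras.optimizers.SGD(momentum=0.9))
# build_and_train(optimizer='rmsprop')
```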
conv - relu - pool - conv - relu - pool - affine - relu - dropout - affine - softmax [try]
Let's change the dropout rate to 0 $\Rightarrow$ [Discussion] (Before change) dropout_rate = 0.5
(After change) dropout_rate = 0. I expected the accuracy to drop further, but it did not change much (see the sketch below).
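A minimal tf.keras sketch of the conv - relu - pool - conv - relu - pool - affine - relu - dropout - affine - softmax stack above (the filter counts and dense size here are assumptions, not the notebook's exact settings); setting `dropout_rate = 0` reproduces the second experiment:

```python
import tensorflow as tf

def build_cnn(dropout_rate=0.5):
    # conv-relu-pool -> conv-relu-pool -> affine-relu -> dropout -> affine-softmax
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation='relu'),
        tf.keras.layers.Dropout(dropout_rate),   # 0.5 before the change, 0 after
        tf.keras.layers.Dense(10, activation='softmax'),
    ])

model = build_cnn(dropout_rate=0.0)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```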
$\Rightarrow$ [Discussion] The answer is (a).
Answer: (a) is the incorrect statement. First, regarding the loss, its feature is that classification is performed in the part that branches off from the middle of the network.
The explanation of the following examples is omitted.
[DN73] Confirmation test in the explanation of the example: briefly describe the features of VGG, GoogLeNet, and ResNet.
VGG is the oldest of the three models (2014). Its characteristic is simplicity: it just stacks plain blocks such as Convolution, Convolution, max_pool. On the other hand, it has a large number of parameters compared to the other two. GoogLeNet's characteristic is the Inception module: dimensionality reduction using 1×1 convolutions, and sparsity obtained by using filters of various sizes in parallel. ResNet's characteristic is that it enables very deep learning by adding residual connections through its Skip Connection (identity) modules; a rough sketch of such a block follows.
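A minimal sketch of a residual block with a skip connection (illustrative only, not the actual ResNet implementation; the channel counts and input shape are assumptions):

```python
import tensorflow as tf

def residual_block(x, filters=64):
    # Two 3x3 convolutions whose output is added back to the input (identity path).
    shortcut = x
    h = tf.keras.layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    h = tf.keras.layers.Conv2D(filters, 3, padding='same')(h)
    # Skip connection: the block only has to learn the residual F(x),
    # since the identity x is passed through unchanged.
    out = tf.keras.layers.Add()([h, shortcut])
    return tf.keras.layers.Activation('relu')(out)

inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs)
model = tf.keras.Model(inputs, outputs)
```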
Keras2 (DN69)
OR circuit [try]: change np.random.seed(0) to np.random.seed(1); change the number of epochs to 100; change to an AND circuit and an XOR circuit; change the batch size to 10 with the OR circuit; change the number of epochs to 300. ⇒ [Discussion] (Before change) np.random.seed(0) (After change) np.random.seed(1). (After change) Epochs changed from 30 to 100. (After change) Changed to an AND circuit: OR and AND are linearly separable, but XOR is not linearly separable and cannot be learned. (After change) Batch size changed to 10 with the OR circuit. (After change) Number of epochs changed to 300. A minimal sketch of this experiment follows.
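A minimal Keras sketch of the OR-circuit experiment (a sketch, not the original notebook code; the optimizer and loss are assumptions). The seed, epochs, and batch size are the knobs varied above; swapping the targets `d` for the AND or XOR truth table reproduces the other cases, and the single-layer model indeed fails on XOR because it is not linearly separable:

```python
import numpy as np
import tensorflow as tf

np.random.seed(0)  # change to 1 to reproduce the seed experiment

# Inputs and OR-circuit targets; replace d with the AND or XOR column to compare
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
d = np.array([[0], [1], [1], [1]], dtype=np.float32)  # OR
# d = np.array([[0], [0], [0], [1]], dtype=np.float32)  # AND
# d = np.array([[0], [1], [1], [0]], dtype=np.float32)  # XOR (not linearly separable)

# A single dense unit draws one linear decision boundary
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, activation='sigmoid', input_shape=(2,)),
])
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x, d, epochs=30, batch_size=1, verbose=0)  # try epochs=100/300, batch_size=10
print(model.predict(x))
```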
[try]
(Before change: ReLU) (After change: activation function changed to sigmoid) The graph shows that ReLU is the more accurate of the two. (After change: optimization changed to optimizer = SGD(lr=0.1))
With optimizer = SGD(lr=0.1), accuracy improves in places, with 1.0 appearing occasionally, but the results also vary a lot.
[try]
(Before change) (After change) Changed one_hot_label to False.
(After change) Changed the error function to sparse_categorical_crossentropy and changed one_hot_label to False.
categorical_crossentropy requires one_hot_label to be True; sparse_categorical_crossentropy requires one_hot_label to be False. If they are mismatched, an error occurs (see the sketch after this list).
(After change) Changed the value of Adam's lr argument (learning rate 0.01 -> 0.1).
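A small sketch of the loss/label pairing described above (a sketch with illustrative model and epoch settings, not the notebook code): `categorical_crossentropy` expects one-hot labels, while `sparse_categorical_crossentropy` expects integer class labels.

```python
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='softmax', input_shape=(784,)),
])

# Pattern 1: one-hot labels (one_hot_label=True) + categorical_crossentropy
y_one_hot = tf.keras.utils.to_categorical(y_train, 10)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_one_hot, epochs=1)

# Pattern 2: integer labels (one_hot_label=False) + sparse_categorical_crossentropy
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1)

# Mixing the two (e.g. one-hot labels with sparse_categorical_crossentropy) raises an error.
```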
RNN (Prediction of binary addition) Keras RNN documentation
[try] (Before change) (After change) Changed the number of output nodes to 128: SimpleRNN units = 16 $\Rightarrow$ units = 128. Acc rose to 0.9299 already at Epoch 1. (After change) Changed the output activation function from ReLU $\Rightarrow$ sigmoid: with sigmoid, Acc does not rise as much as with ReLU. (After change) Changed the output activation function to tanh: Acc reaches 100%, but it takes until Epoch 3.
(After change) Changed the optimization method to adam. Source change:
```python
# model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.1), metrics=['accuracy'])
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
```
With adam, Acc gives a good result almost throughout.
(After change) Set the input Dropout to 0.5: Acc does not rise as much as expected.
(After change) Set the recurrent Dropout to 0.3: this also only reaches Acc 98%.
(After change) Set unroll to True: this also gives a good result. A minimal sketch of the binary-addition RNN follows.
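A minimal sketch of the binary-addition RNN that these experiments modify (not the original notebook code; the data generation and sizes here are assumptions). The two input bit sequences are fed least-significant-bit first and the network predicts each bit of the sum:

```python
import numpy as np
import tensorflow as tf

# Generate random 8-bit addition problems: inputs a, b and target a + b
def make_data(n_samples=10000, n_bits=8):
    x = np.zeros((n_samples, n_bits, 2), dtype=np.float32)
    d = np.zeros((n_samples, n_bits, 1), dtype=np.float32)
    for i in range(n_samples):
        a, b = np.random.randint(0, 2 ** (n_bits - 1), size=2)
        bits = lambda v: [(v >> j) & 1 for j in range(n_bits)]  # LSB first
        x[i, :, 0], x[i, :, 1] = bits(a), bits(b)
        d[i, :, 0] = bits(a + b)
    return x, d

x, d = make_data()

model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(16, return_sequences=True, input_shape=(8, 2)),
    # units=128, dropout=0.5, recurrent_dropout=0.3, unroll=True were the variants tried
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(loss='mse', optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              metrics=['accuracy'])
model.fit(x, d, epochs=3, batch_size=32)
```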
A field of machine learning that aims to create agents that can choose actions in an environment so that the reward is maximized in the long run. $\Rightarrow$ It is a mechanism for improving the principle by which actions are decided, based on the profit (reward) given as a result of those actions.
[D81] Reinforcement learning 1, confirmation test: think of an example to which reinforcement learning could be applied, and list the environment, agent, actions, and rewards.
⇒ [Discussion] A stock-investment robot. Environment ⇒ the stock market. Agent ⇒ the investor. Action ⇒ selecting and investing in stocks that are likely to be profitable. Reward ⇒ the profit or loss from buying and selling the stocks.
Marketing example. Environment ⇒ the company's sales promotion department. Agent ⇒ software that decides which customers to send campaign e-mails to, based on their profile and purchase history. Action ⇒ for each customer, choose between the two actions of sending and not sending. Reward ⇒ the negative reward of the campaign cost and the positive reward of the sales estimated to result from the campaign.
With perfect knowledge of the environment in advance, it would be possible to predict and determine the optimal behavior.
⇒ A situation in which it is already known what kind of customer will take what kind of action when the campaign e-mail is sent.
⇒ In reinforcement learning, the above assumption does not hold: data is collected while acting on the basis of incomplete knowledge, and the best action is found along the way.
If, relying only on historical data, you always take only the actions currently considered best, you cannot discover other, possibly better actions ⇒ insufficient exploration. If you keep taking only unknown actions, you cannot make use of past experience ⇒ insufficient exploitation. The two are in a trade-off relationship (illustrated by the ε-greedy sketch below).
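A tiny sketch of this exploration/exploitation trade-off as an ε-greedy action choice (purely illustrative; ε and the value table are assumptions):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon=0.1):
    # With probability epsilon: explore (random action, may discover better ones).
    # Otherwise: exploit (take the currently best-known action).
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))

q = np.array([1.0, 0.5, 2.0])   # current action-value estimates for one state
action = epsilon_greedy(q, epsilon=0.1)
```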
Differences between reinforcement learning and supervised and unsupervised learning
Conclusion: the goals are different. In supervised and unsupervised learning the goal is to find patterns in the data and make predictions from them, whereas in reinforcement learning the goal is to find an excellent policy for acting.
History of reinforcement learning / about reinforcement learning ・Although there was a winter period, reinforcement learning with large-scale state spaces is becoming feasible thanks to progress in computation speed. ・Methods that combine function approximation and Q-learning have appeared.
Q-learning ・A method that proceeds with learning by updating the action value function each time an action is taken (see the sketch below). Function approximation methods ・Methods that approximate the value function or the policy function with a parameterized function.
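A minimal sketch of the tabular Q-learning update described above, applied after every action (α, γ and the table sizes are illustrative assumptions):

```python
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))   # tabular action value function Q(s, a)
alpha, gamma = 0.1, 0.99              # learning rate and discount factor

def q_update(s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

q_update(s=0, a=1, r=1.0, s_next=2)
```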
What is the action value function? ⇒ A function whose value depends on both the state and the action taken in that state (in contrast to the state value function, which depends on the state alone).
A policy function is a function that, in policy-based reinforcement learning, gives the probability of taking each action in a given state.
Policy iteration method: a technique that models the policy and optimizes it ⇒ the policy gradient method
```math
\theta^{(t+1)}=\theta^{(t)}+\epsilon\nabla j(\theta)
```
What is j(θ)? ⇒ A measure of how good the policy is; it has to be defined.
Definition methods: ・average reward ・discounted reward sum. The action value function Q(s, a) is defined in correspondence with the definitions above, and the policy gradient theorem holds:
```math
\nabla_{\theta} j(\theta)=E_{\pi_\theta}\bigl[\nabla_{\theta}\log\pi_\theta(a|s)\,Q^{\pi}(s,a)\bigr]
```
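A small NumPy sketch of this policy gradient for a softmax policy over discrete actions (a sketch under assumed shapes, not a full REINFORCE implementation): the gradient is estimated as the average of ∇θ log πθ(a|s) weighted by an action value, where a sampled return stands in for Qπ(s, a).

```python
import numpy as np

n_features, n_actions = 4, 3
theta = np.zeros((n_features, n_actions))   # linear softmax policy parameters
epsilon = 0.01                              # step size (epsilon in the update rule)

def pi(s):
    # Softmax policy pi_theta(a|s) for feature vector s
    logits = s @ theta
    e = np.exp(logits - logits.max())
    return e / e.sum()

def grad_log_pi(s, a):
    # Gradient of log pi_theta(a|s) for the linear softmax policy:
    # column k gets s * (1[k == a] - pi_k)
    p = pi(s)
    g = -np.outer(s, p)
    g[:, a] += s
    return g

def policy_gradient_step(samples):
    # Approximate E_pi[ grad log pi(a|s) * Q(s,a) ] with sampled (s, a, q) triples
    grad = np.mean([grad_log_pi(s, a) * q for s, a, q in samples], axis=0)
    return theta + epsilon * grad

samples = [(np.random.randn(n_features), np.random.randint(n_actions), 1.0)]
theta = policy_gradient_step(samples)
```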