Table of contents: [Deep Learning: Day1 NN](https://qiita.com/matsukura04583/items/6317c57bc21de646da8e) / [Deep Learning: Day2 CNN](https://qiita.com/matsukura04583/items/29f0dcc3ddeca4bf69a2) / [Deep Learning: Day3 RNN](https://qiita.com/matsukura04583/items/9b77a238da4441e0f973) / [Deep Learning: Day4 Reinforcement Learning / TensorFlow](https://qiita.com/matsukura04583/items/50806b750c8d77f2305d)
[try]
Let's change the value of noise / let's change the degree of d
$\Rightarrow$ [Discussion]
Optimizer name | Description |
---|---|
GradientDescentOptimizer | Gradient descent optimizer |
AdagradOptimizer | AdaGrad method optimizer |
MomentumOptimizer | Momentum optimizer |
AdamOptimizer | Adam optimizer |
FtrlOptimizer | Follow the Regularized Leader algorithm (I have not studied this one) |
RMSPropOptimizer | Algorithm that automates the adjustment of learning rate |
(Reference) Optimizer for tensorflow
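For reference, all of the optimizers in the table are available under `tf.train` in TensorFlow 1.x and can be swapped into the script below simply by replacing the `optimizer = ...` line. A minimal sketch (the learning rates and the momentum value shown here are illustrative assumptions, not tuned settings):

```python
import tensorflow as tf

# Any one of these can be assigned to `optimizer`; the arguments shown
# (learning rate, momentum) are illustrative values only.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
optimizer = tf.train.MomentumOptimizer(learning_rate=0.1, momentum=0.9)
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
optimizer = tf.train.FtrlOptimizer(learning_rate=0.1)
optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001)

# In every case training is driven the same way:
# train = optimizer.minimize(loss)
```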
[try] Let's change the value of noise / let's change the degree of d
```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Change the number of iterations here
iters_num = 10000
plot_interval = 100

# Generate data
n = 100
# np.random.rand(): generates random numbers in [0.0, 1.0)
x = np.random.rand(n).astype(np.float32) * 4 - 2
d = 30 * x ** 2 + 0.5 * x + 0.2

# Add noise
noise = 0.05
d = d + noise * np.random.randn(n)

# Model
# Note that no bias term b is used.
# Changed: the number of weights W went from 4 to 3, so the placeholder changes too
# xt = tf.placeholder(tf.float32, [None, 4])
xt = tf.placeholder(tf.float32, [None, 3])
dt = tf.placeholder(tf.float32, [None, 1])
# Changed: the number of weights W went from 4 to 3
# W = tf.Variable(tf.random_normal([4, 1], stddev=0.01))
W = tf.Variable(tf.random_normal([3, 1], stddev=0.01))
y = tf.matmul(xt, W)

# Error function: mean squared error
loss = tf.reduce_mean(tf.square(y - dt))
# Change the learning rate here
optimizer = tf.train.AdamOptimizer(0.001)
train = optimizer.minimize(loss)

# Initialization
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# Prepare the generated data as training data
d_train = d.reshape(-1, 1)
# x_train = np.zeros([n, 4])
x_train = np.zeros([n, 3])
for i in range(n):
    # Changed: the number of weights W went from 4 to 3, so the range changes too
    # for j in range(4):
    for j in range(3):
        x_train[i, j] = x[i] ** j

# Training
for i in range(iters_num):
    if (i + 1) % plot_interval == 0:
        loss_val = sess.run(loss, feed_dict={xt: x_train, dt: d_train})
        W_val = sess.run(W)
        print('Generation: ' + str(i + 1) + '. error = ' + str(loss_val))
    sess.run(train, feed_dict={xt: x_train, dt: d_train})

# Learned coefficients, highest order first
print(W_val[::-1])

# Prediction function
def predict(x):
    result = 0.
    # Changed: the number of weights W went from 4 to 3, so the range changes too
    # for i in range(0, 4):
    for i in range(0, 3):
        result += W_val[i, 0] * x ** i
    return result

fig = plt.figure()
subplot = fig.add_subplot(1, 1, 1)
plt.scatter(x, d)
linex = np.linspace(-2, 2, 100)
liney = predict(linex)
subplot.plot(linex, liney)
plt.show()
```
MNIST1(DN68)
[try] Let's resize the hidden layer / let's change the optimizer $\Rightarrow$ [Discussion] When the size of the hidden layer was halved, the accuracy dropped significantly. On the other hand, when the optimizer was changed from Adam to Momentum, the accuracy rose from 90% to 94%. I tried the others as well, and RMSProp was the best at 96%. I also tried doubling the size of the hidden layer, but the improvement in accuracy was only about 1%, so once the hidden layer is large enough, it seems preferable to tune with the optimizer from there.
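As a rough illustration of that comparison, here is a minimal tf.keras sketch (not the notebook's original code; the hidden size of 512, the number of epochs, and the optimizer choices are assumptions) in which only the hidden-layer size and the optimizer need to be edited:

```python
import tensorflow as tf

def build_and_train(hidden_size=512, optimizer='adam'):
    # Load MNIST and scale pixel values to [0, 1]
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Single hidden layer; halve or double hidden_size to reproduce the experiment
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(hidden_size, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer=optimizer,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
    return model

# Compare optimizers on the same architecture, e.g.:
# build_and_train(optimizer='adam')
# build_and_train(optimizer=tf.keras.optimizers.SGD(momentum=0.9))
# build_and_train(optimizer='rmsprop')
```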
conv - relu - pool - conv - relu - pool - affine - relu - dropout - affine - softmax [try]
Let's change the dropout rate to 0 $\Rightarrow$ [Discussion] (Before change) dropout_rate = 0.5
(After change) dropout_rate = 0. I expected the accuracy to drop further, but it did not change much (see the sketch below).
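A minimal tf.keras sketch of the conv - relu - pool - conv - relu - pool - affine - relu - dropout - affine - softmax stack above (the filter counts and dense size here are assumptions, not the notebook's exact settings); setting `dropout_rate = 0` reproduces the second experiment:

```python
import tensorflow as tf

def build_cnn(dropout_rate=0.5):
    # conv-relu-pool -> conv-relu-pool -> affine-relu -> dropout -> affine-softmax
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation='relu'),
        tf.keras.layers.Dropout(dropout_rate),   # 0.5 before the change, 0 after
        tf.keras.layers.Dense(10, activation='softmax'),
    ])

model = build_cnn(dropout_rate=0.0)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```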
$\Rightarrow$ [Discussion] The answer is (a).
Answer: (a) is the incorrect statement. First, regarding the loss, its feature is that classification is performed in the part that branches off from the middle of the network.
The explanation of the following examples is omitted.
[DN73] Confirmation test in the explanation of the example: briefly describe the features of VGG, GoogLeNet, and ResNet.
VGG is the oldest of the three models (2014). Its characteristic is simplicity: it just stacks plain blocks such as Convolution, Convolution, max_pool. On the other hand, it has a large number of parameters compared to the other two. GoogLeNet's characteristic is the Inception module: dimensionality reduction using 1×1 convolutions, and sparsity obtained by using filters of various sizes in parallel. ResNet's characteristic is that it enables very deep learning by adding residual connections through its Skip Connection (identity) modules; a rough sketch of such a block follows.
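A minimal sketch of a residual block with a skip connection (illustrative only, not the actual ResNet implementation; the channel counts and input shape are assumptions):

```python
import tensorflow as tf

def residual_block(x, filters=64):
    # Two 3x3 convolutions whose output is added back to the input (identity path).
    shortcut = x
    h = tf.keras.layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    h = tf.keras.layers.Conv2D(filters, 3, padding='same')(h)
    # Skip connection: the block only has to learn the residual F(x),
    # since the identity x is passed through unchanged.
    out = tf.keras.layers.Add()([h, shortcut])
    return tf.keras.layers.Activation('relu')(out)

inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs)
model = tf.keras.Model(inputs, outputs)
```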
Keras2 (DN69)
OR circuit [try]: change np.random.seed(0) to np.random.seed(1); change the number of epochs to 100; change to an AND circuit and an XOR circuit; change the batch size to 10 with the OR circuit; change the number of epochs to 300. ⇒ [Discussion] (Before change) np.random.seed(0) (After change) np.random.seed(1). (After change) Epochs changed from 30 to 100. (After change) Changed to an AND circuit: OR and AND are linearly separable, but XOR is not linearly separable and cannot be learned. (After change) Batch size changed to 10 with the OR circuit. (After change) Number of epochs changed to 300. A minimal sketch of this experiment follows.
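A minimal Keras sketch of the OR-circuit experiment (a sketch, not the original notebook code; the optimizer and loss are assumptions). The seed, epochs, and batch size are the knobs varied above; swapping the targets `d` for the AND or XOR truth table reproduces the other cases, and the single-layer model indeed fails on XOR because it is not linearly separable:

```python
import numpy as np
import tensorflow as tf

np.random.seed(0)  # change to 1 to reproduce the seed experiment

# Inputs and OR-circuit targets; replace d with the AND or XOR column to compare
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
d = np.array([[0], [1], [1], [1]], dtype=np.float32)  # OR
# d = np.array([[0], [0], [0], [1]], dtype=np.float32)  # AND
# d = np.array([[0], [1], [1], [0]], dtype=np.float32)  # XOR (not linearly separable)

# A single dense unit draws one linear decision boundary
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, activation='sigmoid', input_shape=(2,)),
])
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x, d, epochs=30, batch_size=1, verbose=0)  # try epochs=100/300, batch_size=10
print(model.predict(x))
```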
[try]
(Before change: ReLU) (After change: activation function changed to sigmoid) The graph shows that ReLU is the more accurate of the two. (After change: optimization changed to optimizer = SGD(lr=0.1))
With optimizer = SGD(lr=0.1), accuracy improves in places, with 1.0 appearing occasionally, but the results also vary a lot.
[try]
(Before change) (After change) Changed one_hot_label to False.
(After change) Changed the error function to sparse_categorical_crossentropy and changed one_hot_label to False.
categorical_crossentropy requires one_hot_label to be True; sparse_categorical_crossentropy requires one_hot_label to be False. If they are mismatched, an error occurs (see the sketch after this list).
(After change) Changed the value of Adam's lr argument (learning rate 0.01 -> 0.1).
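A small sketch of the loss/label pairing described above (a sketch with illustrative model and epoch settings, not the notebook code): `categorical_crossentropy` expects one-hot labels, while `sparse_categorical_crossentropy` expects integer class labels.

```python
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='softmax', input_shape=(784,)),
])

# Pattern 1: one-hot labels (one_hot_label=True) + categorical_crossentropy
y_one_hot = tf.keras.utils.to_categorical(y_train, 10)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_one_hot, epochs=1)

# Pattern 2: integer labels (one_hot_label=False) + sparse_categorical_crossentropy
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1)

# Mixing the two (e.g. one-hot labels with sparse_categorical_crossentropy) raises an error.
```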
RNN (Prediction of binary addition) Keras RNN documentation
[try] (Before change) (After change) Changed the number of output nodes to 128: SimpleRNN units = 16 $\Rightarrow$ units = 128. Acc rose to 0.9299 already at Epoch 1. (After change) Changed the output activation function from ReLU $\Rightarrow$ sigmoid: with sigmoid, Acc does not rise as much as with ReLU. (After change) Changed the output activation function to tanh: Acc reaches 100%, but it takes until Epoch 3.
(After change) Changed the optimization method to adam. Source change:
```python
# model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.1), metrics=['accuracy'])
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
```
With adam, Acc gives a good result almost throughout.
(After change) Set the input Dropout to 0.5: Acc does not rise as much as expected.
(After change) Set the recurrent Dropout to 0.3: this also only reaches Acc 98%.
(After change) Set unroll to True: this also gives a good result. A minimal sketch of the binary-addition RNN follows.
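A minimal sketch of the binary-addition RNN that these experiments modify (not the original notebook code; the data generation and sizes here are assumptions). The two input bit sequences are fed least-significant-bit first and the network predicts each bit of the sum:

```python
import numpy as np
import tensorflow as tf

# Generate random 8-bit addition problems: inputs a, b and target a + b
def make_data(n_samples=10000, n_bits=8):
    x = np.zeros((n_samples, n_bits, 2), dtype=np.float32)
    d = np.zeros((n_samples, n_bits, 1), dtype=np.float32)
    for i in range(n_samples):
        a, b = np.random.randint(0, 2 ** (n_bits - 1), size=2)
        bits = lambda v: [(v >> j) & 1 for j in range(n_bits)]  # LSB first
        x[i, :, 0], x[i, :, 1] = bits(a), bits(b)
        d[i, :, 0] = bits(a + b)
    return x, d

x, d = make_data()

model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(16, return_sequences=True, input_shape=(8, 2)),
    # units=128, dropout=0.5, recurrent_dropout=0.3, unroll=True were the variants tried
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(loss='mse', optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              metrics=['accuracy'])
model.fit(x, d, epochs=3, batch_size=32)
```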
A field of machine learning that aims to create agents that can choose actions in an environment so that the reward is maximized in the long run. $\Rightarrow$ It is a mechanism for improving the principle by which actions are decided, based on the profit (reward) given as a result of those actions.
[D81] Reinforcement learning 1, confirmation test: think of an example to which reinforcement learning could be applied, and list the environment, agent, actions, and rewards.
⇒ [Discussion] A stock-investment robot. Environment ⇒ the stock market. Agent ⇒ the investor. Action ⇒ selecting and investing in stocks that are likely to be profitable. Reward ⇒ the profit or loss from buying and selling the stocks.
Marketing example. Environment ⇒ the company's sales promotion department. Agent ⇒ software that decides which customers to send campaign e-mails to, based on their profile and purchase history. Action ⇒ for each customer, choose between the two actions of sending and not sending. Reward ⇒ the negative reward of the campaign cost and the positive reward of the sales estimated to result from the campaign.
With perfect knowledge of the environment in advance, it would be possible to predict and determine the optimal behavior.
⇒ A situation in which it is already known what kind of customer will take what kind of action when the campaign e-mail is sent.
⇒ In reinforcement learning, the above assumption does not hold: data is collected while acting on the basis of incomplete knowledge, and the best action is found along the way.
If, relying only on historical data, you always take only the actions currently considered best, you cannot discover other, possibly better actions ⇒ insufficient exploration. If you keep taking only unknown actions, you cannot make use of past experience ⇒ insufficient exploitation. The two are in a trade-off relationship (illustrated by the ε-greedy sketch below).
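A tiny sketch of this exploration/exploitation trade-off as an ε-greedy action choice (purely illustrative; ε and the value table are assumptions):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon=0.1):
    # With probability epsilon: explore (random action, may discover better ones).
    # Otherwise: exploit (take the currently best-known action).
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))

q = np.array([1.0, 0.5, 2.0])   # current action-value estimates for one state
action = epsilon_greedy(q, epsilon=0.1)
```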
Differences between reinforcement learning and supervised and unsupervised learning
Conclusion: the goals are different. In supervised and unsupervised learning the goal is to find patterns in the data and make predictions from them, whereas in reinforcement learning the goal is to find an excellent policy for acting.
History of reinforcement learning / about reinforcement learning ・Although there was a winter period, reinforcement learning with large-scale state spaces is becoming feasible thanks to progress in computation speed. ・Methods that combine function approximation and Q-learning have appeared.
Q-learning ・A method that proceeds with learning by updating the action value function each time an action is taken (see the sketch below). Function approximation methods ・Methods that approximate the value function or the policy function with a parameterized function.
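A minimal sketch of the tabular Q-learning update described above, applied after every action (α, γ and the table sizes are illustrative assumptions):

```python
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))   # tabular action value function Q(s, a)
alpha, gamma = 0.1, 0.99              # learning rate and discount factor

def q_update(s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

q_update(s=0, a=1, r=1.0, s_next=2)
```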
What is the action value function? ⇒ A function whose value depends on both the state and the action taken in that state (in contrast to the state value function, which depends on the state alone).
A policy function is a function that, in policy-based reinforcement learning, gives the probability of taking each action in a given state.
Policy iteration method: a technique that models the policy and optimizes it ⇒ the policy gradient method
```math
\theta^{(t+1)}=\theta^{(t)}+\epsilon\nabla j(\theta)
```
What is j(θ)? ⇒ A measure of how good the policy is; it has to be defined.
Definition methods: ・average reward ・discounted reward sum. The action value function Q(s, a) is defined in correspondence with the definitions above, and the policy gradient theorem holds:
```math
\nabla_{\theta} j(\theta)=E_{\pi_\theta}\bigl[\nabla_{\theta}\log\pi_\theta(a|s)\,Q^{\pi}(s,a)\bigr]
```
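A small NumPy sketch of this policy gradient for a softmax policy over discrete actions (a sketch under assumed shapes, not a full REINFORCE implementation): the gradient is estimated as the average of ∇θ log πθ(a|s) weighted by an action value, where a sampled return stands in for Qπ(s, a).

```python
import numpy as np

n_features, n_actions = 4, 3
theta = np.zeros((n_features, n_actions))   # linear softmax policy parameters
epsilon = 0.01                              # step size (epsilon in the update rule)

def pi(s):
    # Softmax policy pi_theta(a|s) for feature vector s
    logits = s @ theta
    e = np.exp(logits - logits.max())
    return e / e.sum()

def grad_log_pi(s, a):
    # Gradient of log pi_theta(a|s) for the linear softmax policy:
    # column k gets s * (1[k == a] - pi_k)
    p = pi(s)
    g = -np.outer(s, p)
    g[:, a] += s
    return g

def policy_gradient_step(samples):
    # Approximate E_pi[ grad log pi(a|s) * Q(s,a) ] with sampled (s, a, q) triples
    grad = np.mean([grad_log_pi(s, a) * q for s, a, q in samples], axis=0)
    return theta + epsilon * grad

samples = [(np.random.randn(n_features), np.random.randint(n_actions), 1.0)]
theta = policy_gradient_step(samples)
```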