TensorFlow
In practice, models are usually built with a library such as TensorFlow rather than written from scratch in NumPy.
How to use TensorFlow
Note that TensorFlow code is written differently from NumPy code.
- The import works the same way as for NumPy.
- Use tf.constant() to define constants. The arguments are (value, data type, array shape), in that order. Here a holds just the number 1, b is a 3 × 2 float array filled with 2, and c is a 2 × 2 float array of sequential numbers (0 to 3).
- TensorFlow works with Tensors, and a tf.Session() is needed to evaluate them. A print() issued before the Session is created therefore shows only the Tensor object, not its value. Once tf.Session() is defined, sess.run(a) returns the actual value.
Next, check the placeholder. A placeholder is like a box whose value is supplied later. It is defined with tf.placeholder(); here, tf.placeholder(dtype=tf.float32, shape=[None, 3]). As with constants, the Tensor is not evaluated until a Session runs it. Define a random 2 × 3 array with X = np.random.rand(2, 3), then run the Tensor with sess.run(x, feed_dict={x: X[0].reshape(1, -1)}). feed_dict assigns X[0], the 0th row of X, to x. .reshape(1, -1) turns that row into a 1 × 3 array. In the placeholder definition, shape=[None, 3] means N × 3 with the number of rows left open. reshape(1, -1) sets the number of rows to 1, and -1 tells NumPy to infer the remaining dimension, which is 3 here, so the result is 1 × 3. The output confirms that the 0th row of X was assigned as a 1 × 3 array. Placeholders are used to feed in each batch during training.
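The shape rules described above can be sketched in plain Python. This is only an illustration of how a -1 dimension is inferred and how a None dimension accepts any size; the helper names infer_shape and matches_placeholder are hypothetical and are not TensorFlow or NumPy functions.

```python
def infer_shape(num_elements, shape):
    """Resolve a single -1 in `shape` the way reshape does (illustrative helper)."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension may be -1")
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if -1 in shape:
        if num_elements % known != 0:
            raise ValueError("cannot infer the missing dimension")
        return tuple(num_elements // known if dim == -1 else dim for dim in shape)
    if known != num_elements:
        raise ValueError("shape does not match the number of elements")
    return tuple(shape)


def matches_placeholder(shape, placeholder_shape):
    """A None entry in the placeholder shape accepts any size in that position."""
    return len(shape) == len(placeholder_shape) and all(
        p is None or p == s for s, p in zip(shape, placeholder_shape)
    )


# X[0] has 3 elements, so reshape(1, -1) resolves to (1, 3).
row_shape = infer_shape(3, [1, -1])
print(row_shape)
print(matches_placeholder(row_shape, [None, 3]))
```

The same placeholder also accepts the full 2 × 3 array, which is why one definition can serve batches of different sizes.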
Next, check variables. A variable holds a value that changes as it is updated. It is defined with tf.Variable(); here its initial value is 1. The update formula is calc_op = x * a (variable × constant), and tf.assign() performs the update: tf.assign(x, calc_op) sets x to the value of calc_op. Before a variable is used it must be initialized (set to its initial value); tf.global_variables_initializer() does this. The first print(sess.run(x)) shows the initial value 1. After update_x runs, the second print(sess.run(x)) shows 10. The third shows 100, because update_x multiplies the current x (10) by the constant 10 again.
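The initialize-then-update sequence can be imitated without TensorFlow. The sketch below is an assumption-laden stand-in: the Variable class here is a hypothetical toy, not tf.Variable, and it only mirrors the order of operations described above.

```python
class Variable:
    """Toy stand-in for a variable that must be initialized before use."""

    def __init__(self, initial_value):
        self.initial_value = initial_value
        self.value = None  # no usable value until initialize() is called

    def initialize(self):
        self.value = self.initial_value


x = Variable(1)
a = 10  # the constant used in calc_op = x * a


def update_x():
    """Mimics assigning calc_op (x * a) back to x."""
    x.value = x.value * a
    return x.value


x.initialize()          # plays the role of running the initializer
history = [x.value]     # value before any update
for _ in range(2):
    update_x()
    history.append(x.value)

print(history)  # [1, 10, 100]
```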
Next, linear regression. The imports are the same as above, plus matplotlib.pyplot for plotting. iters_num sets the number of training iterations, and plot_interval sets how often (in iterations) the error is displayed. Then the data is generated. x is created with np.random.randn(n); with n = 100 this yields 100 random numbers. Substituting x into 3 * x + 2 gives d. These points would all lie exactly on the line 3 * x + 2, so noise is added to scatter them above and below it. Training uses x as the input data and d as the target data. Up to this point, the code is much the same as a NumPy-only implementation.
Placeholders are used for the input data and target data. The weight W and bias b are defined as Variables, and the model is y = W * xt + b. The error is the mean squared error, computed with tf.square() for the square and tf.reduce_mean() for the mean. tf.train.GradientDescentOptimizer() takes the learning rate, and optimizer.minimize(loss) defines the step that minimizes the error. With the preparation complete, the variables are initialized and the Session is started.
Training runs in a for loop. Since iters_num = 300, there are 300 training iterations. Each iteration calls train through sess.run, feeding x_train to xt and d_train to dt, and reports the error every plot_interval iterations. The results are W = 3.07 and b = 1.94, close to d = 3 * x + 2, so the fit is good.
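What the optimizer does in each iteration can be written out by hand. The following is a minimal plain-Python sketch of gradient descent on the mean squared error for y = W * x + b; it is not the TensorFlow code from the exercise. The learning rate of 0.1, the plot_interval of 100, the seed, and drawing x from a standard normal distribution are all assumptions made for the illustration.

```python
import random

random.seed(0)  # assumed seed, only to make the run repeatable

n = 100
noise = 0.3
x_train = [random.gauss(0.0, 1.0) for _ in range(n)]
d_train = [3 * x + 2 + noise * random.gauss(0.0, 1.0) for x in x_train]

W, b = 0.0, 0.0
learning_rate = 0.1   # assumed value
iters_num = 300
plot_interval = 100   # assumed value

losses = []
for i in range(iters_num):
    # Forward pass and mean squared error.
    errors = [W * x + b - d for x, d in zip(x_train, d_train)]
    loss = sum(e * e for e in errors) / n
    losses.append(loss)

    # Gradients of the mean squared error with respect to W and b.
    grad_W = 2 * sum(e * x for e, x in zip(errors, x_train)) / n
    grad_b = 2 * sum(errors) / n

    # Gradient descent step.
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

    if (i + 1) % plot_interval == 0:
        print("iteration", i + 1, "loss", loss, "W", W, "b", b)
```

The learned W and b should land near 3 and 2, with the remaining gap set mostly by the noise.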
[try]
① Change the noise value. The noise value from the exercise above was changed to 0.6. The above is the changed part. Result: the larger noise value widened the vertical scatter, and the prediction was less accurate than with a noise value of 0.3.
② Change the formula for d.
(1) Change W to 6. The above is the changed part. Result
(2) Change the constant term of d to 4. The above is the changed part. Result
In both (1) and (2), the predictions came out close to the formula for d. Because y and d have the same form and the noise value, which is the source of error, was unchanged, the results were presumably as close as in the exercise. Taken together, ① and ② show that the noise in the data has to be kept in mind to make a good prediction.
Importing and data generation are similar to the linear regression case. Because this regression is nonlinear, the formula for d differs from the linear one. The formula for d has four weight parameters, so W = tf.Variable(tf.random_normal([4, 1], stddev=0.01)) defines four values; stddev=0.01 gives random initial values with standard deviation 0.01. xt holds four values per sample (x to the 3rd, 2nd, 1st, and 0th power), so its placeholder has 4 columns; d has one value per sample, so its placeholder has 1 column. The model is y = tf.matmul(xt, W), where tf.matmul() performs matrix multiplication. As in linear regression, training feeds x_train and d_train to xt and dt. The results are w1 = -0.4, w2 = 1.59, w3 = -2.80, w4 = 0.99, close to the coefficients in the formula for d.
[try]
① Change the noise value. The noise value was changed to 0.5, 10 times the value in the exercise. The above is the changed part. Result: as with linear regression, the larger noise value increases the vertical scatter.
② Change the formula for d. W was changed from the exercise to other suitable values. The above is the changed part. Result: as with linear regression, the prediction remained accurate after W was changed, as long as the noise value was small.
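Returning to the model y = tf.matmul(xt, W): the four-column input can be pictured as one row of powers of x multiplied by four weights. The sketch below shows this in plain Python, using the learned weights reported above. The helper names features and matmul_row are hypothetical, and the sample x values are chosen only for illustration.

```python
def features(x):
    """One input row: x to the 3rd, 2nd, 1st, and 0th power."""
    return [x ** 3, x ** 2, x, 1.0]


def matmul_row(row, weights):
    """Product of a 1 x 4 row with a 4 x 1 weight column."""
    return sum(r * w for r, w in zip(row, weights))


# Learned weights reported in the text (w1..w4).
W = [-0.4, 1.59, -2.80, 0.99]

xs = [0.0, 0.5, 1.0]  # illustrative inputs
ys = [matmul_row(features(x), W) for x in xs]

for x, y in zip(xs, ys):
    print(x, y)
```

Because the powers of x are precomputed as features, the model stays linear in W even though the curve is nonlinear in x, which is why the same training loop as linear regression works.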
Practice problem
The program created above can make the prediction, but because the data is generated with x = np.random.rand(n), the points fall only in the range 0 to 1, so the graph shows only part of the curve.
MNIST
Notes on the definition part.
x = tf.placeholder(tf.float32, [None, 784]): 784 because each image in the dataset is 28 × 28 = 784 pixels.
d = tf.placeholder(tf.float32, [None, 10]): for the 10 classes 0 to 9.
W = tf.Variable(tf.random_normal([784, 10], stddev=0.01)): W maps 784 input values to 10 outputs, so its shape is 784 × 10. The initial values are random with standard deviation 0.01.
b = tf.Variable(tf.zeros([10])): one bias per output, 10 in total.
y = tf.nn.softmax(tf.matmul(x, W) + b): the matrix product of x and W, plus b, is passed to the softmax function.
Because this is classification, the error is the cross entropy. Inside the for loop, a batch of MNIST training data is assigned to x_batch and d_batch. After 100 training iterations on MNIST, the accuracy rises to about 87%. The above also checks the 0th entry of d_batch, the size of x_batch, and the image.
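The softmax output and cross-entropy error mentioned above can be illustrated with a few lines of plain Python. This is a sketch, not the exercise code: the logits are made-up numbers standing in for one row of tf.matmul(x, W) + b, and the small eps inside the logarithm is an assumption added to avoid log(0).

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot, probs, eps=1e-7):
    """Cross entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot, probs))


# Made-up scores for the 10 digit classes; class 3 scores highest.
logits = [0.0] * 10
logits[3] = 2.0
probs = softmax(logits)

target_correct = [0] * 10
target_correct[3] = 1
target_wrong = [0] * 10
target_wrong[7] = 1

loss_correct = cross_entropy(target_correct, probs)
loss_wrong = cross_entropy(target_wrong, probs)
print(loss_correct, loss_wrong)
```

The loss is small when the probability mass sits on the labeled class and large otherwise, which is what the optimizer pushes on during training.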
MNIST is then implemented with three layers. The aim is to improve accuracy over the single-layer model.
Because the network has three layers, two hidden layers are prepared (hidden_layer_size_1 = 600, hidden_layer_size_2 = 300). The resulting neural network takes 784 inputs, passes them through a first hidden layer of 600 units and a second hidden layer of 300 units, and produces 10 outputs. With three layers including the hidden layers, three weights W and three biases b are needed. The number of values changes from layer to layer, so each W must be shaped to match the sizes of the layers it connects. The resulting accuracy is about 90%, higher than with the single-layer model.
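The weight and bias shapes follow directly from the layer sizes given above. A short sketch in plain Python, assuming each W has shape (inputs, outputs) as in the single-layer 784 × 10 case:

```python
# Layer sizes from the text: input, two hidden layers, output.
layer_sizes = [784, 600, 300, 10]

# Each weight matrix connects one layer to the next: (inputs, outputs).
weight_shapes = [
    (n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
# One bias value per output unit of each layer.
bias_shapes = [(n_out,) for n_out in layer_sizes[1:]]

# Total number of trainable values (weights plus biases).
num_params = sum(n_in * n_out + n_out for n_in, n_out in weight_shapes)

print(weight_shapes)
print(bias_shapes)
print(num_params)
```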
[try]
① Resize the hidden layers. Changed to hidden_layer_size_1 = 400 and hidden_layer_size_2 = 150. The above is the changed part. Result: with fewer hidden units than in the exercise, the accuracy decreased. Hidden layer size trades accuracy against computation time, so it has to be tuned case by case.
② Change the optimizer. The following figure shows the accuracy after switching among the following 4 optimizers and training for 3000 iterations. At 3000 iterations, RMSPropOptimizer gave the best result. For a real comparison, train until the accuracy plateaus and compare then.
Classification using CNN
The CNN flow is conv - relu - pool - conv - relu - pool - affine - relu - dropout - affine - softmax. The imports and the number of training iterations are defined as above. The MNIST data is assigned to the placeholder x, then reshaped into the 28 × 28 × 1-channel image format and assigned to x_image. W_conv1 has shape 5 × 5 × 1 × 32: 5 × 5 is the CNN filter size, and 1 × 32 means 1 input channel is expanded to 32 output channels. The calculation is h_conv1 = tf.nn.relu(tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1). tf.nn.conv2d() convolves x_image with W_conv1 using stride 1 and SAME padding. The bias b_conv1 is then added and the sum is passed to the ReLU function. The output goes to h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME'), a max pooling with a 2 × 2 window, stride 2, and SAME padding. The same calculation is performed for the second and third layers. After that, the affine - relu - dropout - affine - softmax steps complete the CNN. The accuracy rose as training proceeded and reached about 90% or more.
[try]
① Change the dropout rate to 0. The above is the changed part. Result: the accuracy was about 95%, almost the same as in the exercise.
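With SAME padding, the spatial output size depends only on the input size and the stride (it is the input size divided by the stride, rounded up), not on the filter or window size. The sketch below traces the height and width through the conv and pool steps described above. It tracks spatial size only, and the helper name same_padding_output is hypothetical.

```python
import math


def same_padding_output(size, stride):
    """Output height/width under SAME padding: ceil(size / stride)."""
    return math.ceil(size / stride)


size = 28  # MNIST images are 28 x 28
trace = [("input", size)]

# Spatial strides of each step: convolutions use 1, poolings use 2.
for name, stride in [("conv1", 1), ("pool1", 2), ("conv2", 1), ("pool2", 2)]:
    size = same_padding_output(size, stride)
    trace.append((name, size))

for name, s in trace:
    print(name, s, "x", s)
```

The image shrinks only at the pooling steps, from 28 to 14 to 7, which fixes the number of values that reach the first affine layer.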
Linear regression
This is the same linear regression as in the TensorFlow section, but the code is simpler in Keras. A feature of Keras is that, unlike TensorFlow, W (weight) and b (bias) do not need to be declared with placeholders or Variables. The input x and target d are defined the same way as in TensorFlow. Next, prepare the model with Sequential(), which is the base of the linear regression model, and add a Dense (fully connected) layer. This creates a one-layer network. model.summary() displays the model, as in the second capture of the exercise above. It shows that the single Dense layer has an Output Shape of 1 and 2 parameters. Training is configured simply with model.compile(). In TensorFlow the loss and the optimizer had to be defined separately; here model.compile(loss='mse', optimizer='sgd') selects the mean squared error as the loss and stochastic gradient descent as the optimizer.
Simple perceptron
Compared with the linear regression above, the training code is the part that differs. Linear regression used a for loop, but here training is done with model.fit(), where epochs sets the number of passes over the data. model.fit(X, T, epochs=30, batch_size=1) runs 30 epochs with batch_size = 1 on 4 samples, so it trains on 30 × 4 = 120 batches. Running the code prints a summary and shows the accuracy rising as training proceeds.
[try]
① Change np.random.seed(0) to np.random.seed(1). The above is the changed part. Result: the result differs from the exercise above because changing the seed from 0 to 1 changes the initial values.
② Change the number of epochs to 100. The above is the changed part. Result: the number of batch updates rose from 120 in the exercise to 100 × 4 = 400, so the error became small and converged.
③ Change to an AND circuit and an XOR circuit.
[1] AND circuit. The above is the changed part. Result: the AND circuit is also learned well.
[2] XOR circuit. The above is the changed part.
Result: a single fully connected layer can only represent a linear decision boundary. The XOR circuit (exclusive OR) is not linearly separable (its 0s and 1s cannot be split by one straight line), so errors remain.
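The difference between the gates can be reproduced with the classical perceptron learning rule in plain Python. Note that this is a swapped-in technique for illustration: it uses a step activation and integer weight updates rather than the Keras model from the exercise, and the epoch count of 50 is an assumption.

```python
def train_perceptron(samples, epochs=50):
    """Classical perceptron rule with a step activation (not the Keras model)."""
    w = [0, 0]
    b = 0
    for _ in range(epochs):
        for (x1, x2), t in samples:
            y = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            delta = t - y  # 0 when the prediction is already correct
            w[0] += delta * x1
            w[1] += delta * x2
            b += delta
    return w, b


def accuracy(samples, w, b):
    correct = 0
    for (x1, x2), t in samples:
        y = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        correct += int(y == t)
    return correct / len(samples)


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
gates = {
    "OR": [0, 1, 1, 1],
    "AND": [0, 0, 0, 1],
    "XOR": [0, 1, 1, 0],
}

results = {}
for name, targets in gates.items():
    samples = list(zip(inputs, targets))
    w, b = train_perceptron(samples)
    results[name] = accuracy(samples, w, b)

print(results)
```

OR and AND are learned perfectly, while XOR always leaves at least one of the four points misclassified, because no single straight line separates its classes.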
④ Change the batch size to 10 with the OR circuit. The above is the changed part. Result: raising the batch size to 10 reduces the number of updates, so the computation time is shorter than the 2 ms seen in the exercise. Increasing the batch size in this way reduces the number of updates per epoch. Batch sizes are generally chosen as powers of 2. In addition, a small dataset calls for a small batch size and a large dataset for a larger one.
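The update counts quoted in these exercises follow from simple arithmetic: updates per epoch is the number of samples divided by the batch size, rounded up, on the assumption that a final partial batch still counts as one step. The helper name num_updates is hypothetical.

```python
import math


def num_updates(num_samples, batch_size, epochs):
    """Total weight updates: epochs x batches per epoch (partial batch counted)."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return epochs * batches_per_epoch


# 4 samples, batch_size = 1: the counts from the exercise and from try ②.
print(num_updates(4, 1, 30))    # 120
print(num_updates(4, 1, 100))   # 400

# With batch_size = 10, all 4 samples fit in one batch per epoch.
print(num_updates(4, 10, 1))    # 1
```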
⑤ Change the number of epochs to 300. The above is the changed part. Result: as in ②, the number of batch updates increases, so the error becomes small and converges.
Classification of iris
train_test_split splits the data into training data and validation data. train_test_split(x, d, test_size=0.2) sets the validation share to 20%, so training and validation data are split at a ratio of 4:1. The model is defined as above: a two-layer neural network with an input layer of size 4, a middle layer of size 12 with a ReLU function, and an output of size 3 with a softmax function. The loss is sparse_categorical_crossentropy because the targets are the integer labels 0, 1, 2. With one-hot targets, categorical_crossentropy would be used instead. The results show that accuracy improves as training completes, and in the graph the accuracy is about the same for the training data and the validation data.
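The two loss names differ only in how the target is encoded. The plain-Python sketch below, with made-up probabilities for three classes, shows that an integer label and its one-hot form give the same cross entropy. The helper names are hypothetical and are not Keras functions.

```python
import math


def to_one_hot(label, num_classes):
    """Integer label -> one-hot list, e.g. 2 -> [0, 0, 1] for 3 classes."""
    return [1 if i == label else 0 for i in range(num_classes)]


def sparse_cross_entropy(label, probs):
    """Cross entropy when the target is an integer class index."""
    return -math.log(probs[label])


def categorical_cross_entropy(one_hot, probs):
    """Cross entropy when the target is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


probs = [0.7, 0.2, 0.1]  # made-up softmax output for the 3 iris classes
label = 0

sparse_loss = sparse_cross_entropy(label, probs)
one_hot_loss = categorical_cross_entropy(to_one_hot(label, 3), probs)
print(sparse_loss, one_hot_loss)
```

The values agree, so the choice between the two losses is about matching the label format, not about changing what is optimized.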
[try]
① Change the activation function of the middle layer to sigmoid. The above is the changed part. Result: the accuracy is higher than with the ReLU function. This is because the iris data used here is simple. With more complex data, the sigmoid function suffers from vanishing gradients.
② Import SGD and change the optimizer to SGD(lr=0.1). The above is the changed part. Result: with SGD at a learning rate higher than in the exercise, training was faster and the accuracy reached 1.0 at an early stage. With Keras, the activation function, learning rate, and optimizer are easy to implement and change.
MNIST classification
The MNIST data is imported from data.mnist. The model is a three-layer neural network with 784 inputs, middle layers of 512 and 512 units, and 10 outputs. The middle layers use the ReLU function and the output uses the softmax function. The execution log shows that each epoch trains on 60,000 images.
[try]
① Change one_hot_label of load_mnist to False. The above is the changed part. Result: an error is displayed. categorical_crossentropy expects one-hot labels, so the label shape no longer matches the model output and a shape error occurs.
② Change the error function to sparse_categorical_crossentropy. The loss above was changed from ①. Result: with one_hot_label = False and the loss changed to sparse_categorical_crossentropy, training ran normally and the accuracy rose to about 98% by the 20th epoch.
③ Change the value of Adam's argument. The above is the changed part. Result: a higher learning rate should make training progress faster, but in practice the difference was hardly noticeable. A learning rate of 0.001 seems more appropriate. Another advantage of Keras is that the learning rate can be changed and compared right away.
CNN classification of MNIST
I first ran it with epochs = 20, but it took too long, so I changed it to epochs = 5. Because this is a CNN, the data is reshaped to 28 × 28 × 1 and input_shape is defined accordingly. With Keras, convolution and pooling need only Conv2D and MaxPooling2D. In model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape)), 32 is the number of output channels and kernel_size=(3, 3) is the filter size (3 × 3). Even with only 5 epochs, the accuracy rose to about 98%.
Learning with the cifar10 dataset
cifar10 is a set of 32 × 32 pixel color images with 10 labels: "airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck". There are 60,000 images in total: 50,000 for training and 10,000 for testing. Each image stores RGB values from 0 to 255, so the data is normalized (divided by 255 to fall in the range 0 to 1). The neural network uses Conv2D and MaxPooling2D like the CNN above. The result was overfitting, with a training accuracy of 99% and a test accuracy of 65%. Good accuracy is hard to reach with only 3 epochs.
RNN
SimpleRNN, LSTM, GRU, and the other models explained in the lectures are also available in Keras. Here an RNN is used to add binary numbers. The numbers are written in binary so that the RNN can process them one digit at a time: the carry from the earlier (lower) digits feeds into each new digit, which makes the addition a time series. In the model, input_shape=[8, 2] in model.add() means 8 time steps (the 8 bits) × 2 features (the a and b data). The accuracy is 94% the first time and 100% from the second time on.
[try]
① Change the number of RNN output nodes to 128. The above is the changed part. Result: the output shape changes with the number of output nodes. The error is smaller than in the exercise, and the accuracy is 100%.
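Why binary addition is a time-series problem can be seen by writing the carry as a state passed from step to step. The sketch below builds the 8 × 2 input described above and adds it bit by bit in plain Python. Feeding the lowest bit first is an assumption of this sketch, and the helper names are hypothetical; the RNN has to learn the carry rule that is written out explicitly here.

```python
def to_bits(value, length=8):
    """Binary digits of value, lowest bit first (assumed ordering)."""
    return [(value >> i) & 1 for i in range(length)]


def make_sequence(a, b, length=8):
    """Input of shape [8, 2]: one (bit of a, bit of b) pair per time step."""
    return [[x, y] for x, y in zip(to_bits(a, length), to_bits(b, length))]


def add_with_carry(sequence):
    """Process one time step at a time, carrying state to the next step."""
    carry = 0
    out = []
    for bit_a, bit_b in sequence:
        total = bit_a + bit_b + carry
        out.append(total % 2)   # output bit at this step
        carry = total // 2      # state passed to the next step
    return out


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sequence = make_sequence(100, 27)
result = from_bits(add_with_carry(sequence))
print(len(sequence), len(sequence[0]), result)
```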
② Change the output activation function of the RNN to sigmoid. The above is the changed part. Result: the accuracy was lower than with the ReLU function.
③ Change the output activation function of the RNN to tanh. The above is the changed part. Result: the accuracy is better than with the sigmoid function and almost the same as with the ReLU function.
④ Change the optimization method to adam. The above is the changed part. Result: the error is smaller than with SGD and the accuracy is improved.
⑤ Set the RNN input Dropout to 0.5. The above is the changed part. Result: dropout visibly sped up the computation, but the accuracy became quite poor.
⑥ Set the RNN recurrent Dropout to 0.3. The above is the changed part. Result: recurrent dropout is meant to improve generalization, but the result was almost the same as ⑤.
⑦ Set the RNN unroll to True. Unrolling expands the recurrent loop into an explicit network. An unrolled network tends to use more memory, but the computation gets faster. The above is the changed part. Result: as explained above, the computation was faster. The accuracy is slightly higher than in ⑤.
How to implement a GRU network
A GRU network can be implemented simply by replacing SimpleRNN in model.add with GRU.
Reinforcement learning: a field of machine learning that aims to build agents that choose actions in an environment so as to maximize reward in the long run. It is a mechanism for deciding on and improving actions based on the rewards that result from them.
Confirmation test (4-19)
Think of an example where reinforcement learning could be applied, and list the environment, agent, and reward concretely.
Example: autonomous driving of a car. Environment: the road. Agent: the driver. Action: accelerator, brake, and steering wheel operation. Reward: safe driving.
Trade-off between exploration and exploitation
If data (knowledge) already exists about which moves win in a given position, as in Othello or Go, the action can be predicted and decided from it. For driving a car, as in the example above, there is no data on how to turn the steering wheel to drive safely. Reinforcement learning therefore assumes the data is incomplete, collects data while acting, and works out the best action from what it has collected.
Acting only on past data leaves no room for exploration, while only collecting data leaves the past data unused; the two are in a trade-off. Reinforcement learning requires a proper balance between exploration and exploitation.
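One common way to balance the two is the epsilon-greedy rule, which is not named in the text and is used here only as an illustrative example: with a small probability the agent tries a random action (exploration), and otherwise it picks the action with the best estimated reward so far (exploitation). The reward probabilities, epsilon, step count, and seed below are all made-up values.

```python
import random


def run_bandit(true_probs, epsilon, steps, seed=0):
    """Epsilon-greedy on a toy multi-armed bandit with 0/1 rewards."""
    rng = random.Random(seed)
    counts = [0] * len(true_probs)     # how often each action was tried
    values = [0.0] * len(true_probs)   # running average reward per action
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: collect data about a random action.
            arm = rng.randrange(len(true_probs))
        else:
            # Exploit: use the data collected so far.
            arm = max(range(len(true_probs)), key=lambda i: values[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return counts, values, total_reward


counts, values, total_reward = run_bandit(
    true_probs=[0.2, 0.5, 0.8], epsilon=0.1, steps=1000
)
print(counts, values, total_reward)
```

Setting epsilon to 0 gives pure exploitation and setting it to 1 gives pure exploration, so the parameter makes the trade-off explicit.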
Difference between reinforcement learning and supervised learning
Supervised learning aims to find patterns in data and predict from them. Reinforcement learning aims to find behavior that earns better rewards. The goals are different.
Action value function
There are two types of value function: the state value function and the action value function. The state value function looks at the value of a state, and the action value function looks at the value of a state combined with an action.
Policy function
In policy-based reinforcement learning, the function that gives the probability of taking each action in a given state.
Policy gradient method
A technique that models the policy and optimizes it. The objective can be defined as either the average reward or the discounted sum of rewards.
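The two ways of scoring a sequence of rewards can be written in a few lines of plain Python. This sketch covers only the two definitions, not the policy gradient update itself; the reward list and the discount factor gamma = 0.5 are made-up values.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over the sequence, computed from the end."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def average_reward(rewards):
    """Mean reward per step."""
    return sum(rewards) / len(rewards)


rewards = [1.0, 1.0, 1.0]  # made-up rewards for three steps
print(discounted_return(rewards, gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
print(average_reward(rewards))                # 1.0
```

A discount factor below 1 weights near-term rewards more heavily, while the average treats every step equally.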