I previously translated the tutorial for Beginners and actually used TensorFlow to do some machine learning. This time I have translated the tutorial for Experts.
Deep MNIST for Experts TensorFlow is a powerful library for performing large-scale numerical computation. One of the tasks it excels at is training and running deep neural networks. In this tutorial we will learn the basic building blocks of a TensorFlow model while constructing a deep convolutional MNIST classifier.
This introduction assumes that you are familiar with neural networks and the MNIST dataset. If you don't have that background, check out the Introduction for Beginners (https://www.tensorflow.org/versions/master/tutorials/mnist/beginners/index.html). Be sure to install TensorFlow before you start.
Setup Before building our model, we first load the MNIST dataset and then start a TensorFlow session.
Load MNIST Data For your convenience, a [script](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/examples/tutorials/mnist/input_data.py) that automatically downloads and imports the MNIST dataset is included. It creates a 'MNIST_data' directory to store the data files.
import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
This mnist is a lightweight class that stores the training, validation, and testing sets as NumPy arrays. It also provides a function for iterating through mini-batches of data, which we use below.
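As a quick illustration (a minimal sketch of my own, not part of the original tutorial), you can fetch one mini-batch and inspect its shapes:

batch_xs, batch_ys = mnist.train.next_batch(50)  # one mini-batch of 50 examples
print(batch_xs.shape)  # (50, 784): flattened 28x28 images
print(batch_ys.shape)  # (50, 10): one-hot labels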
Start TensorFlow InteractiveSession TensorFlow relies on a highly efficient C++ backend to do its computation. The connection to this backend is called a session. The common usage of a TensorFlow program is to first create a graph and then launch it in a session.
Here we instead use the convenient InteractiveSession class, which makes TensorFlow more flexible about how you structure your code. It allows you to interleave operations that run the graph with the process of building the Computation Graph (https://www.tensorflow.org/versions/master/get_started/basic_usage.html#the-computation-graph). This is especially useful when working in an interactive context like IPython. If you are not using InteractiveSession, you should build the whole computation graph before starting a session and Launching the Graph (https://www.tensorflow.org/versions/master/get_started/basic_usage.html#launching-the-graph-in-a-session).
import tensorflow as tf
sess = tf.InteractiveSession()
To do efficient numerical computation in Python, we typically use libraries like NumPy, which perform expensive operations such as matrix multiplication outside Python, using highly efficient code implemented in another language. Unfortunately, there can still be a lot of overhead in switching back to Python for every operation. This overhead is especially bad if you want to run computations on a GPU or in a distributed way, where transferring the data is costly.
TensorFlow also does its heavy lifting outside Python, but it takes things a step further to avoid this overhead. Instead of running a single expensive operation independently from Python, TensorFlow lets us describe a graph of interacting operations that run entirely outside Python. This approach is similar to that used in Theano or Torch.
The role of the Python code is therefore to build this external computation graph and to dictate which parts of the graph should be run. For more detail, see the Computation Graph (https://www.tensorflow.org/versions/master/get_started/basic_usage.html#the-computation-graph) section of Basic Usage.
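A minimal sketch of my own illustrating this deferred-execution idea (not part of the original tutorial): creating an op only builds a graph node, and nothing is computed until the session runs it.

a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b        # only adds a multiplication node to the graph
print(c.eval())  # the InteractiveSession actually runs it: 6.0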
In this section, we will build a softmax regression model with a single linear layer. In the next section, we will extend this to the case of softmax regression with a multilayer convolutional network.
Placeholders We start building the computation graph by creating nodes for the input images and target output classes.
x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])
These x and y_ are not specific values. Rather, each is a placeholder, a value that we will input when we ask TensorFlow to run a computation. The input images x will consist of a 2D tensor of floating point numbers. Here we assign it a shape of [None, 784], where 784 is the dimensionality of a single flattened MNIST image, and None indicates that the first dimension, corresponding to the batch size, can be of any size.
The target output classes y_ will also consist of a 2D tensor, where each row is a one-hot 10-dimensional vector indicating which digit class the corresponding MNIST image belongs to. The shape argument to placeholder is optional, but it allows TensorFlow to automatically catch bugs stemming from inconsistent tensor shapes.
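As a quick sketch of my own (not in the original tutorial), feeding a wrongly shaped array makes this checking concrete:

import numpy as np
bad = np.zeros((5, 100))
# x.eval(feed_dict={x: bad})  # raises a ValueError: the fed shape (5, 100)
#                             # does not match the declared [None, 784]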
Variables We now define the weights W and bias b for our model. We could imagine treating these like additional inputs, but TensorFlow has an even better way to handle them: Variable. A Variable is a value that lives in TensorFlow's computation graph. In machine learning applications, the parameters of the model are usually Variables.
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
We pass initial values for those parameters by calling tf.Variable. In this case, we initialize both W and b as tensors full of zeros. W is a 784x10 matrix (because we have 784 input features and 10 outputs) and b is a 10-dimensional vector (because we have 10 classes).
Before Variables can be used within a session, they must be initialized using that session. This step takes the initial values (in this case tensors full of zeros) that have already been specified and assigns them to each Variable. This can be done for all Variables at once.
sess.run(tf.initialize_all_variables())
We can now implement our regression model. It only takes one line! We multiply the vectorized input images x by the weight matrix W, add the bias b, and compute the softmax probabilities that are assigned to each class.
y = tf.nn.softmax(tf.matmul(x,W) + b)
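As a small sketch of my own (not in the original tutorial), you can already evaluate the untrained model on a mini-batch, since the Variables were initialized above:

batch = mnist.train.next_batch(10)
probs = y.eval(feed_dict={x: batch[0]})
print(probs.shape)  # (10, 10); each row sums to ~1.0 (uniform before training)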
We can specify the cost function to be minimized during training just as briefly. Our cost function will be the cross-entropy between the target and the model's prediction.
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
Note that tf.reduce_sum sums over all images in the mini-batch, as well as over all classes; we are computing the cross-entropy for the entire mini-batch.
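To make that reduction concrete, here is a hand-computed sketch with NumPy (my own illustration, with made-up numbers):

import numpy as np
preds   = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])  # two fake predictions
targets = np.array([[1, 0, 0], [0, 1, 0]])              # one-hot targets
# sums over both images and classes, just like tf.reduce_sum above
print(-np.sum(targets * np.log(preds)))  # -(log 0.7 + log 0.8) ~= 0.58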
Now that we have defined our model and training cost function, training with TensorFlow is straightforward. Because TensorFlow knows the entire computation graph, it can use automatic differentiation to find the gradients of the cost with respect to each variable. TensorFlow has a variety of built-in optimization algorithms (https://www.tensorflow.org/versions/master/api_docs/python/train.html#optimizers). For this example, we will use steepest gradient descent, with a step length of 0.01, to descend the cross-entropy.
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
What TensorFlow actually did in that single line was to add new operations to the computation graph. These include operations to compute the gradients, compute the parameter update steps, and apply the update steps to the parameters.
The returned operation train_step, when run, applies the gradient descent updates to the parameters. The model can therefore be trained by running train_step repeatedly.
for i in range(1000):
  batch = mnist.train.next_batch(50)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})
In each training iteration, we load 50 training examples. We then run the train_step operation, using feed_dict to replace the placeholder tensors x and y_ with the training examples.
Note that you can use feed_dict to replace any tensor in your computation graph; it's not restricted to just placeholders.
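A small sketch of my own to illustrate this point (the values are made up): we can feed a value directly for the intermediate tensor y and evaluate cross_entropy without ever touching x.

import numpy as np
fake_y = np.full((1, 10), 0.1, dtype=np.float32)  # a made-up uniform "prediction"
fake_t = np.zeros((1, 10), dtype=np.float32)
fake_t[0, 3] = 1.0                                # a made-up one-hot target
# Because y itself is fed, W, b, and x are never used in this evaluation.
print(cross_entropy.eval(feed_dict={y: fake_y, y_: fake_t}))  # -log(0.1) ~= 2.30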
How well did our model do? First we figure out where we predicted the correct label.
tf.argmax is an extremely useful function that gives the index of the highest entry in a tensor along some axis. For example, tf.argmax(y_, 1) is the true label, while tf.argmax(y, 1) is the label our model thinks is most likely for each input. We can use tf.equal to check whether our prediction matches the truth.
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
That gives us a list of booleans. To determine what fraction are correct, we cast them to floating point numbers and then take the mean. For example, [True, False, True, True] becomes [1,0,1,1], whose mean is 0.75.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
Finally, we evaluate our accuracy with test data. This should be about 91% correct.
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
Getting about 91% accuracy on MNIST is not good. It's almost embarrassingly bad. In this section we fix that, jumping from a very simple model to something moderately sophisticated: a small convolutional neural network. This will give us about 99.2% accuracy; not state-of-the-art, but respectable.
To create this model, we will need to create a lot of weights and biases. One should generally initialize the weights with a small amount of noise for symmetry breaking and to prevent zero gradients. Since we are using Rectified Linear Unit (ReLU) neurons, it is also good practice to initialize them with a slightly positive initial bias to avoid dead neurons. Instead of doing this repeatedly while we build the model, let's create two handy functions to do it for us.
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
Convolution and pooling TensorFlow also gives us a lot of flexibility in convolution and pooling operations. How do we handle the boundaries? What is our stride size? In this example, we are always going to choose the vanilla version. Our convolutions use a stride of one and are zero-padded so that the output is the same size as the input. Our pooling is plain old max pooling over 2x2 blocks. To keep our code cleaner, let's also abstract those operations into functions.
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
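As a quick sanity check of my own (not in the original tutorial), the static shapes confirm the claims above: SAME padding with a stride of one preserves the spatial size, and 2x2 pooling halves it.

dummy = tf.zeros([1, 28, 28, 1])        # a fake single-image batch
w = weight_variable([5, 5, 1, 32])
conv = conv2d(dummy, w)
print(conv.get_shape())                 # (1, 28, 28, 32)
print(max_pool_2x2(conv).get_shape())   # (1, 14, 14, 32)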
We can now implement our first layer. It will consist of convolution, followed by max pooling. The convolution will compute 32 features for each 5x5 patch. Its weight tensor will have a shape of [5, 5, 1, 32]: the first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels. We will also have a bias vector with one component for each output channel.
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
To apply the layer, we first reshape x into a 4D tensor, with the second and third dimensions corresponding to image width and height, and the final dimension corresponding to the number of color channels.
x_image = tf.reshape(x, [-1,28,28,1])
We then convolve x_image with the weight tensor, add the bias, apply the ReLU function, and finally max pool.
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
In order to build a deep network, we stack several layers of this type. The second layer will have 64 features for each 5x5 patch.
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
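A one-line sketch of my own to verify the claim that follows: after two rounds of 2x2 max pooling, the 28x28 image has been reduced to 7x7 with 64 channels.

print(h_pool2.get_shape())  # (?, 7, 7, 64) -- hence 7*7*64 inputs to the next layer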
Now that the image size has been reduced to 7x7, we add a fully-connected layer with 1024 neurons to allow processing on the entire image. We reshape the tensor from the pooling layer into a batch of vectors, multiply by a weight matrix, add a bias, and apply a ReLU.
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
Dropout To reduce overfitting, we apply dropout before the readout layer. We create a placeholder for the probability that a neuron's output is kept during dropout. This allows us to turn dropout on during training and off during testing. TensorFlow's tf.nn.dropout op automatically handles scaling neuron outputs in addition to masking them, so dropout just works without any additional scaling.
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
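A small sketch of my own illustrating the scaling just mentioned: kept activations are multiplied by 1/keep_prob, so the expected value of each output is unchanged.

ones = tf.ones([1, 10])
print(tf.nn.dropout(ones, 0.5).eval())  # surviving entries appear as 2.0, the rest as 0.0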
Finally, we add a softmax layer, just like for the one-layer softmax regression above.
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
How well does this model do? To train and evaluate it, we will use code nearly identical to that for the simple one-layer Softmax network above.
The differences are as follows.
We replace the steepest gradient descent optimizer with the more sophisticated ADAM optimizer.
We include the additional parameter keep_prob in feed_dict to control the dropout rate.
And we add logging to every 100th iteration in the training process.
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g"%(i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
The final test set accuracy after running this code should be approximately 99.2%.
We have learned how to quickly and easily build, train, and evaluate a fairly sophisticated deep learning model using TensorFlow.
That's all for the translation. It's more involved than the tutorial for beginners, but it's similar in that each layer has weights and a bias, with a function applied on top. Next, I would like to actually run the code while deepening my understanding of this material.