This is a translation of the TensorFlow tutorial TensorFlow Mechanics 101 (https://www.tensorflow.org/versions/master/tutorials/mnist/tf/index.html#tensorflow-mechanics-101). Please point out any translation errors.
Code: tensorflow/examples/tutorials/mnist/
The goal of this tutorial is to show how to use TensorFlow to train and evaluate a simple feed-forward neural network for handwritten digit classification using the (classic) MNIST data set. The intended audience for this tutorial is experienced machine learning users who are interested in using TensorFlow.
These tutorials are not meant to teach machine learning in general.
Make sure you follow the steps in Install TensorFlow (http://www.tensorflow.org/get_started/os_setup.html).
In this tutorial, we will refer to the following files:
File | Purpose |
---|---|
mnist.py | The code to build a fully connected MNIST model. |
fully_connected_feed.py | The main code to train the built MNIST model against the downloaded dataset using a feed dictionary. |
To start training, run the fully_connected_feed.py file directly:
python fully_connected_feed.py
MNIST is a classic problem in machine learning. The problem is to look at greyscale 28x28 pixel images of handwritten digits and determine which digit, from 0 through 9, each image represents.
For more information, see Yann LeCun's MNIST page (http://yann.lecun.com/exdb/mnist/) or Chris Olah's visualizations of MNIST (http://colah.github.io/posts/2014-10-Visualizing-MNIST/).
At the beginning of the run_training() method, the input_data.read_data_sets() function ensures that the correct data has been downloaded to the local training folder, then unpacks that data and returns a dictionary of DataSet instances.
data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
Note: The fake_data flag is used for unit testing and can be ignored here.
Dataset | Purpose |
---|---|
data_sets.train | 55,000 images and labels, for primary training. |
data_sets.validation | 5,000 images and labels, for iterative validation of training accuracy. |
data_sets.test | 10,000 images and labels, for final testing of trained accuracy. |
For more information about the data, read the Download tutorial (http://www.tensorflow.org/tutorials/mnist/download/index.html).
The placeholder_inputs() function creates two tf.placeholder ops that define the shape of the inputs, including the batch_size, to the rest of the graph, and into which the actual training examples will be fed.
# (batch_size,) is a 1-tuple; plain (batch_size) would be just an int in parentheses.
images_placeholder = tf.placeholder(tf.float32, shape=(batch_size,
                                                       IMAGE_PIXELS))
labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size,))
Further down, in the training loop, the full image and label datasets are sliced to fit the batch_size for each step, matched with these placeholder ops, and then passed into the sess.run() function using the feed_dict parameter.
After creating placeholders for the data, the graph is built from the mnist.py file according to a three-stage pattern: inference(), loss(), and training().
The inference() function builds the graph as far as needed to return a tensor containing the output predictions.
It takes the images placeholder as input and builds on top of it a pair of fully connected layers with ReLU activation, followed by a ten-node linear layer specifying the output logits.
Each layer is created under a unique tf.name_scope. tf.name_scope acts as a prefix to items created within that scope.
with tf.name_scope('hidden1'):
Within the defined scope, the weights and biases to be used by each of these layers are generated as tf.Variable instances with their desired shapes:
weights = tf.Variable(
tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
name='weights')
biases = tf.Variable(tf.zeros([hidden1_units]),
name='biases')
For example, when these are created within the hidden1 scope, the unique name given to the weights variable would be "hidden1/weights".
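To make the prefixing concrete, here is a toy pure-Python sketch of the behavior. The name_scope and full_name helpers are hypothetical stand-ins and not the TensorFlow API. They only join the enclosing scope names with '/', whereas the real tf.name_scope also makes names unique and tracks graph state.

```python
from contextlib import contextmanager

# Toy stand-in for tf.name_scope: it only tracks a stack of prefixes.
_scope_stack = []

@contextmanager
def name_scope(name):
    """Push `name` onto the prefix stack for the duration of the with block."""
    _scope_stack.append(name)
    try:
        yield
    finally:
        _scope_stack.pop()

def full_name(name):
    """Return `name` prefixed by all enclosing scopes, joined with '/'."""
    return "/".join(_scope_stack + [name])

with name_scope("hidden1"):
    weights_name = full_name("weights")
with name_scope("hidden2"):
    other_weights_name = full_name("weights")

print(weights_name)        # hidden1/weights
print(other_weights_name)  # hidden2/weights
```

Two variables can share the short name `weights` because each scope gives it a distinct full name.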
Each variable is given an initialization operation as part of its construction.
In the most common case, as here, the weights are initialized with tf.truncated_normal and given the shape of a 2-D tensor. The first dimension is the number of units in the layer from which the weights connect, and the second dimension is the number of units in the layer to which the weights connect. For the first layer, named hidden1, the dimensions are [IMAGE_PIXELS, hidden1_units] because the weights connect the image inputs to the hidden1 layer. The tf.truncated_normal initializer generates a random distribution with a given mean and standard deviation.
The biases are initialized with tf.zeros so that they all start with zero values. Their shape is simply the number of units in the layer to which they connect.
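The following pure-Python sketch shows what this initialization amounts to. It assumes tf.truncated_normal's documented behavior: values that fall more than two standard deviations from the mean are re-drawn. The truncated_normal function here is a hypothetical helper, and hidden1_units = 128 is an assumed value chosen only for illustration.

```python
import math
import random

def truncated_normal(shape, mean=0.0, stddev=1.0, rng=random):
    """Sample a 2-D list of floats from a normal distribution, re-drawing any
    value that lies more than two standard deviations from the mean."""
    rows, cols = shape
    out = []
    for _ in range(rows):
        row = []
        for _ in range(cols):
            while True:
                x = rng.gauss(mean, stddev)
                if abs(x - mean) <= 2.0 * stddev:
                    break
            row.append(x)
        out.append(row)
    return out

IMAGE_PIXELS = 28 * 28   # 784 pixels per flattened image
hidden1_units = 128      # assumed layer width, for illustration only

rng = random.Random(0)
# Scale by 1/sqrt(fan_in), as in the tutorial code above.
stddev = 1.0 / math.sqrt(float(IMAGE_PIXELS))
weights = truncated_normal([IMAGE_PIXELS, hidden1_units], stddev=stddev, rng=rng)
biases = [0.0] * hidden1_units  # analogous to tf.zeros([hidden1_units])
```

Because the standard deviation is scaled by the number of inputs, each hidden unit's initial weighted sum stays at a similar magnitude regardless of how many pixels feed into it.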
The graph's three primary ops are then created in turn: two tf.nn.relu ops wrapping tf.matmul (http://www.tensorflow.org/api_docs/python/math_ops.html#matmul) for the hidden layers, and one extra tf.matmul for the logits. Each op uses its own tf.Variable instances and is connected to either the input placeholder or the output tensor of the previous layer.
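# Each line runs inside its own tf.name_scope, so `weights` and `biases`
# refer to that layer's own variables, not to shared ones.
hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
logits = tf.matmul(hidden2, weights) + biases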
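As a rough illustration of the data flow, the sketch below reimplements the same three layers on nested Python lists. The matmul, add_bias, relu, and inference helpers and the tiny weight values are hypothetical. The real graph uses the TensorFlow ops above on [batch_size, IMAGE_PIXELS] tensors.

```python
def matmul(a, b):
    """Multiply an (m x n) matrix by an (n x p) matrix, both as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add_bias(m, bias):
    """Add `bias` to every row of `m` (broadcast over the batch dimension)."""
    return [[x + b for x, b in zip(row, bias)] for row in m]

def relu(m):
    """Element-wise max(0, x)."""
    return [[max(0.0, x) for x in row] for row in m]

def inference(images, params):
    """Two ReLU hidden layers followed by a linear layer producing logits.
    `params` maps a layer name to that layer's (weights, biases) pair."""
    w1, b1 = params["hidden1"]
    w2, b2 = params["hidden2"]
    w3, b3 = params["softmax_linear"]
    hidden1 = relu(add_bias(matmul(images, w1), b1))
    hidden2 = relu(add_bias(matmul(hidden1, w2), b2))
    return add_bias(matmul(hidden2, w3), b3)  # no activation on the logits

# Tiny example: one "image" with 2 pixels, layer widths 2 -> 2 -> 3.
params = {
    "hidden1": ([[1.0, -1.0], [0.5, 2.0]], [0.0, 0.0]),
    "hidden2": ([[1.0, 0.0], [0.0, 1.0]], [0.0, -1.0]),
    "softmax_linear": ([[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]], [0.0, 0.0, 0.5]),
}
logits = inference([[2.0, 1.0]], params)
print(logits)  # [[2.5, 0.0, -2.0]]
```

Here the second hidden unit's value of -1.0 is clipped to 0 by the ReLU before it reaches the logits layer.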
Finally, it returns a logit tensor containing the output.
The loss() function further builds the graph by adding the required loss ops.
First, the values from labels_placeholder are converted to 64-bit integers. Then a [tf.nn.sparse_softmax_cross_entropy_with_logits](https://www.tensorflow.org/versions/master/api_docs/python/nn.html#sparse_softmax_cross_entropy_with_logits) op is added. It automatically produces 1-hot labels from labels_placeholder and compares the output logits from the inference() function with those 1-hot labels.
labels = tf.to_int64(labels)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=logits, labels=labels, name='xentropy')
It then uses tf.reduce_mean to average the cross-entropy values across the batch dimension (the first dimension) as the total loss.
loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
It then returns a tensor containing the loss value.
Note: Cross-entropy is an idea from information theory that describes how bad it is to believe the predictions of the neural network, given what is actually true. For more information, read the blog post Visual Information Theory (http://colah.github.io/posts/2015-09-Visual-Information/).
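Here is a plain-Python sketch of what tf.nn.sparse_softmax_cross_entropy_with_logits followed by tf.reduce_mean computes. It assumes the standard definition: the log-sum-exp of the logits minus the logit of the true class. This equals the cross-entropy against a 1-hot label. The helper names are hypothetical.

```python
import math

def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between softmax(logits) and integer labels.
    Same as one-hot encoding each label and taking -log(softmax(row)[label])."""
    losses = []
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the max before exp() for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_sum_exp - row[label])
    return losses

def reduce_mean(values):
    """Average over the batch dimension."""
    return sum(values) / len(values)

logits = [[2.0, 0.0, 0.0],   # confident and correct for label 0
          [0.0, 0.0, 0.0]]   # uniform over three classes
labels = [0, 2]
cross_entropy = sparse_softmax_cross_entropy(logits, labels)
loss = reduce_mean(cross_entropy)
```

The uniform row costs log(3), about 1.10. The confident, correct row costs much less, so the loss rewards confident correct predictions.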
The training() function adds the operations needed to minimize the loss via gradient descent.
First, the function takes the loss tensor from the loss() function and passes it to tf.scalar_summary. This op generates summary values into the events file when used with a SummaryWriter (see below). In this case, it emits a snapshot of the loss value every time the summaries are written out.
tf.scalar_summary(loss.op.name, loss)
Next, a tf.train.GradientDescentOptimizer is instantiated, which applies gradients with the requested learning rate.
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
It then creates a single variable to hold a counter for the global training step. The minimize() op is used both to update the trainable weights in the system and to increment the global step. By convention, this op is known as train_op. It is what a TensorFlow session must run to induce one full step of training (see below).
global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = optimizer.minimize(loss, global_step=global_step)
Finally, the function returns train_op, the operation that performs one step of training.
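The sketch below shows, for a single scalar parameter, what one run of train_op conceptually does. It subtracts learning_rate times the gradient and increments global_step. The toy loss (w - 3)^2 and the train_step helper are assumptions for illustration. TensorFlow computes the gradients automatically for every trainable variable.

```python
def train_step(w, grad_fn, learning_rate, global_step):
    """One gradient-descent update w <- w - lr * grad(w), plus a step
    increment: roughly what running train_op once does per variable."""
    return w - learning_rate * grad_fn(w), global_step + 1

# Toy loss (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
def grad(w):
    return 2.0 * (w - 3.0)

w, global_step = 0.0, 0
for _ in range(100):
    w, global_step = train_step(w, grad, learning_rate=0.1,
                                global_step=global_step)
print(global_step, round(w, 4))  # 100 3.0
```

With learning_rate = 0.1, each step multiplies the distance to the minimum by 0.8. After 100 steps, w is within about 1e-9 of 3.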
Once the graph is built, it can be iteratively trained and evaluated within a loop controlled by user code in fully_connected_feed.py.
At the top of the run_training() function is a Python with command. It indicates that all of the built ops are to be associated with the default global tf.Graph instance.
with tf.Graph().as_default():
tf.Graph is a collection of operations that can be performed together as a group. Most of the time you use TensorFlow, you only rely on a single default graph.
More complex usage with multiple graphs is possible, but is beyond the scope of this simple tutorial.
Once all of the build preparation has been completed and all of the necessary ops have been generated, a tf.Session is created for running the graph.
sess = tf.Session()
Alternatively, a Session can be created in a with block for scoping:
with tf.Session() as sess:
The empty parameter to Session indicates that this code will attach to (or create, if not yet created) the default local session.
Immediately after creating the session, all of the tf.Variable instances are initialized by calling sess.run() on their initialization op.
init = tf.initialize_all_variables()
sess.run(init)
The sess.run() method runs the complete subset of the graph that corresponds to the op(s) passed as parameters. In this first call, the init op is a tf.group that contains only the initializers for the variables. None of the rest of the graph is run here; that happens in the training loop below.
After initializing the variables in the session, you can start training.
The user code controls the training step by step. Below is the simplest loop to do useful training:
for step in xrange(FLAGS.max_steps):
    sess.run(train_op)
However, the loop in this tutorial is a bit more complicated, because at each step it must slice up the input data to match the previously generated placeholders.
At each step, the code generates a feed dictionary. The dictionary contains the set of examples on which to train for that step, keyed by the placeholder ops they represent.
The fill_feed_dict() function queries the given DataSet for its next batch_size set of images and labels. Tensors matching the placeholders are filled with these next images and labels.
images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size,
FLAGS.fake_data)
A Python dictionary object is then generated with the placeholders as keys and the corresponding feed tensors as values.
feed_dict = {
images_placeholder: images_feed,
labels_placeholder: labels_feed,
}
This dictionary is passed into the feed_dict parameter of sess.run() to provide the input examples for this step of training.
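Here is a minimal sketch of the batching and feed-dictionary pattern without TensorFlow. ToyDataSet is a hypothetical stand-in for the tutorial's DataSet. Unlike the real DataSet, it does not shuffle between epochs. Strings stand in for the placeholder objects used as keys. This version of fill_feed_dict also takes batch_size as an explicit argument instead of reading FLAGS.batch_size.

```python
class ToyDataSet(object):
    """Hypothetical DataSet stand-in: serves consecutive slices of
    images/labels and wraps around to the start at the end of an epoch."""

    def __init__(self, images, labels):
        assert len(images) == len(labels)
        self._images, self._labels = images, labels
        self._index = 0

    def next_batch(self, batch_size):
        start = self._index
        if start + batch_size > len(self._images):  # begin a new epoch
            start = 0
        self._index = start + batch_size
        return (self._images[start:self._index],
                self._labels[start:self._index])

def fill_feed_dict(data_set, images_pl, labels_pl, batch_size):
    """Map each placeholder key to the next batch of data it represents."""
    images_feed, labels_feed = data_set.next_batch(batch_size)
    return {images_pl: images_feed, labels_pl: labels_feed}

# Strings stand in for the placeholder objects used as dictionary keys.
ds = ToyDataSet(images=[[0.0], [0.1], [0.2], [0.3], [0.4]],
                labels=[0, 1, 2, 3, 4])
first = fill_feed_dict(ds, "images", "labels", batch_size=2)
second = fill_feed_dict(ds, "images", "labels", batch_size=2)
third = fill_feed_dict(ds, "images", "labels", batch_size=2)  # wraps around
```

With five examples and a batch size of 2, the third call cannot fill a full batch from the remaining example, so it starts over at the beginning.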
The code specifies two values to fetch in the run call: [train_op, loss].
for step in xrange(FLAGS.max_steps):
    feed_dict = fill_feed_dict(data_sets.train,
                               images_placeholder,
                               labels_placeholder)
    _, loss_value = sess.run([train_op, loss],
                             feed_dict=feed_dict)
Because there are two values to fetch, sess.run() returns two items, one for each fetched value. Each Tensor in the list of values to fetch corresponds to a numpy array in the returned result, filled with the value of that tensor during this step of training. Since train_op is an Operation with no output value, its corresponding element is None and is discarded. The value of the loss tensor, however, may become NaN if the model diverges during training, so we capture this value for logging.
Assuming that training runs fine without producing NaNs, the training loop also prints a simple status text every 100 steps to let the user know the state of training.
if step % 100 == 0:
    # duration is the wall-clock time of this step, measured with time.time().
    print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))
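The loop below sketches how duration can be measured and how a divergent (NaN) loss can be caught before logging. run_loop and fake_train_step are hypothetical helpers. The fake step simply returns a slowly shrinking number in place of a real loss. Raising an error on NaN is one possible policy, not necessarily what fully_connected_feed.py does.

```python
import math
import time

def run_loop(train_step, max_steps, log_every=100):
    """Call train_step() (assumed to return the loss as a float) max_steps
    times, stop on NaN, and print a status line every `log_every` steps."""
    logged = []
    for step in range(max_steps):
        start_time = time.time()
        loss_value = train_step()
        duration = time.time() - start_time  # seconds spent on this step
        if math.isnan(loss_value):
            raise RuntimeError('Model diverged with loss = NaN at step %d' % step)
        if step % log_every == 0:
            print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))
            logged.append(step)
    return logged

# A fake training step whose "loss" shrinks a little on each call.
state = {'loss': 2.0}
def fake_train_step():
    state['loss'] *= 0.99
    return state['loss']

logged_steps = run_loop(fake_train_step, max_steps=250)
```

With 250 steps, status lines are printed at steps 0, 100, and 200.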
To emit the events files used by TensorBoard, all of the summaries (in this case, only one) are collected into a single op during the graph-building phase.
summary_op = tf.merge_all_summaries()
Then, after the session is created, a tf.train.SummaryWriter can be instantiated to write the events files. These files contain both the graph itself and the values of the summaries.
summary_writer = tf.train.SummaryWriter(FLAGS.train_dir, sess.graph)
Lastly, the events file is updated with new summary values every time summary_op is evaluated and its output is passed to the writer's add_summary() function.
summary_str = sess.run(summary_op, feed_dict=feed_dict)
summary_writer.add_summary(summary_str, step)
Once the event file is written, you can run TensorBoard on the training folder to display the summary values.
Note: For more information on how to build and run TensorBoard, see the accompanying tutorial TensorBoard: Training Visualization (http://www.tensorflow.org/how_tos/summaries_and_tensorboard/index.html).
To emit a checkpoint file that can later be used to restore a model for further training or evaluation, a tf.train.Saver is instantiated.
saver = tf.train.Saver()
The training loop periodically calls the saver.save() method to write a checkpoint file to the training directory with the current values of all the trainable variables.
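# Write checkpoints inside the training directory ('model.ckpt' is an illustrative prefix).
checkpoint_file = os.path.join(FLAGS.train_dir, 'model.ckpt')
saver.save(sess, checkpoint_file, global_step=step)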
At some later point, training can be resumed by using the saver.restore() method to reload the model parameters.
# Restore the most recent checkpoint found in the training directory.
saver.restore(sess, tf.train.latest_checkpoint(FLAGS.train_dir))
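As a rough analogy for the global_step argument, this sketch saves variable values to files suffixed with the step number and then restores the newest one. It uses JSON files and hypothetical save_checkpoint / restore_latest helpers purely for illustration. This is not the format that tf.train.Saver writes.

```python
import json
import os
import re
import tempfile

def save_checkpoint(variables, prefix, global_step):
    """Write `variables` (name -> list of floats) to '<prefix>-<step>.json',
    so checkpoints from different steps do not overwrite each other."""
    path = '%s-%d.json' % (prefix, global_step)
    with open(path, 'w') as f:
        json.dump(variables, f)
    return path

def restore_latest(prefix):
    """Load the checkpoint with the highest step number for `prefix`."""
    directory, base = os.path.split(prefix)
    pattern = re.compile(re.escape(base) + r'-(\d+)\.json$')
    steps = [int(m.group(1))
             for m in map(pattern.match, os.listdir(directory)) if m]
    with open('%s-%d.json' % (prefix, max(steps))) as f:
        return json.load(f)

train_dir = tempfile.mkdtemp()
prefix = os.path.join(train_dir, 'model.ckpt')
save_checkpoint({'hidden1/biases': [0.0, 0.0]}, prefix, global_step=1000)
save_checkpoint({'hidden1/biases': [0.5, -0.5]}, prefix, global_step=2000)
restored = restore_latest(prefix)
```

Comparing the step numbers as integers rather than as strings ensures that step 10000 is correctly treated as newer than step 2000.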
Every 1000 steps, the code attempts to evaluate the model against the training, validation, and test datasets. The do_eval() function is called three times, once for each of these datasets.
print('Training Data Eval:')
do_eval(sess,
eval_correct,
images_placeholder,
labels_placeholder,
data_sets.train)
print('Validation Data Eval:')
do_eval(sess,
eval_correct,
images_placeholder,
labels_placeholder,
data_sets.validation)
print('Test Data Eval:')
do_eval(sess,
eval_correct,
images_placeholder,
labels_placeholder,
data_sets.test)
Note that more complicated usage would usually sequester data_sets.test, to be checked only after a great deal of hyperparameter tuning. For the sake of a simple, small MNIST problem, however, we evaluate against all of the data.
Before entering the training loop, the Eval op should be built by calling the evaluation() function from mnist.py with the same logits/labels parameters as the loss() function.
eval_correct = mnist.evaluation(logits, labels_placeholder)
The evaluation() function simply generates a tf.nn.in_top_k op. This op automatically scores each model output as correct if the true label can be found among the K most likely predictions. Here we set K to 1, so that a prediction counts as correct only if the top prediction matches the true label.
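correct = tf.nn.in_top_k(logits, labels, 1)
# in_top_k yields one boolean per example; count the True entries as an int32 scalar.
eval_correct = tf.reduce_sum(tf.cast(correct, tf.int32))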
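A plain-Python sketch of the scoring rule may help. The in_top_k helper here is hypothetical. It treats a target as correct when fewer than k classes score strictly higher, which, as far as we know, matches the documented tie behavior of tf.nn.in_top_k. It then sums the booleans the way a reduce_sum over the cast result would.

```python
def in_top_k(predictions, targets, k):
    """For each row, True if the target's score is among the k largest.
    A target counts if fewer than k other classes score strictly higher."""
    result = []
    for row, target in zip(predictions, targets):
        higher = sum(1 for x in row if x > row[target])
        result.append(higher < k)
    return result

logits = [[0.1, 2.0, 0.3],   # top class 1, label 1: correct
          [1.5, 0.2, 0.9],   # top class 0, label 2: wrong
          [0.0, 0.4, 0.3]]   # top class 1, label 2: wrong
labels = [1, 2, 2]
correct = in_top_k(logits, labels, 1)
true_count = sum(correct)  # like tf.reduce_sum(tf.cast(correct, tf.int32))
```

Raising k to 2 would also accept the last two rows, because each label is the second-highest score in its row.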
A loop can then be created that fills a feed_dict and calls sess.run() against the eval_correct op to evaluate the model on the given dataset.
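true_count = 0  # number of correct predictions seen so far
steps_per_epoch = data_set.num_examples // FLAGS.batch_size
num_examples = steps_per_epoch * FLAGS.batch_size
for step in xrange(steps_per_epoch):
    feed_dict = fill_feed_dict(data_set,
                               images_placeholder,
                               labels_placeholder)
    true_count += sess.run(eval_correct, feed_dict=feed_dict)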
The true_count variable simply accumulates all of the predictions that the in_top_k op has determined to be correct. From there, the precision (the fraction of correct predictions) can be calculated by dividing by the total number of examples.
precision = float(true_count) / num_examples  # float() avoids integer division in Python 2
print(' Num examples: %d Num correct: %d Precision @ 1: %0.04f' %
(num_examples, true_count, precision))
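The epoch arithmetic behind do_eval can be sketched as follows. The evaluate function and the per-batch counts are hypothetical. The sketch assumes that only full batches are evaluated, so num_examples is steps_per_epoch * batch_size rather than the full dataset size. Converting to float before dividing avoids integer division under Python 2.

```python
def evaluate(batch_correct_counts, batch_size, num_dataset_examples):
    """Aggregate per-batch correct counts over one epoch and return
    (num_examples, true_count, precision). Only full batches are used,
    so any leftover examples at the end are skipped."""
    steps_per_epoch = num_dataset_examples // batch_size
    num_examples = steps_per_epoch * batch_size
    true_count = 0
    for step in range(steps_per_epoch):
        true_count += batch_correct_counts[step]
    precision = float(true_count) / num_examples
    return num_examples, true_count, precision

# Hypothetical run: 5,000 validation examples, batches of 100,
# and 97 correct predictions in every batch.
num_examples, true_count, precision = evaluate([97] * 50, batch_size=100,
                                               num_dataset_examples=5000)
```

In this example, the result is 4,850 correct out of 5,000, for a precision of 0.97.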