TensorFlow Tutorial: Convolutional Neural Networks (Translation)

This is a translation of the TensorFlow tutorial Convolutional Neural Networks (http://www.tensorflow.org/tutorials/deep_cnn/index.html#convolutional-neural-networks). Please point out any translation errors.


Note: This tutorial is intended for advanced TensorFlow users and assumes machine learning expertise and experience.

Overview

CIFAR-10 classification is a common benchmark problem in machine learning. The problem is to classify RGB 32x32 pixel images into 10 categories: airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks.

[Figure: sample CIFAR-10 images]

For more information, see the CIFAR-10 page and the Technical Report (http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf) by Alex Krizhevsky.

Goals

The goal of this tutorial is to build a relatively small convolutional neural network (CNN) for recognizing images. In the process, this tutorial:

  1. Highlights a canonical organization for network architecture, training, and evaluation.
  2. Provides a template for constructing larger and more sophisticated models.

We chose CIFAR-10 because it is complex enough to exercise much of TensorFlow's ability to scale to large models. At the same time, the model is small enough to train fast, which makes it ideal for trying out new ideas and experimenting with new techniques.

Tutorial highlights

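This CIFAR-10 tutorial demonstrates several important constructs for designing larger and more sophisticated models in TensorFlow:

  1. Core mathematical components, including convolution, rectified linear activations, max pooling, and local response normalization.
  2. Visualization of network activity during training, including input images, losses, and distributions of activations and gradients.
  3. Routines for calculating the moving average of learned parameters and using these averages during evaluation to boost predictive performance.
  4. A learning rate schedule that decays systematically over time.
  5. Prefetching queues for input data that isolate the model from disk latency and expensive image preprocessing.

We also provide a multi-GPU version of the model that demonstrates:

  1. Configuring a model to train across multiple GPU cards in parallel.
  2. Sharing and updating variables among multiple GPUs.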

We hope this tutorial will be a starting point for building larger CNNs for image tasks in TensorFlow.

Model architecture

The model used in this CIFAR-10 tutorial is a multi-layer architecture consisting of alternating convolutions and nonlinearities. These layers are followed by fully connected layers leading into a softmax classifier. The model follows the architecture described by Alex Krizhevsky, with a few differences in the top few layers.

This model achieves a peak performance of about 86% accuracy within a few hours of training time on a GPU. See the Model evaluation section below and the code for details. It consists of 1,068,298 learnable parameters and requires about 19.5 million multiply-add operations to compute inference on a single image.

Code structure

The code for this tutorial resides in tensorflow/models/image/cifar10/.

| File | Purpose |
| --- | --- |
| cifar10_input.py | Reads the native CIFAR-10 binary file format. |
| cifar10.py | Builds the CIFAR-10 model. |
| cifar10_train.py | Trains a CIFAR-10 model on a CPU or GPU. |
| cifar10_multi_gpu_train.py | Trains a CIFAR-10 model on multiple GPUs. |
| cifar10_eval.py | Evaluates the predictive performance of a CIFAR-10 model. |

CIFAR-10 model

The CIFAR-10 network is largely contained in cifar10.py. The complete training graph contains roughly 765 operations. We find that we can make the code most reusable by constructing the graph with the following modules:

  1. Model inputs: inputs() and distorted_inputs() add operations that read and preprocess CIFAR images for evaluation and training, respectively.
  2. Model prediction: inference() adds operations that perform inference, i.e. classification, on supplied images.
  3. Model training: loss() and train() add operations that compute the loss, gradients, variable updates, and visualization summaries.

Model input

The input part of the model is built by the functions inputs() and distorted_inputs(), which read images from the CIFAR-10 binary data files. These files contain fixed byte length records, so we use tf.FixedLengthRecordReader. See Reading Data (https://www.tensorflow.org/how_tos/reading_data/index.html#reading-from-files) to learn more about how the Reader class works.
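To make the fixed-length record idea concrete, here is a minimal plain-Python sketch that splits the contents of a CIFAR-10 binary file into records. It assumes the published layout of the binary version of the dataset (1 label byte followed by 32 x 32 x 3 = 3072 pixel bytes, stored as the full red plane, then green, then blue, each in row-major order). The function names are hypothetical; this is not the code in cifar10_input.py, which does the equivalent with tf.FixedLengthRecordReader.

```python
# Assumed CIFAR-10 binary layout: 1 label byte + 3072 pixel bytes per record,
# pixels stored depth-major as [channel][row][col].
LABEL_BYTES = 1
HEIGHT, WIDTH, DEPTH = 32, 32, 3
IMAGE_BYTES = HEIGHT * WIDTH * DEPTH
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES  # 3073 bytes per record


def parse_record(record: bytes):
    """Return (label, image) with the image indexed as image[row][col][channel]."""
    if len(record) != RECORD_BYTES:
        raise ValueError(f"expected {RECORD_BYTES} bytes, got {len(record)}")
    label = record[0]
    pixels = record[LABEL_BYTES:]
    plane = HEIGHT * WIDTH
    # Transpose from the stored [channel][row][col] order to [row][col][channel].
    image = [
        [[pixels[c * plane + r * WIDTH + col] for c in range(DEPTH)] for col in range(WIDTH)]
        for r in range(HEIGHT)
    ]
    return label, image


def iter_records(data: bytes):
    """Yield parsed records from the full contents of one data file."""
    if len(data) % RECORD_BYTES:
        raise ValueError("file size is not a multiple of the record length")
    for offset in range(0, len(data), RECORD_BYTES):
        yield parse_record(data[offset:offset + RECORD_BYTES])
```

For example, passing the bytes of one training batch file (such as data_batch_1.bin from the extracted archive, named here only as an illustration) to iter_records would yield (label, image) pairs. Because every record has the same length, the reader never needs a delimiter or an index.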

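The images are processed as follows:

  1. They are cropped to 24 x 24 pixels, centrally for evaluation or randomly for training.
  2. They are approximately whitened to make the model insensitive to dynamic range.

For training, we additionally apply a series of random distortions to artificially increase the data set size:

  1. Randomly flip the image from left to right.
  2. Randomly distort the image brightness.
  3. Randomly distort the image contrast.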
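The preprocessing can be sketched in plain Python on nested lists indexed as image[row][col][channel]. This is an illustration under stated assumptions, not the tutorial's TensorFlow code: it assumes a 24 x 24 crop (random for training, central for evaluation), a random left-right flip, random brightness and contrast changes with illustrative ranges, and per-image standardization in the style of TensorFlow's per-image whitening (subtract the mean, divide by the standard deviation floored at 1/sqrt(N)). All function names are hypothetical.

```python
import math
import random

# Illustrative settings (assumptions, not values quoted in this article).
CROP = 24
MAX_BRIGHTNESS_DELTA = 63
CONTRAST_RANGE = (0.2, 1.8)


def random_crop(image, size, rng):
    """Take a random size x size window from an H x W x C image."""
    top = rng.randint(0, len(image) - size)
    left = rng.randint(0, len(image[0]) - size)
    return [row[left:left + size] for row in image[top:top + size]]


def central_crop(image, size):
    """Take the centered size x size window (used for evaluation)."""
    top = (len(image) - size) // 2
    left = (len(image[0]) - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def random_flip_left_right(image, rng):
    """Mirror every row with probability 0.5."""
    return [row[::-1] for row in image] if rng.random() < 0.5 else image


def random_brightness(image, max_delta, rng):
    """Add the same random offset to every value."""
    delta = rng.uniform(-max_delta, max_delta)
    return [[[v + delta for v in px] for px in row] for row in image]


def random_contrast(image, lower, upper, rng):
    """Scale each channel's deviation from its mean by one random factor."""
    factor = rng.uniform(lower, upper)
    pixels = [px for row in image for px in row]
    means = [sum(px[c] for px in pixels) / len(pixels) for c in range(len(pixels[0]))]
    return [[[(v - m) * factor + m for v, m in zip(px, means)] for px in row] for row in image]


def standardize(image):
    """Per-image standardization: zero mean and unit variance over all values."""
    values = [v for row in image for px in row for v in px]
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    adjusted = max(std, 1.0 / math.sqrt(n))  # guards against flat images
    return [[[(v - mean) / adjusted for v in px] for px in row] for row in image]


def preprocess_for_training(image, rng):
    image = random_crop(image, CROP, rng)
    image = random_flip_left_right(image, rng)
    image = random_brightness(image, MAX_BRIGHTNESS_DELTA, rng)
    image = random_contrast(image, *CONTRAST_RANGE, rng)
    return standardize(image)


def preprocess_for_eval(image):
    return standardize(central_crop(image, CROP))
```

Because the distortions are drawn fresh each time an image is read, the model rarely sees exactly the same 24 x 24 input twice, which is what "artificially increasing the data set size" amounts to.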

See the Image (https://www.tensorflow.org/api_docs/python/image.html) page for the list of available distortions. We also attach an image_summary to the images so that we can visualize them in TensorBoard. This is a good practice to verify that the inputs are built correctly.

[Figure: input images displayed in TensorBoard]

Reading images from disk and distorting them can take a non-trivial amount of processing time. To prevent these operations from slowing down training, we run them inside 16 separate threads that continuously fill a TensorFlow queue (https://www.tensorflow.org/api_docs/python/io_ops.html#shuffle_batch).
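The same producer/consumer pattern can be sketched with Python's standard threading and queue modules. This is an analogy, not the TensorFlow queue API: hypothetical worker threads keep a bounded queue full while the training loop dequeues batches. Unlike tf.train.shuffle_batch, which also shuffles examples inside its buffer, this queue is first-in first-out, so the randomness here comes only from the workers picking random example indices; preprocess is a stand-in for reading and distorting an image.

```python
import queue
import random
import threading


def preprocess(example_id, rng):
    """Stand-in for reading one image from disk and distorting it."""
    return example_id, rng.random()


def start_prefetch(num_threads, capacity, num_examples):
    """Start worker threads that keep a bounded queue filled with examples."""
    examples = queue.Queue(maxsize=capacity)
    stop = threading.Event()

    def worker(seed):
        rng = random.Random(seed)
        while not stop.is_set():
            item = preprocess(rng.randrange(num_examples), rng)
            # Wait while the queue is full, but wake up regularly to check for shutdown.
            while not stop.is_set():
                try:
                    examples.put(item, timeout=0.1)
                    break
                except queue.Full:
                    pass

    threads = [threading.Thread(target=worker, args=(i,), daemon=True)
               for i in range(num_threads)]
    for t in threads:
        t.start()
    return examples, stop, threads


def next_batch(examples, batch_size):
    """Dequeue one batch; the training loop only waits if the queue runs dry."""
    return [examples.get() for _ in range(batch_size)]
```

A training loop would call start_prefetch(16, capacity, 50000) once, call next_batch every step, and set the stop event when it finishes. As long as the workers keep up, the slow disk and distortion work overlaps with computation instead of stalling it.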

Model prediction

The prediction part of the model is constructed by the inference() function, which adds operations to compute the logits of the predictions. That part of the model is organized as follows:

| Layer name | Description |
| --- | --- |
| conv1 | Convolution and rectified linear (ReLU) activation |
| pool1 | Max pooling |
| norm1 | Local response normalization |
| conv2 | Convolution and rectified linear (ReLU) activation |
| norm2 | Local response normalization |
| pool2 | Max pooling |
| local3 | Fully connected layer with ReLU activation |
| local4 | Fully connected layer with ReLU activation |
| softmax_linear | Linear transformation to produce logits |
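As a sanity check on the table, the parameter count can be worked out by hand. The layer sizes below are assumptions, believed to match cifar10.py but not stated in this article: 24 x 24 x 3 input crops, 5 x 5 kernels with 64 filters in conv1 and conv2, "SAME" padding with stride-2 pooling, 384 and 192 units in local3 and local4, and 10 classes. Pooling and normalization layers have no learnable parameters. With these sizes the total reproduces the 1,068,298 learnable parameters quoted in the Model architecture section, which suggests the assumed sizes are right.

```python
import math


def conv_params(k, in_ch, out_ch):
    """k x k x in_ch x out_ch weights plus one bias per output channel."""
    return k * k * in_ch * out_ch + out_ch


def dense_params(n_in, n_out):
    """Fully connected weights plus one bias per output unit."""
    return n_in * n_out + n_out


def count_parameters(crop=24, channels=3, filters=64, k=5,
                     local3=384, local4=192, classes=10):
    side = crop
    total = conv_params(k, channels, filters)  # conv1 ("SAME" padding keeps the size)
    side = math.ceil(side / 2)                 # pool1, stride 2 (24 -> 12 by default)
    total += conv_params(k, filters, filters)  # conv2 (norm1/norm2 have no weights)
    side = math.ceil(side / 2)                 # pool2, stride 2 (12 -> 6 by default)
    total += dense_params(side * side * filters, local3)  # local3 on the flattened map
    total += dense_params(local3, local4)                 # local4
    total += dense_params(local4, classes)                # softmax_linear
    return total


print(count_parameters())  # 1068298
```

Most of the parameters (885,120 with these sizes) sit in local3, the first fully connected layer, even though the convolutions do most of the arithmetic.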

The graph of inference operations generated by TensorBoard is as follows:

[Figure: TensorBoard graph of the inference operations]

Exercise: The output of inference() is unnormalized logits. Try editing the network architecture to return normalized predictions using tf.nn.softmax().
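For reference, this is the computation tf.nn.softmax performs on each row of logits, written as a plain-Python sketch. Subtracting the maximum logit before exponentiating avoids overflow without changing the result. Because softmax preserves the ordering of the logits, the top-1 prediction is the same before and after normalization.

```python
import math


def softmax(logits):
    """Turn unnormalized logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```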

The inputs() and inference() functions provide all the components needed to evaluate a model. Now let's shift our focus to building operations that train the model.

Exercise: The model architecture in inference() differs slightly from the CIFAR-10 model specified in cuda-convnet. In particular, the top layers of Alex's original model are locally connected, not fully connected. Try editing the architecture to exactly reproduce the locally connected architecture in the top layer.
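A locally connected layer slides a window over its input like a convolution but, unlike a convolution, gives every output position its own weights. The plain-Python sketch below implements such a layer with several simplifications made here (a single input and output channel, no padding, no bias) and compares its parameter count with a convolution and a fully connected layer over the same input. It is meant to clarify the exercise, not to reproduce the cuda-convnet layer.

```python
def locally_connected_2d(image, weights, k):
    """Valid (no-padding) locally connected layer with one channel in and out.

    image is an H x W list of floats; weights[i][j] is the k x k kernel used
    only at output position (i, j), so no weights are shared.
    """
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[i + di][j + dj] * weights[i][j][di][dj]
                for di in range(k) for dj in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def parameter_counts(h, w, k):
    """Weights needed to map an h x w input onto the same output grid three ways."""
    out_h, out_w = h - k + 1, w - k + 1
    return {
        "convolution": k * k,                        # one kernel shared everywhere
        "locally_connected": out_h * out_w * k * k,  # one kernel per output position
        "fully_connected": h * w * out_h * out_w,    # every input feeds every output
    }
```

The locally connected layer sits between the other two: it keeps the convolution's small receptive fields but drops weight sharing, so its parameter count grows with the size of the output grid.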

Model training

The usual method for training a network to perform N-way classification is multinomial logistic regression (https://en.wikipedia.org/wiki/Multinomial_logistic_regression), also known as softmax regression. Softmax regression applies a softmax nonlinearity to the output of the network and calculates the cross-entropy (https://www.tensorflow.org/api_docs/python/nn.html#softmax_cross_entropy_with_logits) between the normalized predictions and a one-hot encoding of the label. For regularization, we also apply the usual weight decay (https://www.tensorflow.org/api_docs/python/nn.html#l2_loss) losses to all learned variables. The objective function for the model, as returned by the loss() function, is the sum of the cross-entropy loss and all these weight decay terms.

We visualize the loss in TensorBoard with a scalar_summary:

[Figure: CIFAR-10 loss in TensorBoard]
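The objective described above can be sketched in plain Python. Cross-entropy against a one-hot label reduces to the log-sum-exp of the logits minus the logit of the true class, and l2_loss mirrors tf.nn.l2_loss (half the sum of squares). The single decay coefficient wd applied to every weight tensor is an assumption of this sketch; a real model may use a different coefficient per variable.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and a one-hot label, via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def l2_loss(weights):
    """Half the sum of squares, the quantity tf.nn.l2_loss computes."""
    return sum(w * w for w in weights) / 2.0


def total_loss(batch_logits, labels, weight_tensors, wd):
    """Mean cross-entropy over the batch plus weight decay on every weight tensor."""
    cross_entropy = sum(
        softmax_cross_entropy(z, y) for z, y in zip(batch_logits, labels)
    ) / len(labels)
    decay = sum(wd * l2_loss(w) for w in weight_tensors)
    return cross_entropy + decay
```

With all-zero logits over 10 classes the cross-entropy is ln 10 ≈ 2.30, the loss of a classifier that has learned nothing; the weight decay terms are added on top of that.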

We train the model using the standard gradient descent (https://en.wikipedia.org/wiki/Gradient_descent) algorithm (see Training (https://www.tensorflow.org/api_docs/python/train.html) for other methods) with a learning rate that decays exponentially over time.

[Figure: CIFAR-10 learning rate decay]
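The decay schedule can be expressed as a small function. The formula follows the documented behavior of tf.train.exponential_decay: the learning rate is multiplied by decay_rate raised to global_step / decay_steps, with the exponent rounded down when staircase is set. The numbers in the usage loop are illustrative, not the tutorial's settings.

```python
def exponential_decay(initial_lr, global_step, decay_steps, decay_rate, staircase=False):
    """Learning rate after global_step steps of exponential decay."""
    if staircase:
        exponent = global_step // decay_steps  # decay in discrete jumps
    else:
        exponent = global_step / decay_steps   # decay smoothly
    return initial_lr * decay_rate ** exponent


# Illustrative: start at 0.1 and divide by 10 every 1000 steps.
for step in (0, 500, 1000, 2000):
    print(step, exponential_decay(0.1, step, 1000, 0.1, staircase=True))
```

With staircase=True the rate stays flat and then drops by a factor of decay_rate at each multiple of decay_steps, which produces the step-shaped curve one typically sees in TensorBoard for such schedules.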

The train() function adds the operations needed to minimize the objective by calculating the gradient and updating the learned variables (see GradientDescentOptimizer for details). It returns an operation that executes all the calculations needed to train and update the model for one batch of images.
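What train() does on each batch, computing gradients of the objective and then moving every variable a small step against them, can be sketched on a much smaller model. This plain-Python example uses a linear softmax classifier instead of the CNN (a simplification made here) and relies on the standard result that the gradient of softmax cross-entropy with respect to a logit is the predicted probability minus the one-hot label. In TensorFlow, GradientDescentOptimizer performs both steps when you call its minimize() method.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def loss_and_grads(W, xs, ys):
    """Mean cross-entropy of a linear softmax classifier and its gradient w.r.t. W.

    W holds one weight vector per class; xs are feature vectors; ys are labels.
    """
    n_classes, n_features = len(W), len(W[0])
    grads = [[0.0] * n_features for _ in range(n_classes)]
    loss = 0.0
    for x, y in zip(xs, ys):
        logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in W]
        p = softmax(logits)
        loss -= math.log(p[y])
        for c in range(n_classes):
            # d(cross-entropy)/d(logit_c) = p_c - 1[c == y]
            coeff = p[c] - (1.0 if c == y else 0.0)
            for j in range(n_features):
                grads[c][j] += coeff * x[j]
    n = len(xs)
    return loss / n, [[g / n for g in row] for row in grads]


def train_step(W, xs, ys, lr):
    """One gradient descent step: compute gradients, then update every variable."""
    loss, grads = loss_and_grads(W, xs, ys)
    new_W = [[w - lr * g for w, g in zip(w_row, g_row)] for w_row, g_row in zip(W, grads)]
    return new_W, loss
```

Calling train_step repeatedly with a decaying lr is the small-scale analogue of running the operation returned by train() once per batch.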

Launching and training the model

Now that we have built the model, let's launch it and run the training operation with the script cifar10_train.py.

python cifar10_train.py

Note: The first time you run any target in the CIFAR-10 tutorial, the CIFAR-10 dataset is automatically downloaded. The dataset is about 160MB, so you may want to grab a coffee for your first run.

You should see something like this:

Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
2015-11-04 11:45:45.927302: step 0, loss = 4.68 (2.0 examples/sec; 64.221 sec/batch)
2015-11-04 11:45:49.133065: step 10, loss = 4.66 (533.8 examples/sec; 0.240 sec/batch)
2015-11-04 11:45:51.397710: step 20, loss = 4.64 (597.4 examples/sec; 0.214 sec/batch)
2015-11-04 11:45:54.446850: step 30, loss = 4.62 (391.0 examples/sec; 0.327 sec/batch)
2015-11-04 11:45:57.152676: step 40, loss = 4.61 (430.2 examples/sec; 0.298 sec/batch)
2015-11-04 11:46:00.437717: step 50, loss = 4.59 (406.4 examples/sec; 0.315 sec/batch)
...

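The script reports the total loss every 10 steps as well as the speed at which the last batch of data was processed. A few comments:

  1. The first batch of data can be inordinately slow (64 seconds in the log above) because the preprocessing threads first fill the shuffling queue with 20,000 processed CIFAR images.
  2. The reported loss is the average loss over the most recent batch. Remember that this loss is the sum of the cross-entropy and all weight decay terms.
  3. Keep an eye on the processing speed of a batch. If you are running on a CPU, expect slower performance.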

Exercise: When experimenting, it is sometimes annoying that the first training step can take so long. Try decreasing the number of images that initially fill up the queue. Search for NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN in cifar10.py.

cifar10_train.py periodically saves all model parameters in checkpoint files (https://www.tensorflow.org/how_tos/variables/index.html#saving-and-restoring) with a Saver (https://www.tensorflow.org/api_docs/python/state_ops.html#Saver), but it does not evaluate the model. The checkpoint files are used by cifar10_eval.py to measure the predictive performance (see Model evaluation below).

Performing the above steps will start training the CIFAR-10 model. Congrats!

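The terminal text returned by cifar10_train.py provides minimal insight into how the model is training. We want more insight into the model during training:

  1. Is the loss really decreasing, or is that just noise?
  2. Is the model being given appropriate images?
  3. Are the gradients, activations, and weights reasonable?
  4. What is the learning rate currently at?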

TensorBoard provides this functionality, displaying data exported periodically from cifar10_train.py via a SummaryWriter (https://www.tensorflow.org/api_docs/python/train.html#SummaryWriter).

For instance, we can watch how the distribution of activations and the degree of sparsity in the local3 features evolve during training:

[Figure: distribution of local3 activations] [Figure: sparsity of local3 activations]

Individual loss functions, as well as the total loss, are particularly interesting to track over time. However, the loss exhibits a considerable amount of noise due to the small batch size used for training. In practice, we find it extremely useful to visualize the moving averages in addition to the raw values. See how the script uses ExponentialMovingAverage (https://www.tensorflow.org/api_docs/python/train.html#ExponentialMovingAverage) for this purpose.
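The smoothing idea is easy to see on a plain list of loss values. This sketch applies the update avg = decay * avg + (1 - decay) * value; the decay of 0.9 and seeding the average with the first value are choices made here, not necessarily what the tutorial's ExponentialMovingAverage setup does.

```python
def smooth(values, decay=0.9):
    """Exponential moving average of a sequence, seeded with its first value."""
    averages = []
    avg = None
    for v in values:
        avg = v if avg is None else decay * avg + (1 - decay) * v
        averages.append(avg)
    return averages
```

On a loss that jumps between 0 and 2 every step, the smoothed curve settles into a narrow band around 1, so a genuine downward trend stands out from batch-to-batch noise. A larger decay smooths more but reacts more slowly to real changes.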

Model evaluation

Now let's evaluate how well the trained model performs on a hold-out dataset. The model is evaluated by the script cifar10_eval.py. It constructs the model with the inference() function and uses all 10,000 images in the CIFAR-10 evaluation set. It calculates the precision at 1: how often the top prediction matches the true label of the image.
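Precision at 1 comes down to a few lines of arithmetic, sketched here in plain Python. In TensorFlow, tf.nn.in_top_k with k=1 performs the per-example check; this sketch breaks ties in favor of the lowest class index, which is a choice made here and may differ from TensorFlow's tie handling.

```python
def precision_at_1(logits_batch, labels):
    """Fraction of examples whose highest-scoring class equals the true label."""
    correct = sum(
        1 for logits, label in zip(logits_batch, labels)
        if max(range(len(logits)), key=logits.__getitem__) == label
    )
    return correct / len(labels)
```

Over the full evaluation set, the reported precision @ 1 is this fraction computed across all 10,000 images.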

To monitor how the model improves during training, the evaluation script runs periodically on the latest checkpoint files created by cifar10_train.py.

python cifar10_eval.py

Be careful not to run the evaluation and training binaries on the same GPU, or you might run out of memory. Consider running the evaluation on a separate GPU if available, or suspending the training binary while running the evaluation on the same GPU.

You should see something like this:

2015-11-06 08:30:44.391206: precision @ 1 = 0.860
...

The script merely reports precision @ 1 periodically, in this case 86% accuracy. cifar10_eval.py also exports summaries that can be visualized in TensorBoard. These summaries provide additional insight into the model during evaluation.

The training script calculates the moving average (https://www.tensorflow.org/api_docs/python/train.html#ExponentialMovingAverage) version of all learned variables. The evaluation script substitutes the moving-average versions for all learned model parameters. This substitution boosts model performance at evaluation time.
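The parameter averaging can be sketched with scalar parameters in plain Python. The update shadow -= (1 - d) * (shadow - value) and the reduced early decay d = min(decay, (1 + n) / (10 + n)) follow the documented behavior of tf.train.ExponentialMovingAverage when it is given an update count n; the class name ParameterAverager and the 0.9999 default are assumptions of this sketch. At evaluation time the averaged() values would be used in place of the raw trained parameters.

```python
class ParameterAverager:
    """Keeps a shadow copy of each parameter, updated as an exponential moving average."""

    def __init__(self, params, decay=0.9999):
        self.decay = decay
        self.num_updates = 0
        self.shadow = dict(params)  # shadows start at the initial parameter values

    def update(self, params):
        """Call after every training step with the current parameter values."""
        self.num_updates += 1
        # Use a smaller decay early on so the shadows catch up with training quickly.
        d = min(self.decay, (1 + self.num_updates) / (10 + self.num_updates))
        for name, value in params.items():
            self.shadow[name] -= (1 - d) * (self.shadow[name] - value)

    def averaged(self):
        """Parameters to use at evaluation time instead of the raw trained ones."""
        return dict(self.shadow)
```

The training parameters themselves are never modified; only the evaluation copy changes, which is why turning the averaging off in cifar10_eval.py is a one-place edit.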

Exercise: Using averaged parameters may boost predictive performance by about 3% as measured by precision @ 1. Edit cifar10_eval.py so that it does not use the averaged parameters for the model, and verify that the predictive performance drops.

Model training with multiple GPU cards

Modern workstations may contain multiple GPUs for scientific computation. TensorFlow can leverage this environment to run the training operation concurrently across multiple cards.

Training a model in a parallel, distributed fashion requires coordinating the training processes. In what follows, we call each copy of the model that trains on a subset of the data a model replica.

Naively using asynchronous updates of the model parameters leads to sub-optimal training performance, because an individual model replica might be trained on a stale copy of the model parameters. Conversely, fully synchronous updates will be as slow as the slowest model replica.

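In a workstation with multiple GPU cards, each GPU will have similar speed and contain enough memory to run an entire CIFAR-10 model. Thus, we opt to design our training system in the following manner:

  1. Place an individual model replica on each GPU.
  2. Update model parameters synchronously by waiting for all GPUs to finish processing a batch of data.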

The diagram for this model is below:

[Figure: diagram of multi-GPU training with model replicas on the GPUs and parameters on the CPU]

Note that each GPU computes inference as well as the gradients for a unique batch of data. This setup effectively permits dividing up a larger batch of data across the GPUs.

This setup requires that all GPUs share the model parameters. It is well known that transferring data to and from GPUs is quite slow. For this reason, we decided to store and update all model parameters on the CPU (see green box). A fresh set of model parameters is transferred to the GPUs once a new batch of data has been processed by all GPUs.

The GPUs are synchronized in operation. All gradients are accumulated from the GPUs and averaged (see green box). The model parameters are updated with the gradients averaged across all model replicas.
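The synchronous scheme can be simulated in plain Python to show why averaging tower gradients works: with equal-sized shards, the mean of the per-tower gradients equals the gradient over the whole batch. In this sketch the towers run one after another instead of on separate GPUs, parameters are plain lists of floats, and split_batch, average_gradients, synchronous_step, and squared_error_grad are hypothetical names.

```python
def split_batch(examples, num_towers):
    """Divide one large batch into equal slices, one per tower (GPU)."""
    if len(examples) % num_towers:
        raise ValueError("batch size must be divisible by the number of towers")
    size = len(examples) // num_towers
    return [examples[i * size:(i + 1) * size] for i in range(num_towers)]


def average_gradients(tower_grads):
    """Average each variable's gradient across towers.

    tower_grads holds one dict per tower mapping variable name -> gradient list.
    """
    averaged = {}
    for name in tower_grads[0]:
        per_tower = [grads[name] for grads in tower_grads]
        averaged[name] = [sum(vals) / len(vals) for vals in zip(*per_tower)]
    return averaged


def synchronous_step(params, batch, num_towers, grad_fn, lr):
    """One synchronous update: all towers finish before the shared parameters change."""
    tower_grads = [grad_fn(params, shard) for shard in split_batch(batch, num_towers)]
    avg = average_gradients(tower_grads)
    return {name: [p - lr * g for p, g in zip(params[name], avg[name])] for name in params}


def squared_error_grad(params, shard):
    """Gradient of mean((w - y)^2) over one shard, for a one-parameter model."""
    w = params["w"][0]
    return {"w": [2 * (w - sum(shard) / len(shard))]}


params = synchronous_step({"w": [0.0]}, [1.0, 2.0, 3.0, 4.0], num_towers=2,
                          grad_fn=squared_error_grad, lr=0.1)
```

Because each step waits for every tower, no replica ever computes gradients against stale parameters; the cost is that each step runs at the pace of the slowest tower.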

Placing variables and operations on devices

Placing operations and variables on devices requires some special abstractions.

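The first abstraction we require is a function for computing inference and gradients for a single model replica. In the code we call this abstraction a "tower". We must set two attributes for each tower:

  1. A unique name for all operations within a tower. tf.name_scope() provides this unique name by prepending a scope. For instance, all operations in the first tower are prepended with tower_0, e.g. tower_0/conv1/Conv2D.
  2. A preferred hardware device on which to run the operations within a tower. tf.device() specifies this. For instance, all operations in the first tower reside within the device('/gpu:0') scope, indicating that they should be run on the first GPU.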

All variables are pinned to the CPU and accessed via tf.get_variable() so that they can be shared in the multi-GPU version. See the how-to on Sharing Variables (https://www.tensorflow.org/how_tos/variable_scope/index.html).

Launching and training the model on multiple GPU cards

If you have several GPU cards installed on your machine, you can use them to train the model faster with the cifar10_multi_gpu_train.py script. This version of the training script parallelizes the model across multiple GPU cards.

python cifar10_multi_gpu_train.py --num_gpus=2

The output of the training script should look like this:

Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
2015-11-04 11:45:45.927302: step 0, loss = 4.68 (2.0 examples/sec; 64.221 sec/batch)
2015-11-04 11:45:49.133065: step 10, loss = 4.66 (533.8 examples/sec; 0.240 sec/batch)
2015-11-04 11:45:51.397710: step 20, loss = 4.64 (597.4 examples/sec; 0.214 sec/batch)
2015-11-04 11:45:54.446850: step 30, loss = 4.62 (391.0 examples/sec; 0.327 sec/batch)
2015-11-04 11:45:57.152676: step 40, loss = 4.61 (430.2 examples/sec; 0.298 sec/batch)
2015-11-04 11:46:00.437717: step 50, loss = 4.59 (406.4 examples/sec; 0.315 sec/batch)
...

Note that the number of GPU cards is 1 by default. In addition, if your machine has only one GPU available, all calculations will be placed on that single GPU, even if you request more.

Exercise: By default, cifar10_train.py runs with a batch size of 128. Run cifar10_multi_gpu_train.py on two GPUs with batch size 64 and compare training speeds.

Next step

Congrats! You have completed the CIFAR-10 tutorial.

If you are interested in developing and training your own image classification system, we recommend forking this tutorial and replacing components to address your own image classification problem.

Exercise: Download the Street View House Numbers (SVHN) (http://ufldl.stanford.edu/housenumbers/) dataset. Fork the CIFAR-10 tutorial and swap in SVHN as the input data. Try adapting the network architecture to improve predictive performance.
