I will post a memo about TensorFlow, the deep learning library that Google has released. TensorFlow comes with detailed tutorial explanations, so I tried translating one into Japanese.

・About TensorFlow -> http://www.tensorflow.org/
・Source of this translation -> http://www.tensorflow.org/tutorials/mnist/pros/index.md

Since I am a native Japanese speaker, there may be some odd translations or mistranslations, so please read at your own risk.
[Reference: [Roughly translating the TensorFlow Tutorial into Japanese] 1. MNIST For ML Beginners]
This tutorial will be easier to understand if you have read Chapter 6 of Mr. Okaya's book "Deep Learning". Technical terms such as stride and padding come up in this article, so reference material is essential.
The model we build this time is a convolutional neural network (CNN).
Run the following script to download the MNIST dataset.
import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
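As a quick look at what was downloaded (assuming the standard tutorial split of 55,000 training examples): the images come flattened to 784 pixels, and with one_hot=True each label is a 10-dimensional vector.

print mnist.train.images.shape  # (55000, 784): 28x28 images, flattened
print mnist.train.labels.shape  # (55000, 10): one-hot digit labels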
TensorFlow relies on a highly efficient C++ backend to do its computation. The connection to this backend is called a session. A common use of TensorFlow is to first create a graph and then run it in a session.
Here, however, we use the convenient InteractiveSession class, which gives you more flexibility in how you structure your code. This is useful when working interactively, as in IPython. Without InteractiveSession, you would have to build the entire computation graph before starting the session, and only then run the graph.
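For contrast, here is a minimal sketch of the usual graph-then-session pattern (the constants here are illustrative, not part of the tutorial):

import tensorflow as tf

# Build the entire computation graph first...
a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b

# ...then launch a session to run the finished graph.
with tf.Session() as sess:
    print sess.run(c)  # 6.0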
import tensorflow as tf
sess = tf.InteractiveSession()
Computation Graph

To do efficient computation, instead of executing each heavy instruction one at a time from Python, TensorFlow lets you describe a "graph" of interacting instructions that runs entirely outside Python. Similar techniques are used in Theano and Torch. The role of the Python code, then, is to build this external computation graph and to dictate which parts of the graph should be run.
x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
sess.run(tf.initialize_all_variables())
y = tf.nn.softmax(tf.matmul(x,W) + b)
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
for i in range(1000):
    batch = mnist.train.next_batch(50)
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels})
An accuracy of 91% on MNIST is embarrassingly bad. We will build a small convolutional neural network and aim for an accuracy of 99.2%.
To build the CNN, we need to create a lot of weights and biases. In general, weights should be initialized with a small amount of noise to break symmetry and prevent zero gradients. Since we are using ReLU neurons here, it is also good practice to initialize the biases to a slightly positive value to avoid "dead neurons". Instead of repeating this every time we build a model, let's create two convenient functions that do it for us.
def weight_variable(shape):
    # Small Gaussian noise breaks symmetry between units.
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # A slightly positive constant bias avoids "dead" ReLU neurons.
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
TensorFlow gives you a great deal of flexibility in convolution and pooling. How should boundaries be handled? How large should the stride be? In this example we use the most vanilla settings: the convolutions use a stride of 1 and zero padding, so the output is the same size as the input, and the pooling is plain old max pooling over 2x2 blocks. To keep the code clean, let's wrap these operations in functions as well.
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
Now we can implement the first layer. It consists of a convolution followed by max pooling. The convolution computes 32 features for each 5x5 patch (so yes, it returns 32 numbers for each 5x5 patch, one per output channel; see the shape check after the code below). The shape of the weight tensor is [5,5,1,32]: the first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels. We also prepare a bias vector with one element for each output channel.
W_conv1 = weight_variable([5,5,1,32])
b_conv1 = bias_variable([32])
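To answer the aside above with a quick shape check (illustrative; tf.zeros stands in for a real image, and shapes can be inspected without running the graph): convolving a single 28x28 grayscale image with this filter yields 32 output channels at every position.

shape_demo = conv2d(tf.zeros([1, 28, 28, 1]), W_conv1)
print shape_demo.get_shape()  # (1, 28, 28, 32)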
To apply this layer, we first reshape x into a four-dimensional tensor. The second and third dimensions correspond to the image width and height, and the fourth dimension corresponds to the number of color channels. (The leading -1 lets the batch size be inferred.)
x_image = tf.reshape(x,[-1,28,28,1])
Now we convolve x_image with the weight tensor, add the bias, apply the ReLU function, and finally take the max pool.
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1)+b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
The second layer computes 64 features for each 5x5 patch.
W_conv2 = weight_variable([5,5,32,64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2)+b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
The image size has now been reduced to 7x7 (see the size arithmetic below). We add a fully connected layer with 1024 neurons to allow processing of the entire image. We reshape the tensor from the pooling layer into a batch of vectors, multiply by a weight matrix, add a bias, and apply the ReLU.
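To double-check the 7x7 figure: SAME padding keeps each convolution's output the same size as its input, and each 2x2 max pool halves the width and height.

# input image:   28 x 28 x  1
# conv1 (SAME):  28 x 28 x 32
# pool1 (2x2):   14 x 14 x 32
# conv2 (SAME):  14 x 14 x 64
# pool2 (2x2):    7 x  7 x 64  -> 7*7*64 = 3136 values per image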
W_fc1 = weight_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
To reduce overfitting, we apply dropout before the readout layer. We create a placeholder for the probability that a neuron's output is kept during dropout, which lets us turn dropout on during training and turn it off during testing. In addition to masking neuron outputs, TensorFlow's tf.nn.dropout op automatically scales the surviving outputs (by 1/keep_prob), so dropout just works without any additional scaling.
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
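A minimal check of that automatic scaling (illustrative; run in the same interactive session): with a keep probability of 0.5, the surviving elements of a vector of ones come back as 2.0.

drop_demo = tf.nn.dropout(tf.ones([10]), 0.5)
print drop_demo.eval()  # roughly half zeros, survivors scaled to 2.0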
Finally, add the Softmax layer.
W_fc2 = weight_variable([1024,10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2)+b_fc2)
The differences from the single-layer softmax network above are as follows:
・The steepest gradient descent optimizer is replaced with the more sophisticated ADAM optimizer.
・An additional parameter keep_prob is added to feed_dict to control the dropout rate.
・Training accuracy is logged every 100 iterations.
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())
for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        # Evaluate with dropout turned off (keep_prob = 1.0).
        train_accuracy = accuracy.eval(feed_dict={
            x: batch[0], y_: batch[1], keep_prob: 1.0})
        print "step %d, training accuracy %g" % (i, train_accuracy)
    # Train with dropout turned on (keep_prob = 0.5).
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print "test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})