Previously, I read through the MNIST code to understand how TensorFlow works, but since the data is images, I wanted to try a CNN. So I added a convolution step. The main reference is, of course, "Deep MNIST for Experts", but I also used another article as a reference.
Convolution applies an n × n filter to the image. Anyone who has done image processing will know it well, since it is the same operation used for edge extraction and blurring.
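To make the idea concrete, here is a toy sketch in plain NumPy (the image and kernel values are made up purely for illustration): sliding a 3x3 Laplacian-like edge kernel over a small image gives a large response along the border of the bright square and roughly zero in flat regions.
import numpy as np

# Toy 5x5 grayscale "image" containing a bright square (made-up values)
image = np.array([[0, 0, 0, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 1, 1, 1, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 0, 0, 0]], dtype=np.float32)

# 3x3 Laplacian-like edge-detection kernel
kernel = np.array([[ 0, -1,  0],
                   [-1,  4, -1],
                   [ 0, -1,  0]], dtype=np.float32)

# Slide the kernel over every 3x3 patch (no padding, stride 1)
out = np.zeros((3, 3), dtype=np.float32)
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(out)  # non-zero along the square's border, 0 in the flat interior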
The function to use is tf.nn.conv2d(), and it takes four main arguments: the first is the input data, the second is the filter (vertical size, horizontal size, number of input channels, number of output channels), the third is the stride, and the fourth is the padding setting.
It is typically set up like this:
x = tf.placeholder(tf.float32, [None, 784])
x_image = tf.reshape(x, [-1, 28, 28, 1])
initial = tf.truncated_normal([5, 5, 1, 32], stddev=0.1)
W_conv1 = tf.Variable(initial)
h_conv1 = tf.nn.conv2d(x_image,
                       W_conv1,
                       strides=[1, 1, 1, 1],
                       padding='SAME')
Since each MNIST sample is a 28x28 image flattened to one dimension, it is first reshaped back to 28x28. Next, a 5x5 filter is set up. Since the input is a grayscale image, it has 1 channel, and the output has 32 channels.
In addition, ReLU is used as the activation function after the convolution, so the last line actually becomes:
initial = tf.constant(0.1, shape=[32])
b_conv1 = tf.Variable(initial)
h_conv1 = tf.nn.relu(tf.nn.conv2d(x_image,
                                  W_conv1,
                                  strides=[1, 1, 1, 1],
                                  padding='SAME')
                     + b_conv1)
That is what it ends up looking like. (W_conv1 and b_conv1 are the weight and bias parameters adjusted by error backpropagation.)
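If you want to confirm the shapes, a quick sanity check (assuming the snippet above has been run) is to print the static shapes:
print(x_image.get_shape())   # (?, 28, 28, 1)
print(h_conv1.get_shape())   # (?, 28, 28, 32) -- SAME padding with stride 1 keeps the 28x28 size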
The convolution step is usually paired with a pooling step, which reduces the image size by picking a representative value for each n × n block. There are various methods, such as the average or the median, but in deep learning the maximum value, called "max pooling", is used most often.
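As a toy illustration in plain NumPy (made-up numbers), 2x2 max pooling simply keeps the largest value of each non-overlapping 2x2 block:
import numpy as np

a = np.array([[1, 3, 2, 0],
              [5, 6, 1, 2],
              [7, 2, 9, 4],
              [0, 1, 3, 8]], dtype=np.float32)

# Reshaping to (2, 2, 2, 2) groups the array into 2x2 blocks;
# taking the max over the block axes keeps one value per block.
pooled = a.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6. 2.]
               #  [7. 9.]]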
TensorFlow provides tf.nn.max_pool(), which again takes four main arguments: the input data, the window size (ksize), the stride, and the padding setting. It's very similar to the convolution step, isn't it?
The actual code looks like this.
h_pool1 = tf.nn.max_pool(h_conv1,
                         ksize=[1, 2, 2, 1],
                         strides=[1, 2, 2, 1],
                         padding='SAME')
The input is the result of the convolution step. The window size is 2x2, and the stride also moves 2 pixels both vertically and horizontally, so the output image is half the original size in each dimension. (The number of channels does not change.)
Also, since the convolution and pooling steps work on two-dimensional images, the result has to be converted back to one dimension before the fully connected layer. At this point, note that you cannot set the array size correctly unless you know the image size and the number of channels. The function is tf.reshape(), the same one used to go from 1D to 2D, as sketched below.
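Concretely, two rounds of 2x2 pooling shrink the 28x28 image to 14x14 and then 7x7, and the second convolution outputs 64 channels, so the flattened vector has 7 x 7 x 64 = 3136 elements. A minimal sketch of that flattening (the variable names match the full program below):
# 28x28 -> 14x14 after pooling (1) -> 7x7 after pooling (2), with 64 channels
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])  # [-1, 3136]; -1 lets TensorFlow infer the batch size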
There are some parts I haven't explained yet, but here is the full source code, which works (it should).
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Helper functions
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

# Main function
def main():
    # Get the dataset
    # (here, point it at the folder containing the pre-downloaded archives)
    mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

    # Prepare the input/output data
    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])
    x_image = tf.reshape(x, [-1, 28, 28, 1])

    # Convolution (1)
    W_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

    # Pooling (1)
    h_pool1 = max_pool_2x2(h_conv1)

    # Convolution (2)
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)

    # Pooling (2)
    h_pool2 = max_pool_2x2(h_conv2)

    # Fully connected layer
    W_fc1 = weight_variable([7 * 7 * 64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    # Dropout
    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    # Classification (why there is no softmax here is a mystery)
    W_fc2 = weight_variable([1024, 10])
    b_fc2 = bias_variable([10])
    y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

    # Loss and evaluation (for some reason the softmax is here)
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Create the session
    sess = tf.InteractiveSession()
    tf.global_variables_initializer().run()

    # Training
    for i in range(20000):
        # Batch size is 50
        batch = mnist.train.next_batch(50)
        # Print progress every 100 steps
        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={
                x: batch[0], y_: batch[1], keep_prob: 1.0})
            print("step %d, training accuracy %g" % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

    # Show the test accuracy
    print("test accuracy %g" % accuracy.eval(feed_dict={
        x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

# Entry point
if __name__ == "__main__":
    main()
After this, I still have to write about the dropout and evaluation steps... (I'm not quite sure about the evaluation step ^^;)
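For what it's worth, here is my rough understanding of the evaluation lines, as a toy sketch with made-up logits and labels (not part of the program above): the predicted class is the argmax of the logits, it is compared with the argmax of the one-hot label, and the matches are averaged.
import tensorflow as tf

# Made-up logits for 3 samples and 4 classes, plus their one-hot labels
logits = tf.constant([[2.0, 0.1, 0.3, 0.1],
                      [0.2, 0.1, 3.0, 0.5],
                      [1.0, 2.0, 0.1, 0.3]])
labels = tf.constant([[1., 0., 0., 0.],
                      [0., 0., 1., 0.],
                      [0., 0., 0., 1.]])

# Same pattern as the program: compare predicted class with true class, then average
correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    print(sess.run(accuracy))  # 0.6666667 -- 2 of the 3 samples are predicted correctly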