I tried a convolutional neural network (CNN) with the TensorFlow tutorial on Cloud9 - Classification of handwritten images -

Introduction

In the previous article, "I tried the TensorFlow tutorial (MNIST for beginners) on Cloud9 ~ Classification of handwritten images ~", I implemented simple machine learning. This time I tried TensorFlow's "Deep MNIST for Experts". Since it is written for experts it uses a variety of techniques, but it becomes easier to understand as you write the code yourself. Here we implement a convolutional neural network (CNN), which has a proven track record in image classification.

Environment

The environment is the same as last time: Cloud9, Python 2.7.6. Sample code: GitHub. For setting up the environment, see "Use TensorFlow in the cloud integrated development environment Cloud9 ~ Get Started ~"; for the basics of TensorFlow, see "Use TensorFlow in the cloud integrated development environment Cloud9 - Basics of usage -".

Code

As last time, the code is divided into two parts: the training code and the code that predicts my handwritten data. First, let's look at the training code.

mnist_neural_train.py


from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Define method
def weight_variable(shape, name):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial, name=name)

def bias_variable(shape, name):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial, name=name)

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Download gz files to MNIST_data directory
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# Initializing
sess = tf.InteractiveSession()

x = tf.placeholder(tf.float32, shape=[None, 28*28])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
x_image = tf.reshape(x, [-1, 28, 28, 1])

W_conv1 = weight_variable([5, 5, 1, 32], name="W_conv1")
b_conv1 = bias_variable([32], name="b_conv1")
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

W_conv2 = weight_variable([5, 5, 32, 64], name="W_conv2")
b_conv2 = bias_variable([64], name="b_conv2")
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

W_fc1 = weight_variable([7 * 7 * 64, 1024], name="W_fc1")
b_fc1 = bias_variable([1024], name="b_fc1")
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

W_fc2 = weight_variable([1024, 10], name="W_fc2")
b_fc2 = bias_variable([10], name="b_fc2")
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

# Making model
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))

# Training
train_step = tf.train.GradientDescentOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.global_variables_initializer())
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    train_accuracy = accuracy.eval(feed_dict={x:batch[0], y_:batch[1], keep_prob:1.0})
    print("step %d, training accuracy %g" %(i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

# Evaluating
#print("test accuracy %g" %accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

# Save train data
saver = tf.train.Saver()
saver.save(sess, 'param/neural.param')

The basic flow is the same as last time, so I will explain only the differences.

def weight_variable(shape, name):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial, name=name)

def bias_variable(shape, name):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial, name=name)

These functions define how the parameters are initialized. Last time the parameters were initialized to 0, but here it seems better not to use 0. The network uses the ReLU function, which has a slope of 1 for positive values and outputs 0 for negative values; because of that, it seems better to give the weights small random noise and the biases a small positive value (0.1), so that the units do not start out "dead".
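
As a quick illustration (a minimal sketch with made-up numbers, not part of the tutorial code), a unit whose weights and bias are all 0 outputs 0 through ReLU regardless of the input, while small random weights plus a 0.1 bias keep it active:

import numpy as np

x = np.array([0.3, -0.2, 0.5])             # some input
w_zero = np.zeros(3)                       # all-zero weights, as last time
w_rand = np.random.normal(0.0, 0.1, 3)     # small noise, like truncated_normal

relu = lambda z: np.maximum(z, 0.0)

print(relu(np.dot(x, w_zero) + 0.0))       # 0.0 -- the unit is "dead"
print(relu(np.dot(x, w_rand) + 0.1))       # usually a small positive value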

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

These define convolution (conv2d) and pooling (max_pool_2x2), the operations that give the convolutional neural network its name. The detailed processing is explained below.

Convolution layer

W_conv1 = weight_variable([5, 5, 1, 32], name="W_conv1")
b_conv1 = bias_variable([32], name="b_conv1")
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

This is the convolution layer. In W_conv1's [5, 5, 1, 32], the 5 * 5 is the patch size. The layer captures features of the image: it slides a 5 * 5 patch over the image and computes how well each region matches the patch. The 1 is the number of inputs, i.e. one channel of input image data, and the 32 is the number of outputs: 32 different patches, each capturing a different feature of the image data.
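
To make the shapes concrete, here is a minimal sketch (using the same TensorFlow 1.x API as the tutorial) that only inspects the tensor shape after the first convolution:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 28*28])
x_image = tf.reshape(x, [-1, 28, 28, 1])            # (batch, 28, 28, 1)
W = tf.truncated_normal([5, 5, 1, 32], stddev=0.1)  # 5*5 patch, 1 input channel, 32 outputs
conv = tf.nn.conv2d(x_image, W, strides=[1, 1, 1, 1], padding='SAME')

print(conv.get_shape())  # (?, 28, 28, 32) -- one 28*28 feature map per output channel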

The relu in h_conv1 is the ReLU function: it outputs 0 for negative inputs and passes positive inputs through unchanged, i.e. with a slope of 1. Because the slope is 1, it prevents the situation where the gradient becomes too small or zero and learning stops.

h_pool1 is the pooling layer, a process that aggregates each 2 * 2 block of 4 cells in the image into a single cell holding their maximum. The image was originally 28 * 28, so it shrinks to a quarter of that, 14 * 14. Image data that is shifted even slightly would otherwise look like completely different data, but I think pooling absorbs such shifts by aggregating over neighborhoods. (I apologize if I'm wrong.)
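
For intuition, here is a tiny sketch of 2 * 2 max pooling on a made-up 4 * 4 single-channel image:

import numpy as np
import tensorflow as tf

img = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)  # one 4*4 image, values 0..15
pooled = tf.nn.max_pool(img, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

with tf.Session() as sess:
    print(sess.run(pooled).reshape(2, 2))
    # [[ 5.  7.]
    #  [13. 15.]] -- each cell is the max of one 2*2 block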

W_conv2 = weight_variable([5, 5, 32, 64], name="W_conv2")
b_conv2 = bias_variable([64], name="b_conv2")
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

W_fc1 = weight_variable([7 * 7 * 64, 1024], name="W_fc1")
b_fc1 = bias_variable([1024], name="b_fc1")
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

conv2 simply applies the convolution layer described above a second time. fc1 is the same calculation as last time, except that h_pool2_flat first flattens the multidimensional array into one dimension so that fc1 can be computed.
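
Tracing the shapes explains the 7 * 7 * 64: each pooling layer halves the sides, so 28 * 28 → 14 * 14 → 7 * 7, and the second convolution outputs 64 channels:

# Shape of the data flowing through the network (batch dimension omitted):
#   input           28 x 28 x  1
#   conv1 + pool1   14 x 14 x 32
#   conv2 + pool2    7 x  7 x 64
#   flatten          7 * 7 * 64 = 3136
#   fc1             1024
#   fc2 (logits)    10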

Dropout

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

Dropout keeps each unit only with the probability set by keep_prob and drops the rest, so some of the data is not used. This prevents overfitting (a situation where accuracy is high on the training data but not on the test data). The value of keep_prob is set at run time: 0.5 during training and 1.0 (no dropout) when validating with the test data.
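
A minimal sketch of the behavior (in TensorFlow 1.x, kept units are scaled up by 1/keep_prob so the expected sum stays the same):

import tensorflow as tf

h = tf.ones([1, 4])
dropped = tf.nn.dropout(h, keep_prob=0.5)

with tf.Session() as sess:
    print(sess.run(dropped))  # e.g. [[2. 0. 2. 0.]] -- about half zeroed, the rest doubled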

Verification with test data

#print("test accuracy %g" %accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

This is the part that validates against the test data, but it is commented out: Cloud9 does not seem to have enough memory, and an error occurs here. The dropout keep_prob described above is set to 1.0 for this evaluation.
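
A common workaround when memory is tight (a sketch, not something I used in the original code) is to evaluate the test set in small batches and average the results:

# Hypothetical batched evaluation, reusing x, y_, keep_prob and accuracy from above.
total = 0.0
n_batches = len(mnist.test.images) // 100
for i in range(n_batches):
    xs = mnist.test.images[i*100:(i+1)*100]
    ys = mnist.test.labels[i*100:(i+1)*100]
    total += accuracy.eval(feed_dict={x: xs, y_: ys, keep_prob: 1.0})
print("test accuracy %g" % (total / n_batches))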

Running the training

The rest of the processing should be understandable from last time's explanation. Training on Cloud9 took a very long time because the free tier has a weak CPU and little memory: from a few hours up to half a day. Since the parameters are saved, the prediction below can run immediately afterwards, but it was painful whenever the training had to be redone.

Predicting your handwritten data

When I predicted my own handwritten data as before, accuracy was 50%. The accuracy should have improved, but instead it dropped. I cannot tell whether it is because my handwritten data is binarized into 1s and 0s, or because there are only 10 samples and they happened to be off, but it felt unsatisfying. ⇒ It turned out that preprocessing was required to predict handwritten data. For details, see the article "Predicting your handwritten data with TensorFlow".
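
The linked article has the details; the gist is that MNIST digits are centered by their center of mass inside the 28 * 28 frame, so raw handwritten images need the same treatment before prediction. A rough sketch (the use of scipy here is my illustration, not the linked article's code):

import numpy as np
from scipy import ndimage

def center_by_mass(img):
    # img: 28x28 float array, white digit (high values) on black background.
    cy, cx = ndimage.center_of_mass(img)                # current center of mass
    return ndimage.shift(img, (13.5 - cy, 13.5 - cx))   # move it to the frame center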

mnist_neural.py


from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import sys
import numpy as np
import parsebmp as pb

# Define method
def weight_variable(shape, name):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial, name=name)

def bias_variable(shape, name):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial, name=name)

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

def whatisit(file, sess):
  print("File name is %s" % file)
  data = pb.parse_bmp(file)

  # Show bmp data
  for i in range(len(data)):
    sys.stdout.write(str(int(data[i])))
    if (i+1) % 28 == 0:
      print("")

  # Predicting
  d = np.array([data], dtype=np.float32)  # ensure float32 so it matches the weight variables
  x_image = tf.reshape(d, [-1, 28, 28, 1])
  h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
  h_pool1 = max_pool_2x2(h_conv1)
  h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
  h_pool2 = max_pool_2x2(h_conv2)
  h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
  h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
  h_fc1_drop = tf.nn.dropout(h_fc1, 1.0)
  y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
  result = sess.run(y_conv)

  # Show result
  print(result)
  print(np.argmax(result, 1))

if __name__ == "__main__":
  # Restore parameters
  # (W and b from the previous, non-CNN tutorial are not part of this model and
  #  are not in the saved checkpoint, so they are no longer defined here.)
  W_conv1 = weight_variable([5, 5, 1, 32], name="W_conv1")
  b_conv1 = bias_variable([32], name="b_conv1")
  W_conv2 = weight_variable([5, 5, 32, 64], name="W_conv2")
  b_conv2 = bias_variable([64], name="b_conv2")
  W_fc1 = weight_variable([7 * 7 * 64, 1024], name="W_fc1")
  b_fc1 = bias_variable([1024], name="b_fc1")
  W_fc2 = weight_variable([1024, 10], name="W_fc2")
  b_fc2 = bias_variable([10], name="b_fc2")

  sess = tf.InteractiveSession()
  saver = tf.train.Saver()
  saver.restore(sess, 'param/neural.param')

  # My data
  whatisit("My_data/0.bmp", sess)
  whatisit("My_data/1.bmp", sess)
  whatisit("My_data/2.bmp", sess)
  whatisit("My_data/3.bmp", sess)
  whatisit("My_data/4.bmp", sess)
  whatisit("My_data/5.bmp", sess)
  whatisit("My_data/6.bmp", sess)
  whatisit("My_data/7.bmp", sess)
  whatisit("My_data/8.bmp", sess)
  whatisit("My_data/9.bmp", sess)

In conclusion

At first I could not understand it at all, but I deepened my understanding by studying various books and web resources and by implementing the code. Next, I would like to think about a network architecture myself and implement it.

Change log

- 2018/06/12: Added a note about predicting handwritten data
- 2017/03/28: First post
