DNN (Deep Learning) Library: Comparison of chainer and TensorFlow (1)

Introduction

In recent years, deep learning (deep neural networks: DNN) has attracted a lot of attention in the machine learning field. Along with that, various DNN libraries have been released, but what are the differences between them, and which one should you use? In this article, I would like to compare how the network structure of a DNN is described in each library. The article is intended for readers who know DNN to some extent but are not yet familiar with the libraries. If you have already read the tutorials for each library and written the code yourself, there is nothing new here.

Library

There are many libraries available, but this article deals with chainer and TensorFlow. I hope to add other libraries in the future. Below, I first summarize only the basic information about each library. For a comparison of speed and so on, [I compared the library of Deep Learning that can be written in Python] was easy to understand; thank you. That site may already be enough, but in this article I would like to line up the actual source code and compare how the same network structure is described in each library.

TensorFlow

This is a library that Google actually uses internally, released as open source. The Google name alone is attractive (?).

chainer

Chainer is developed by PFN (Preferred Networks), a Japanese startup that also seems to have funding from NTT. That raises expectations for Japanese documentation (?).

Library comparison

I simply summarized things with reference to the tutorials... Since I am a beginner at DNN, there may be some mistakes in terminology.

Network structure / data set

In this article, we use a 3-layer MLP as the network structure. I really wanted to do a CNN, but that will be next time. The number of hidden-layer units is 100: Input layer - Hidden layer - Output layer.

MNIST handwritten digits are used as the dataset. For a DNN it is already considered an easy problem, but this time I mainly want to compare how the networks are described. The input layer is 784-dimensional (28 x 28) and the output is 10-dimensional.

I used data.py from the chainer samples to get the data. In TensorFlow a sample called input_data.py is often used, but for study purposes I reused the data read by data.py. (In data.py, each label is represented by a single number, i.e. a vector such as {1, 4, 9, 2, ...}. In input_data.py, on the other hand, each label is represented by a one-hot vector such as {0, 0, ..., 1, 0}. At first I did not notice this difference and got an error saying the dimensions were wrong. Below, the conversion is represented by a method called dense_to_one_hot(x); a minimal sketch of such a conversion is shown after the data-loading code below.)

trainingData


import data
import numpy as np

mnist = data.load_mnist_data()
x_all = mnist['data'].astype(np.float32) / 255
y_all = mnist['target'].astype(np.int32)

# TensorFlow only: convert integer labels to one-hot vectors
y_all = dense_to_one_hot(y_all)

x_train, x_test = np.split(x_all, [60000])
y_train, y_test = np.split(y_all, [60000])

The resulting shapes are x_train.shape => (60000, 784), and y_train.shape => (60000,) (integer labels, used with chainer) or (60000, 10) (one-hot labels, used with TensorFlow).
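For reference, a minimal sketch of such a dense_to_one_hot conversion in NumPy might look like this (an assumption for illustration; the helper in input_data.py may be implemented differently):

import numpy as np

def dense_to_one_hot(labels, num_classes=10):
    # Convert integer labels such as [1, 4, 9, ...] into one-hot vectors.
    labels = np.asarray(labels, dtype=np.int32)
    one_hot = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    one_hot[np.arange(labels.shape[0]), labels] = 1.0
    return one_hot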

Environment

This time, Ubuntu 14.04 (CPU only) was used, so both libraries could be installed easily with pip. As for Windows... I will not touch that this time.

Network description

chainer

chainer(classDefine)


import chainer
import chainer.functions as F
import chainer.links as L

class MLP(chainer.Chain):
    def __init__(self):
        super(MLP, self).__init__(
                                  l1=L.Linear(784, 100),
                                  l2=L.Linear(100, 100),
                                  l3=L.Linear(100, 10),
                                  )

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y



class Classifier(chainer.Chain):
    def __init__(self, predictor):
        super(Classifier, self).__init__(predictor=predictor)

    def __call__(self, x, t):
        y = self.predictor(x)
        self.loss = F.softmax_cross_entropy(y, t)
        self.accuracy = F.accuracy(y, t)
        return self.loss

First, we create a class that defines the layer structure and a class that defines the output error. In the MLP class, you can see that each layer uses Linear (a fully connected layer represented by a weight $W$ and a bias $b$) and that ReLU (F.relu(...)) is used as the activation function. By changing this part it should be possible to add dropout(...) or insert a convolution layer, as sketched below.
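As an illustration only (not code from the original article), a dropout variant of the MLP could look like the following sketch. Note that the exact signature of F.dropout depends on the Chainer version; older releases also took a train flag, while newer ones control it through chainer.config.train.

class MLPWithDropout(chainer.Chain):
    def __init__(self):
        super(MLPWithDropout, self).__init__(
            l1=L.Linear(784, 100),
            l2=L.Linear(100, 100),
            l3=L.Linear(100, 10),
        )

    def __call__(self, x):
        # Drop half of the hidden activations during training (ratio = drop probability)
        h1 = F.dropout(F.relu(self.l1(x)), ratio=0.5)
        h2 = F.dropout(F.relu(self.l2(h1)), ratio=0.5)
        return self.l3(h2)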

In addition, softmax_cross_entropy(...), that is, the cross entropy of the softmax of the output, is used to calculate the error. By the way, a class similar to the Classifier class defined here is already provided as chainer.links.Classifier, so if that implementation is sufficient you can simply use it as is.

chainer(model)


from chainer import optimizers

model = Classifier(MLP())  # same as L.Classifier(MLP())
optimizer = optimizers.SGD()
optimizer.setup(model)

Next, we create an instance of the classes defined above. The optimization method is specified here; SGD() (stochastic gradient descent) is used.
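Swapping the optimization method is just a matter of changing this line. As an illustration (not from the original article), Adam, which chainer.optimizers also provides, could be used instead:

from chainer import optimizers

optimizer = optimizers.Adam()  # instead of optimizers.SGD()
optimizer.setup(model)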

TensorFlow

tensorFlow


import tensorflow as tf

# input
x = tf.placeholder(tf.float32, [None, 784])
# label
y_ = tf.placeholder(tf.float32, [None, 10])

# FC1
W1 = tf.Variable(tf.random_normal([784, 100], mean=0.0, stddev=0.05))
b1 = tf.Variable(tf.zeros([100]))
# layer output
h1 = tf.nn.relu(tf.matmul(x, W1) + b1)

# FC2
W2 = tf.Variable(tf.random_normal([100, 100], mean=0.0, stddev=0.05))
b2 = tf.Variable(tf.zeros([100]))
# layer output
h2 = tf.nn.relu(tf.matmul(h1, W2) + b2)

# FC3
W3 = tf.Variable(tf.random_normal([100, 10], mean=0.0, stddev=0.05))
b3 = tf.Variable(tf.zeros([10]))
# output
y = tf.nn.softmax(tf.matmul(h2, W3) + b3)

# training
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

In TensorFlow, each layer is written out directly as the formula $y = Wx + b$ using a weight $W$ and a bias $b$ ($x$: input, $y$: output). The input x is a matrix of shape (number of data, 784); specifying None in the code keeps the number of data variable. The weight W1 of the first layer is defined with shape (784, 100) and the bias b1 with shape (100,), and the formula for the first layer's output h1 is then written explicitly, with relu(...) as the activation function. You can also see that the output layer applies softmax(...). y_ is a placeholder that holds the correct labels and is used to define the error cross_entropy. Cross entropy is used as the error, and minimize is called on GradientDescentOptimizer.

Compared to chainer, describing each layer looks more tedious, but when I thought about modifying the network structure, I felt it might actually be easier because the description follows the formulas directly. (The code is cluttered mainly because I did not define a proper class.)
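As an illustration of that point (my own sketch, not code from the original article), a small helper function can remove the repetition across the three fully connected layers:

import tensorflow as tf

def fc_layer(inputs, in_dim, out_dim, activation=None):
    # One fully connected layer: activation(inputs * W + b)
    W = tf.Variable(tf.random_normal([in_dim, out_dim], mean=0.0, stddev=0.05))
    b = tf.Variable(tf.zeros([out_dim]))
    y = tf.matmul(inputs, W) + b
    return activation(y) if activation is not None else y

x = tf.placeholder(tf.float32, [None, 784])
h1 = fc_layer(x, 784, 100, tf.nn.relu)
h2 = fc_layer(h1, 100, 100, tf.nn.relu)
y = fc_layer(h2, 100, 10, tf.nn.softmax)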

Learning

Let batch_x and batch_y be mini-batches of x_train and y_train, respectively (one possible way to form them is sketched below). The following shows the processing for a single mini-batch.
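One common way to form such mini-batches (shown here only as an assumption for illustration; the batch size of 100 is arbitrary) is to shuffle the training indices once per epoch and slice:

import numpy as np

batchsize = 100
perm = np.random.permutation(x_train.shape[0])  # new random order each epoch

for i in range(0, x_train.shape[0], batchsize):
    idx = perm[i:i + batchsize]
    batch_x = x_train[idx]
    batch_y = y_train[idx]
    # ... one training step on (batch_x, batch_y) goes here ...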

chainer

chainerTraining


optimizer.update(model, batch_x, batch_y)

You simply pass the model and the mini-batch as arguments to optimizer.update. Internally, this corresponds roughly to the explicit steps sketched below.
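As far as I understand it (this expansion is my own illustration, not code from the original article), optimizer.update(model, batch_x, batch_y) corresponds roughly to the following explicit sequence:

# Clear accumulated gradients, run the forward pass to get the loss,
# backpropagate, and apply the SGD update.
model.zerograds()
loss = model(batch_x, batch_y)   # Classifier.__call__ returns the softmax cross entropy
loss.backward()
optimizer.update()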

TensorFlow

tensorFlowTraining


#sess = tf.Session()
sess.run(train_step, feed_dict={x:batch_x, y_:batch_y})

In TensorFlow, data is passed as a dictionary (feed_dict) that maps the placeholders defined above to the mini-batch arrays. Note that a session must be created and the variables initialized before training, as sketched below.
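For completeness, a minimal setup might look like the following sketch. Variables have to be initialized before training; tf.initialize_all_variables was the initializer in TensorFlow releases of that era (it was later renamed tf.global_variables_initializer).

sess = tf.Session()
sess.run(tf.initialize_all_variables())  # initialize W1, b1, ... before training

# one call per mini-batch, repeated over the epochs
sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})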

Comparison of speed and accuracy

This time everything was run on a modest desktop PC with only a CPU, so I merely confirmed that the code works for a few dozen epochs. Since both are widely used libraries, I expect them to be comparable in accuracy and speed; I will verify this if I get the chance in the future.

In conclusion

As future issues, I would like to tackle the following.

The differences between the libraries seem to lie mainly around the first two items. I would like to try multi-GPU support (the machine for that is...). Also, I have the impression that caffe and the like are often used only for classification with an already trained model (or is that not the case?), so I am also wondering whether these libraries can be used for that purpose.

Reference

[I compared the library of Deep Learning that can be written in Python]
