In recent years, deep learning (deep neural networks: DNN) has been drawing attention in the field of machine learning. Along with that, various DNN libraries have been released, but what are the differences between them (and which one should you use)? In this article, I would like to compare how the network structure of a DNN is described in each library. The article is intended for those who know DNNs to some extent but are not familiar with the libraries. If you have already read each library's tutorial and actually written code, there is nothing new here.
There are many libraries available, but this article deals with chainer and TensorFlow. I hope to add other libraries in the future. Below, I first summarize only the basic information about each library. For comparisons of speed and so on, [I compared the Deep Learning libraries that can be written in Python][] was easy to follow, thank you. That site may well be enough, but in this article I would like to line up the actual source code and compare how the same network structure is described.
TensorFlow is an open-source library that Google actually uses internally. You are attracted just by the name Google (?).
chainer is developed by PFN (Preferred Networks), a Japanese startup that seems to be funded by NTT. Expectations for Japanese documentation (?).
I have just summarized things with reference to each tutorial... Since I am a DNN beginner, there may be some mistakes in terminology.
This article uses a 3-layer MLP as the network structure. I really wanted to try a CNN, but that will have to wait until next time. The number of hidden layer units is 100.
Input layer - Hidden layer - Hidden layer - Output layer
MNIST handwritten digits are used as the data set. It is by now a simple problem for a DNN, but this time I mainly want to compare description styles. The input layer is 784-dimensional (28 x 28) and the output is 10-dimensional.
I used the `data.py` included in the chainer samples to get the data. In TensorFlow, a sample called `input_data.py` is often used, but for study purposes I fed it the data read with `data.py`. (In `data.py`, each label is represented by a single number, so the labels form a vector such as `{1,4,9,2,...}`. In `input_data.py`, on the other hand, each label is expressed as a one-hot vector such as `{0,0,...,1,0}`. At first I did not notice this difference and got an error saying the dimensions were wrong. Below, the conversion is represented by a method called `dense_to_one_hot(x)`.)
trainingData
import data
import numpy as np
mnist = data.load_mnist_data()
x_all = mnist['data'].astype(np.float32) / 255
y_all = mnist['target'].astype(np.int32)
# TensorFlow only: convert the integer labels to one-hot vectors
y_all = dense_to_one_hot(y_all)
x_train, x_test = np.split(x_all, [60000])
y_train, y_test = np.split(y_all, [60000])
x_train.shape => (60000, 784)
y_train.shape => (60000,) or (60000, 10)
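Since `dense_to_one_hot(...)` is only referenced above and not defined in `data.py`, here is a minimal sketch of what such a conversion could look like with NumPy (my own assumption; the actual helper in `input_data.py` may differ):

def dense_to_one_hot(labels, num_classes=10):
    # turn integer labels such as [1, 4, 9, ...] into one-hot rows such as [0, 1, 0, ..., 0]
    one_hot = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    one_hot[np.arange(labels.shape[0]), labels] = 1.0
    return one_hot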
This time I used Ubuntu 14.04 (CPU only), so both libraries could be installed easily with pip. As for Windows... I won't touch on it this time.
chainer
chainer(classDefine)
import chainer
import chainer.functions as F
import chainer.links as L

class MLP(chainer.Chain):
    def __init__(self):
        super(MLP, self).__init__(
            l1=L.Linear(784, 100),
            l2=L.Linear(100, 100),
            l3=L.Linear(100, 10),
        )

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y

class Classifier(chainer.Chain):
    def __init__(self, predictor):
        super(Classifier, self).__init__(predictor=predictor)

    def __call__(self, x, t):
        y = self.predictor(x)
        self.loss = F.softmax_cross_entropy(y, t)
        self.accuracy = F.accuracy(y, t)
        return self.loss
First, a class that defines the layer structure and a class that defines the output error are created.
In the MLP class you can see that Linear (a fully connected layer represented by a weight $W$ and a bias $b$) is used for each layer and that ReLU(...) is used as the activation function. By changing this part, it should be possible to add dropout(...) or insert a convolution layer.
In addition, softmax_cross_entropy(...), that is, the cross entropy computed on the softmax of the output, is used as the error. Incidentally, a class similar to the Classifier defined here is already implemented as chainer.links.Classifier, so it can be used as-is.
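For example, here is a minimal sketch of the dropout variation mentioned above (the placement and the default dropout ratio are my own assumptions, not from the original article):

# MLP.__call__ with dropout inserted after each hidden layer (sketch)
def __call__(self, x):
    h1 = F.dropout(F.relu(self.l1(x)))
    h2 = F.dropout(F.relu(self.l2(h1)))
    return self.l3(h2)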
chainer(model)
from chainer import optimizers

model = Classifier(MLP())  # equivalent to L.Classifier(MLP())
optimizer = optimizers.SGD()
optimizer.setup(model)
Next, create instances of the defined classes. The optimization method is specified here; SGD() (stochastic gradient descent) is used.
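Other optimizers provided in chainer.optimizers can be swapped in at this point; for example (a sketch, not used elsewhere in this article):

optimizer = optimizers.Adam()  # Adam instead of plain SGD
optimizer.setup(model)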
tensorFlow
tensorFlow
import tensorflow as tf
# input
x = tf.placeholder(tf.float32, [None, 784])
# label
y_ = tf.placeholder(tf.float32, [None, 10])
# FC1
W1 = tf.Variable(tf.random_normal([784, 100], mean=0.0, stddev=0.05))
b1 = tf.Variable(tf.zeros([100]))
# layer output
h1 = tf.nn.relu(tf.matmul(x, W1) + b1)
# FC2
W2 = tf.Variable(tf.random_normal([100, 100], mean=0.0, stddev=0.05))
b2 = tf.Variable(tf.zeros([100]))
# layer output
h2 = tf.nn.relu(tf.matmul(h1, W2) + b2)
# FC3
W3 = tf.Variable(tf.random_normal([100, 10], mean=0.0, stddev=0.05))
b3 = tf.Variable(tf.zeros([10]))
# output
y = tf.nn.softmax(tf.matmul(h2, W3) + b3)
# training
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
In TensorFlow, each layer is represented by the matrix formula $y = Wx + b$ using a weight $W$ and a bias $b$ ($x$: input, $y$: output).
The input $x$ is a matrix of shape (number of data, 784); the number of data is kept variable by specifying None in the code. The weight W1 of the first layer is defined with shape (784, 100), the bias b1 with shape (100,), and the output h1 of the first layer is defined by the formula above. Here relu(...) is used as the activation function, and you can see that the output layer computes softmax(...). y_ is a placeholder that holds the correct labels and is used to define the error cross_entropy. Cross entropy is used as the error, and minimize is called on GradientDescentOptimizer.
Compared to chainer, the description of each layer seems more tedious, but when I thought about modifying the network structure, I felt it would actually be easier because the description matches the formula directly. (The code looks cluttered here because I did not define a proper class.)
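As an aside, one way to tame that repetition without a full class is to factor the per-layer pattern into a small helper function. This is a sketch of my own (the helper `fc_layer` is not part of the original code):

def fc_layer(inp, in_dim, out_dim, activation=None):
    # one fully connected layer: out = inp * W + b, optionally followed by an activation
    W = tf.Variable(tf.random_normal([in_dim, out_dim], mean=0.0, stddev=0.05))
    b = tf.Variable(tf.zeros([out_dim]))
    out = tf.matmul(inp, W) + b
    return activation(out) if activation is not None else out

h1 = fc_layer(x, 784, 100, tf.nn.relu)
h2 = fc_layer(h1, 100, 100, tf.nn.relu)
y = tf.nn.softmax(fc_layer(h2, 100, 10))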
Let `batch_x`, `batch_y` be a mini-batch taken from `x_train`, `y_train`, respectively. The following is the processing for one mini-batch.
chainer
chainerTraining
optimizer.update(model, batch_x, batch_y)
The mini-batch is simply passed, together with the model, as arguments to optimizer.update.
tensorFlow
tensorFlowTraining
# sess = tf.Session()
# sess.run(tf.initialize_all_variables())  # the variables need to be initialized once beforehand
sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})
In TensorFlow, the data is passed as a dictionary (feed_dict) that maps the placeholders defined above to the mini-batch.
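For completeness, here is a rough sketch of how the mini-batches could be drawn each epoch (the batch size is an illustrative assumption, and `y_train` is assumed to be in the label format appropriate for the library being used):

batchsize = 100  # illustrative value
perm = np.random.permutation(len(x_train))
for i in range(0, len(x_train), batchsize):
    batch_x = x_train[perm[i:i + batchsize]]
    batch_y = y_train[perm[i:i + batchsize]]
    # chainer:    optimizer.update(model, batch_x, batch_y)
    # tensorFlow: sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})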
This time I ran it on a modest CPU-only desktop PC, so I only confirmed that the code works for a few tens of epochs. Since both are widely used libraries, I expect they are comparable in terms of accuracy and speed; I will verify this if the opportunity arises.
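For reference, here is a rough sketch of how test accuracy could be checked in each library (not part of the original code; `model`, `sess`, `x`, `y`, `y_` are the objects defined above, and `y_test` is assumed to be in the matching label format):

# chainer: calling the Classifier also fills in model.accuracy
model(x_test, y_test)
print(model.accuracy.data)

# tensorFlow: define an accuracy op and evaluate it on the test set
correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
print(sess.run(accuracy, feed_dict={x: x_test, y_: y_test}))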
As future issues, I would like to tackle the following.
The libraries seem to differ most around the first two items. I would like to try multi-GPU support (though the machine for that is... ). Also, I have the impression that caffe and the like are often used just for classification with a trained model (aren't there many such cases?), and I am also wondering whether these libraries can be used for that purpose.