In recent years, deep learning (deep neural networks: DNN) has been drawing attention in the field of machine learning. Along with that, various DNN libraries have been released, but what are the differences between them (and which one should you use)? In this article, I would like to compare how the network structure of a DNN is described in each library. The article is intended for those who know DNNs to some extent but are not familiar with the libraries. If you have already read each library's tutorial and actually written code, there is nothing new here.
There are many libraries available, but this article deals with chainer and TensorFlow. I hope to add other libraries in the future. Below, I first summarize only the basic information about each library. For comparisons of speed and so on, [I compared the Deep Learning libraries that can be written in Python][] was easy to follow, thank you. That site may well be enough, but in this article I would like to line up the actual source code and compare how the same network structure is described.
TensorFlow is an open-source library that Google actually uses internally. You are attracted just by the name Google (?).
chainer is developed by PFN (Preferred Networks), a Japanese startup that seems to be funded by NTT. Expectations for Japanese documentation (?).
I have just summarized things with reference to each tutorial... Since I am a DNN beginner, there may be some mistakes in terminology.
This article uses a 3-layer MLP as the network structure. I really wanted to try a CNN, but that will have to wait until next time. The number of hidden layer units is 100.
Input layer - Hidden layer - Hidden layer - Output layer
MNIST handwritten digits are used as the data set. It is by now a simple problem for a DNN, but this time I mainly want to compare description styles. The input layer is 784-dimensional (28 x 28) and the output is 10-dimensional.
I used the `data.py` included in the chainer samples to get the data. In TensorFlow, a sample called `input_data.py` is often used, but for study purposes I fed it the data read with `data.py`. (In `data.py`, each label is represented by a single number, so the labels form a vector such as `{1,4,9,2,...}`. In `input_data.py`, on the other hand, each label is expressed as a one-hot vector such as `{0,0,...,1,0}`. At first I did not notice this difference and got an error saying the dimensions were wrong. Below, the conversion is represented by a method called `dense_to_one_hot(x)`.)
trainingData
import data
import numpy as np
mnist = data.load_mnist_data()
x_all = mnist['data'].astype(np.float32) / 255
y_all = mnist['target'].astype(np.int32)
# TensorFlow only: convert the integer labels to one-hot vectors
y_all = dense_to_one_hot(y_all)
x_train, x_test = np.split(x_all, [60000])
y_train, y_test = np.split(y_all, [60000])
x_train.shape => (60000, 784)
y_train.shape => (60000,) or (60000, 10)
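Since `dense_to_one_hot(...)` is only referenced above and not defined in `data.py`, here is a minimal sketch of what such a conversion could look like with NumPy (my own assumption; the actual helper in `input_data.py` may differ):

def dense_to_one_hot(labels, num_classes=10):
    # turn integer labels such as [1, 4, 9, ...] into one-hot rows such as [0, 1, 0, ..., 0]
    one_hot = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    one_hot[np.arange(labels.shape[0]), labels] = 1.0
    return one_hot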
This time I used Ubuntu 14.04 (CPU only), so both libraries could be installed easily with pip. As for Windows... I won't touch on it this time.
chainer
chainer(classDefine)
import chainer
import chainer.functions as F
import chainer.links as L

class MLP(chainer.Chain):
    def __init__(self):
        super(MLP, self).__init__(
            l1=L.Linear(784, 100),
            l2=L.Linear(100, 100),
            l3=L.Linear(100, 10),
        )

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y

class Classifier(chainer.Chain):
    def __init__(self, predictor):
        super(Classifier, self).__init__(predictor=predictor)

    def __call__(self, x, t):
        y = self.predictor(x)
        self.loss = F.softmax_cross_entropy(y, t)
        self.accuracy = F.accuracy(y, t)
        return self.loss
First, a class that defines the layer structure and a class that defines the output error are created.
In the MLP class you can see that Linear (a fully connected layer represented by a weight $W$ and a bias $b$) is used for each layer and that ReLU(...) is used as the activation function. By changing this part, it should be possible to add dropout(...) or insert a convolution layer.
In addition, softmax_cross_entropy(...), that is, the cross entropy computed on the softmax of the output, is used as the error. Incidentally, a class similar to the Classifier defined here is already implemented as chainer.links.Classifier, so it can be used as-is.
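For example, here is a minimal sketch of the dropout variation mentioned above (the placement and the default dropout ratio are my own assumptions, not from the original article):

# MLP.__call__ with dropout inserted after each hidden layer (sketch)
def __call__(self, x):
    h1 = F.dropout(F.relu(self.l1(x)))
    h2 = F.dropout(F.relu(self.l2(h1)))
    return self.l3(h2)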
chainer(model)
from chainer import optimizers

model = Classifier(MLP())  # equivalent to L.Classifier(MLP())
optimizer = optimizers.SGD()
optimizer.setup(model)
Next, create instances of the defined classes. The optimization method is specified here; SGD() (stochastic gradient descent) is used.
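Other optimizers provided in chainer.optimizers can be swapped in at this point; for example (a sketch, not used elsewhere in this article):

optimizer = optimizers.Adam()  # Adam instead of plain SGD
optimizer.setup(model)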
tensorFlow
tensorFlow
import tensorflow as tf
# input
x = tf.placeholder(tf.float32, [None, 784])
# label
y_ = tf.placeholder(tf.float32, [None, 10])
# FC1
W1 = tf.Variable(tf.random_normal([784, 100], mean=0.0, stddev=0.05))
b1 = tf.Variable(tf.zeros([100]))
# layer output
h1 = tf.nn.relu(tf.matmul(x, W1) + b1)
# FC2
W2 = tf.Variable(tf.random_normal([100, 100], mean=0.0, stddev=0.05))
b2 = tf.Variable(tf.zeros([100]))
# layer output
h2 = tf.nn.relu(tf.matmul(h1, W2) + b2)
# FC3
W3 = tf.Variable(tf.random_normal([100, 10], mean=0.0, stddev=0.05))
b3 = tf.Variable(tf.zeros([10]))
# output
y = tf.nn.softmax(tf.matmul(h2, W3) + b3)
# training
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
In TensorFlow, each layer is represented by the matrix formula $y = Wx + b$ using a weight $W$ and a bias $b$ ($x$: input, $y$: output).
The input $x$ is a matrix of shape (number of data, 784); the number of data is kept variable by specifying None in the code. The weight W1 of the first layer is defined with shape (784, 100), the bias b1 with shape (100,), and the output h1 of the first layer is defined by the formula above. Here relu(...) is used as the activation function, and you can see that the output layer computes softmax(...). y_ is a placeholder that holds the correct labels and is used to define the error cross_entropy. Cross entropy is used as the error, and minimize is called on GradientDescentOptimizer.
Compared to chainer, the description of each layer seems more tedious, but when I thought about modifying the network structure, I felt it would actually be easier because the description matches the formula directly. (The code looks cluttered here because I did not define a proper class.)
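As an aside, one way to tame that repetition without a full class is to factor the per-layer pattern into a small helper function. This is a sketch of my own (the helper `fc_layer` is not part of the original code):

def fc_layer(inp, in_dim, out_dim, activation=None):
    # one fully connected layer: out = inp * W + b, optionally followed by an activation
    W = tf.Variable(tf.random_normal([in_dim, out_dim], mean=0.0, stddev=0.05))
    b = tf.Variable(tf.zeros([out_dim]))
    out = tf.matmul(inp, W) + b
    return activation(out) if activation is not None else out

h1 = fc_layer(x, 784, 100, tf.nn.relu)
h2 = fc_layer(h1, 100, 100, tf.nn.relu)
y = tf.nn.softmax(fc_layer(h2, 100, 10))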
Let `batch_x`, `batch_y` be a mini-batch taken from `x_train`, `y_train`, respectively. The following is the processing for one mini-batch.
chainer
chainerTraining
optimizer.update(model, batch_x, batch_y)
The mini-batch is simply passed, together with the model, as arguments to optimizer.update.
tensorFlow
tensorFlowTraining
# sess = tf.Session()
# sess.run(tf.initialize_all_variables())  # the variables need to be initialized once beforehand
sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})
In TensorFlow, the data is passed as a dictionary (feed_dict) that maps the placeholders defined above to the mini-batch.
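For completeness, here is a rough sketch of how the mini-batches could be drawn each epoch (the batch size is an illustrative assumption, and `y_train` is assumed to be in the label format appropriate for the library being used):

batchsize = 100  # illustrative value
perm = np.random.permutation(len(x_train))
for i in range(0, len(x_train), batchsize):
    batch_x = x_train[perm[i:i + batchsize]]
    batch_y = y_train[perm[i:i + batchsize]]
    # chainer:    optimizer.update(model, batch_x, batch_y)
    # tensorFlow: sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})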
This time I ran it on a modest CPU-only desktop PC, so I only confirmed that the code works for a few tens of epochs. Since both are widely used libraries, I expect they are comparable in terms of accuracy and speed; I will verify this if the opportunity arises.
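For reference, here is a rough sketch of how test accuracy could be checked in each library (not part of the original code; `model`, `sess`, `x`, `y`, `y_` are the objects defined above, and `y_test` is assumed to be in the matching label format):

# chainer: calling the Classifier also fills in model.accuracy
model(x_test, y_test)
print(model.accuracy.data)

# tensorFlow: define an accuracy op and evaluate it on the test set
correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
print(sess.run(accuracy, feed_dict={x: x_test, y_: y_test}))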
As future issues, I would like to tackle the following.
The libraries seem to differ most around the first two items. I would like to try multi-GPU support (though the machine for that is... ). Also, I have the impression that caffe and the like are often used just for classification with a trained model (aren't there many such cases?), and I am also wondering whether these libraries can be used for that purpose.