Originally I had been studying the deep learning framework "Theano", but I became interested in "TensorFlow", released last month (November 2015), and have recently been using it as my main framework. While working with TensorFlow I kept feeling "this part is close to Theano, this part is quite different", so to confirm the differences between the two I ported a simple piece of code from "TensorFlow" to "Theano". (Usually, I suppose, people go in the opposite direction.)
First, we compare the outlines of the two frameworks.
item | Theano | TensorFlow |
---|---|---|
Developer | Academic (University of Montreal) | Company (Google) |
First released | Around 2010 (?) | 2015 |
Tensor operations | Supported | Supported |
numpy "basic"-level functions | Supported | Supported |
Automatic differentiation (graph transformation) | Supported | Supported |
GPU computation | Supported | Supported |
Graph visualization | Supported (it exists, at least) | Supported (the well-known TensorBoard) |
Optimizers | Not supported (meaning not built in) | Various optimizers supported |
Neural network functions | Supported | Supported (a wide variety) |
In terms of functionality, the differences show up in the last rows of the table above. In Theano you have to implement the details yourself, while TensorFlow gives the impression that many library functions are provided from the start. (For those who do not want to program Theano at that level of detail, there are high-level libraries built on top of Theano, such as Pylearn2.)
That said, much of the code can be written in almost the same way in "Theano" and "TensorFlow". The following is an excerpt from the MLP (Multi-Layer Perceptron) code for multi-class classification introduced in the previous article.
** TensorFlow version **
import tensorflow as tf

# Hidden Layer
class HiddenLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input
        w_h = tf.Variable(tf.random_normal([n_in, n_out], mean=0.0, stddev=0.05))
        b_h = tf.Variable(tf.zeros([n_out]))
        self.w = w_h
        self.b = b_h
        self.params = [self.w, self.b]

    def output(self):
        linarg = tf.matmul(self.input, self.w) + self.b
        self.output = tf.nn.relu(linarg)   # switched from sigmoid() to relu()
        return self.output

# Read-out Layer
class ReadOutLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input
        w_o = tf.Variable(tf.random_normal([n_in, n_out], mean=0.0, stddev=0.05))
        b_o = tf.Variable(tf.zeros([n_out]))
        self.w = w_o
        self.b = b_o
        self.params = [self.w, self.b]

    def output(self):
        linarg = tf.matmul(self.input, self.w) + self.b
        self.output = tf.nn.softmax(linarg)
        return self.output
** Theano version **
import numpy as np
import theano
import theano.tensor as T

# Cast helper to Theano's float type (a typical definition; the original program defines it elsewhere)
def floatX(x):
    return np.asarray(x, dtype=theano.config.floatX)

# Hidden Layer
class HiddenLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input
        w_h = theano.shared(floatX(np.random.standard_normal([n_in, n_out])) * 0.05)
        b_h = theano.shared(floatX(np.zeros(n_out)))
        self.w = w_h
        self.b = b_h
        self.params = [self.w, self.b]

    def output(self):
        linarg = T.dot(self.input, self.w) + self.b
        # self.output = T.nnet.relu(linarg)
        self.output = T.nnet.sigmoid(linarg)
        return self.output

# Read-out Layer
class ReadOutLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input
        w_o = theano.shared(floatX(np.random.standard_normal([n_in, n_out])) * 0.05)
        b_o = theano.shared(floatX(np.zeros(n_out)))
        self.w = w_o
        self.b = b_o
        self.params = [self.w, self.b]

    def output(self):
        linarg = T.dot(self.input, self.w) + self.b
        self.output = T.nnet.softmax(linarg)
        return self.output
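Before comparing the two versions, here is a rough sketch of how these excerpted layer classes would be wired together (TensorFlow version). The placeholder names x and y_, the layer sizes (784/625/10, MNIST-like), and the loss/accuracy definitions are my own assumptions for illustration, not taken from the original program:

```python
import tensorflow as tf

# Hypothetical wiring of the excerpted classes into an MLP.
# Input/output sizes and the hidden size are illustrative only.
x  = tf.placeholder(tf.float32, [None, 784])   # input features
y_ = tf.placeholder(tf.float32, [None, 10])    # one-hot target labels

h_layer = HiddenLayer(input=x, n_in=784, n_out=625)
o_layer = ReadOutLayer(input=h_layer.output(), n_in=625, n_out=10)
y_pred  = o_layer.output()

# Cross-entropy loss and classification accuracy, as typically defined
loss     = -tf.reduce_sum(y_ * tf.log(y_pred))
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(y_pred, 1),
                                           tf.argmax(y_, 1)), tf.float32))
```

Names such as loss, accuracy, x, and y_ defined this way would correspond to what the training code further below refers to.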
Comparing the two layer implementations, at a glance the only difference seems to be that "tf." was changed to "T.". As for activation functions, I first thought the newer TensorFlow would be more convenient because of functions such as tf.nn.relu(), but it seems that Theano also supports relu() (Rectified Linear Unit) from ver. 0.7.1. (The programs in this article use Python ver. 2.7.10, TensorFlow ver. 0.5.0, and Theano ver. 0.7.0.) The softmax function is, of course, supported by both.
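As a small aside, here is a minimal sketch (not part of the original program) that checks those two activations on the Theano side; the relu() call assumes ver. 0.7.1 or later, as noted above:

```python
import numpy as np
import theano
import theano.tensor as T

# Quick check of the two activations discussed above, on the Theano side.
x = T.matrix('x')
relu_fn    = theano.function([x], T.nnet.relu(x))     # requires Theano >= 0.7.1
softmax_fn = theano.function([x], T.nnet.softmax(x))

a = np.array([[-1.0, 0.5, 2.0]], dtype=theano.config.floatX)
print(relu_fn(a))      # [[ 0.   0.5  2. ]]
print(softmax_fn(a))   # each row sums to 1
```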
Here, there is a difference between the two (TensorFlow vs. Theano). In TensorFlow, the Optimizer class library is extensive, while in Theano, you need to prepare the Optimizer class library yourself.
** TensorFlow version (Adagrad Optimizer usage example) **
# Train
optimizer = tf.train.AdagradOptimizer(0.01)
train_op = optimizer.minimize(loss)

init = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init)
    print('Training...')
    for i in range(10001):
        train_op.run({x: train_x, y_: train_y})
        if i % 1000 == 0:          # echo status on screen
            train_accuracy = accuracy.eval({x: train_x, y_: train_y})
            print(' step, accuracy = %6d: %8.3f' % (i, train_accuracy))
In TensorFlow, you specify an optimizer and run a Session; the training data is supplied inside the Session in the form of a feed_dict.
** Theano version (example of implementing Adagrad) ** I intended to keep the function calls as close as possible to the TensorFlow code, and it ended up as follows.
from collections import OrderedDict

# Optimizers - (GradientDescent), AdaGrad
class Optimizer(object):
    def __init__(self, params, learning_rate=0.01):
        self.lr = learning_rate
        self.params = params

    def minimize(self, loss):
        self.gradparams = [T.grad(loss, param) for param in self.params]

class AdagradOptimizer(Optimizer):
    def __init__(self, params, learning_rate=0.01, eps=1.e-6):
        super(AdagradOptimizer, self).__init__(params, learning_rate)
        self.eps = eps
        self.accugrads = [theano.shared(floatX(np.zeros(t.shape.eval())),
                                        'accugrad') for t in self.params]

    def minimize(self, loss):
        super(AdagradOptimizer, self).minimize(loss)
        self.updates = OrderedDict()   # Theano expects an ordered update mapping
        for accugrad, param, gparam in zip(
                self.accugrads, self.params, self.gradparams):
            agrad = accugrad + gparam * gparam
            dx = - (self.lr / T.sqrt(agrad + self.eps)) * gparam
            self.updates[param] = param + dx
            self.updates[accugrad] = agrad
        return self.updates
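For reference, in equation form the per-parameter update that this AdagradOptimizer computes is the standard AdaGrad rule:

$$
r \leftarrow r + g \odot g, \qquad
\theta \leftarrow \theta - \frac{\eta}{\sqrt{r + \epsilon}} \odot g
$$

where $g$ is the gradient of the loss with respect to the parameter $\theta$ (gradparams in the code), $r$ is the accumulated squared gradient (accugrads), $\eta$ is the learning rate, and $\epsilon$ is a small constant for numerical stability.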
The Optimizer class defined above is used as follows to run the optimization (learning) process.
# Train
myoptimizer = AdagradOptimizer(params, learning_rate=0.01, eps=1.e-8)
one_update = myoptimizer.minimize(loss)

# Compile ... define theano.function
# (strain_x, strain_y: the training data, defined elsewhere in the original program)
train_model = theano.function(
    inputs=[],
    outputs=[loss, accuracy],
    updates=one_update,
    givens=[(x, strain_x), (y_, strain_y)],
    allow_input_downcast=True
)

n_epochs = 10001
epoch = 0

while (epoch < n_epochs):
    epoch += 1
    loss, accu = train_model()
    if epoch % 1000 == 0:
        print('epoch[%5d] : cost =%8.4f, accuracy =%8.4f' % (epoch, loss, accu))
In this way, the difference between the two shows up in the optimization (training) step. ** In TensorFlow ... ** after initializing the variables you run a Session, and the training data is supplied during the Session in the form of op.run({feed_dict}). ** In Theano ... ** the whole flow of the training step, including the supply of the training data, is defined in theano.function(), and the iterative training computation is then performed by calling that compiled function.
When I first started learning "Theano", I remember struggling with this theano.function(), but comparing it with "TensorFlow" as above deepened my understanding of it. (Put the other way around, once you understand theano.function() well, you can probably use Theano well.)
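As a minimal, self-contained sketch of what theano.function() is doing (separate from the MLP code above; the names here are just for illustration):

```python
import numpy as np
import theano
import theano.tensor as T

# A tiny theano.function() example: a shared variable updated on every call.
# 'updates' plays the same role as the dictionary returned by
# AdagradOptimizer.minimize() in the MLP code above.
x = T.scalar('x')
total = theano.shared(np.asarray(0.0, dtype=theano.config.floatX), 'total')

accumulate = theano.function(
    inputs=[x],
    outputs=total + x,            # value computed before the update is applied
    updates={total: total + x},   # how shared variables change after each call
    allow_input_downcast=True
)

print(accumulate(1.0))    # 1.0
print(accumulate(2.0))    # 3.0
print(total.get_value())  # 3.0
```

In the MLP code above, inputs=[] and givens=[...] are used instead, so the training data is bound inside the compiled function rather than passed at call time; that is exactly the point where it differs from TensorFlow's feed_dict.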
Since the two frameworks serve the same purpose, they have many similarities. In terms of built-in functionality TensorFlow is more complete, so porting Theano code to TensorFlow is probably the easier direction. However, once you have implemented an Optimizer or other missing functions in Theano, you can reuse them, so this is not really a disadvantage for Theano. (Code written by others is also a helpful reference, and there are many add-on libraries.)
I have only looked at MLP code so far and have not yet tried more complex network models, but both look like tools with a lot of potential. (I also installed Chainer, but have not been able to get to it yet...)
While writing this article I checked the site of the neural network library "Keras", and it seems that in addition to the version that runs on top of "Theano", a version that works with "TensorFlow" has also been released. (I would like to look into it later.)
Keras: Deep Learning library for Theano and TensorFlow