Originally I had been studying the deep learning framework "Theano", but I became interested in "TensorFlow", released last month (November 2015), and have recently been using it as my main framework. While working with TensorFlow I kept feeling "this part is close to Theano, this part is quite different", so to confirm the differences between the two I ported a simple piece of code from "TensorFlow" to "Theano". (Usually, I suppose, people go in the opposite direction.)
First, we compare the outlines of the two frameworks.
item | Theano | TensorFlow |
---|---|---|
Developer | Academic (University of Montreal) | Company (Google) |
First released | Around 2010 (?) | 2015 |
Tensor operations | Supported | Supported |
numpy "basic"-level functions | Supported | Supported |
Automatic differentiation (graph transformation) | Supported | Supported |
GPU computation | Supported | Supported |
Graph visualization | Supported (it exists, at least) | Supported (the well-known TensorBoard) |
Optimizers | Not supported (meaning not built in) | Various optimizers supported |
Neural network functions | Supported | Supported (a wide variety) |
In terms of functionality, the differences show up in the last rows of the table above. In Theano you have to implement the details yourself, while TensorFlow gives the impression that many library functions are provided from the start. (For those who do not want to program Theano at that level of detail, there are high-level libraries built on top of Theano, such as Pylearn2.)
That said, much of the code can be written in almost the same way in "Theano" and "TensorFlow". The following is an excerpt from the MLP (Multi-Layer Perceptron) code for multi-class classification introduced in the previous article.
** TensorFlow version **
import tensorflow as tf

# Hidden Layer
class HiddenLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input
        w_h = tf.Variable(tf.random_normal([n_in, n_out], mean=0.0, stddev=0.05))
        b_h = tf.Variable(tf.zeros([n_out]))
        self.w = w_h
        self.b = b_h
        self.params = [self.w, self.b]

    def output(self):
        linarg = tf.matmul(self.input, self.w) + self.b
        self.output = tf.nn.relu(linarg)   # switched from sigmoid() to relu()
        return self.output

# Read-out Layer
class ReadOutLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input
        w_o = tf.Variable(tf.random_normal([n_in, n_out], mean=0.0, stddev=0.05))
        b_o = tf.Variable(tf.zeros([n_out]))
        self.w = w_o
        self.b = b_o
        self.params = [self.w, self.b]

    def output(self):
        linarg = tf.matmul(self.input, self.w) + self.b
        self.output = tf.nn.softmax(linarg)
        return self.output
** Theano version **
import numpy as np
import theano
import theano.tensor as T

# Cast helper to Theano's float type (a typical definition; the original program defines it elsewhere)
def floatX(x):
    return np.asarray(x, dtype=theano.config.floatX)

# Hidden Layer
class HiddenLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input
        w_h = theano.shared(floatX(np.random.standard_normal([n_in, n_out])) * 0.05)
        b_h = theano.shared(floatX(np.zeros(n_out)))
        self.w = w_h
        self.b = b_h
        self.params = [self.w, self.b]

    def output(self):
        linarg = T.dot(self.input, self.w) + self.b
        # self.output = T.nnet.relu(linarg)
        self.output = T.nnet.sigmoid(linarg)
        return self.output

# Read-out Layer
class ReadOutLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input
        w_o = theano.shared(floatX(np.random.standard_normal([n_in, n_out])) * 0.05)
        b_o = theano.shared(floatX(np.zeros(n_out)))
        self.w = w_o
        self.b = b_o
        self.params = [self.w, self.b]

    def output(self):
        linarg = T.dot(self.input, self.w) + self.b
        self.output = T.nnet.softmax(linarg)
        return self.output
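Before comparing the two versions, here is a rough sketch of how these excerpted layer classes would be wired together (TensorFlow version). The placeholder names x and y_, the layer sizes (784/625/10, MNIST-like), and the loss/accuracy definitions are my own assumptions for illustration, not taken from the original program:

```python
import tensorflow as tf

# Hypothetical wiring of the excerpted classes into an MLP.
# Input/output sizes and the hidden size are illustrative only.
x  = tf.placeholder(tf.float32, [None, 784])   # input features
y_ = tf.placeholder(tf.float32, [None, 10])    # one-hot target labels

h_layer = HiddenLayer(input=x, n_in=784, n_out=625)
o_layer = ReadOutLayer(input=h_layer.output(), n_in=625, n_out=10)
y_pred  = o_layer.output()

# Cross-entropy loss and classification accuracy, as typically defined
loss     = -tf.reduce_sum(y_ * tf.log(y_pred))
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(y_pred, 1),
                                           tf.argmax(y_, 1)), tf.float32))
```

Names such as loss, accuracy, x, and y_ defined this way would correspond to what the training code further below refers to.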
Comparing the two layer implementations, at a glance the only difference seems to be that "tf." was changed to "T.". As for activation functions, I first thought the newer TensorFlow would be more convenient because of functions such as tf.nn.relu(), but it seems that Theano also supports relu() (Rectified Linear Unit) from ver. 0.7.1. (The programs in this article use Python ver. 2.7.10, TensorFlow ver. 0.5.0, and Theano ver. 0.7.0.) The softmax function is, of course, supported by both.
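As a small aside, here is a minimal sketch (not part of the original program) that checks those two activations on the Theano side; the relu() call assumes ver. 0.7.1 or later, as noted above:

```python
import numpy as np
import theano
import theano.tensor as T

# Quick check of the two activations discussed above, on the Theano side.
x = T.matrix('x')
relu_fn    = theano.function([x], T.nnet.relu(x))     # requires Theano >= 0.7.1
softmax_fn = theano.function([x], T.nnet.softmax(x))

a = np.array([[-1.0, 0.5, 2.0]], dtype=theano.config.floatX)
print(relu_fn(a))      # [[ 0.   0.5  2. ]]
print(softmax_fn(a))   # each row sums to 1
```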
Here, there is a difference between the two (TensorFlow vs. Theano). In TensorFlow, the Optimizer class library is extensive, while in Theano, you need to prepare the Optimizer class library yourself.
** TensorFlow version (Adagrad Optimizer usage example) **
# Train
optimizer = tf.train.AdagradOptimizer(0.01)
train_op = optimizer.minimize(loss)

init = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init)
    print('Training...')
    for i in range(10001):
        train_op.run({x: train_x, y_: train_y})
        if i % 1000 == 0:          # echo status on screen
            train_accuracy = accuracy.eval({x: train_x, y_: train_y})
            print(' step, accuracy = %6d: %8.3f' % (i, train_accuracy))
In TensorFlow, you specify an optimizer and run a Session; the training data is supplied inside the Session in the form of a feed_dict.
** Theano version (example of implementing Adagrad) ** I intended to keep the function calls as close as possible to the TensorFlow code, and it ended up as follows.
from collections import OrderedDict

# Optimizers - (GradientDescent), AdaGrad
class Optimizer(object):
    def __init__(self, params, learning_rate=0.01):
        self.lr = learning_rate
        self.params = params

    def minimize(self, loss):
        self.gradparams = [T.grad(loss, param) for param in self.params]

class AdagradOptimizer(Optimizer):
    def __init__(self, params, learning_rate=0.01, eps=1.e-6):
        super(AdagradOptimizer, self).__init__(params, learning_rate)
        self.eps = eps
        self.accugrads = [theano.shared(floatX(np.zeros(t.shape.eval())),
                                        'accugrad') for t in self.params]

    def minimize(self, loss):
        super(AdagradOptimizer, self).minimize(loss)
        self.updates = OrderedDict()   # Theano expects an ordered update mapping
        for accugrad, param, gparam in zip(
                self.accugrads, self.params, self.gradparams):
            agrad = accugrad + gparam * gparam
            dx = - (self.lr / T.sqrt(agrad + self.eps)) * gparam
            self.updates[param] = param + dx
            self.updates[accugrad] = agrad
        return self.updates
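For reference, in equation form the per-parameter update that this AdagradOptimizer computes is the standard AdaGrad rule:

$$
r \leftarrow r + g \odot g, \qquad
\theta \leftarrow \theta - \frac{\eta}{\sqrt{r + \epsilon}} \odot g
$$

where $g$ is the gradient of the loss with respect to the parameter $\theta$ (gradparams in the code), $r$ is the accumulated squared gradient (accugrads), $\eta$ is the learning rate, and $\epsilon$ is a small constant for numerical stability.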
The Optimizer class defined above is used as follows to run the optimization (learning) process.
# Train
myoptimizer = AdagradOptimizer(params, learning_rate=0.01, eps=1.e-8)
one_update = myoptimizer.minimize(loss)

# Compile ... define theano.function
# (strain_x, strain_y: the training data, defined elsewhere in the original program)
train_model = theano.function(
    inputs=[],
    outputs=[loss, accuracy],
    updates=one_update,
    givens=[(x, strain_x), (y_, strain_y)],
    allow_input_downcast=True
)

n_epochs = 10001
epoch = 0

while (epoch < n_epochs):
    epoch += 1
    loss, accu = train_model()
    if epoch % 1000 == 0:
        print('epoch[%5d] : cost =%8.4f, accuracy =%8.4f' % (epoch, loss, accu))
In this way, the difference between the two shows up in the optimization (training) step. ** In TensorFlow ... ** after initializing the variables you run a Session, and the training data is supplied during the Session in the form of op.run({feed_dict}). ** In Theano ... ** the whole flow of the training step, including the supply of the training data, is defined in theano.function(), and the iterative training computation is then performed by calling that compiled function.
When I first started learning "Theano", I remember struggling with this theano.function(), but comparing it with "TensorFlow" as above deepened my understanding of it. (Put the other way around, once you understand theano.function() well, you can probably use Theano well.)
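As a minimal, self-contained sketch of what theano.function() is doing (separate from the MLP code above; the names here are just for illustration):

```python
import numpy as np
import theano
import theano.tensor as T

# A tiny theano.function() example: a shared variable updated on every call.
# 'updates' plays the same role as the dictionary returned by
# AdagradOptimizer.minimize() in the MLP code above.
x = T.scalar('x')
total = theano.shared(np.asarray(0.0, dtype=theano.config.floatX), 'total')

accumulate = theano.function(
    inputs=[x],
    outputs=total + x,            # value computed before the update is applied
    updates={total: total + x},   # how shared variables change after each call
    allow_input_downcast=True
)

print(accumulate(1.0))    # 1.0
print(accumulate(2.0))    # 3.0
print(total.get_value())  # 3.0
```

In the MLP code above, inputs=[] and givens=[...] are used instead, so the training data is bound inside the compiled function rather than passed at call time; that is exactly the point where it differs from TensorFlow's feed_dict.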
Since the two frameworks serve the same purpose, they have many similarities. In terms of built-in functionality TensorFlow is more complete, so porting Theano code to TensorFlow is probably the easier direction. However, once you have implemented an Optimizer or other missing functions in Theano, you can reuse them, so this is not really a disadvantage for Theano. (Code written by others is also a helpful reference, and there are many add-on libraries.)
I have only looked at MLP code so far and have not yet tried more complex network models, but both look like tools with a lot of potential. (I also installed Chainer, but have not been able to get to it yet...)
While writing this article I checked the site of the neural network library "Keras", and it seems that in addition to the version that runs on top of "Theano", a version that works with "TensorFlow" has also been released. (I would like to look into it later.)
Keras: Deep Learning library for Theano and TensorFlow