Articles about TensorFlow offered few examples of anything other than neural networks and deep learning, so I implemented non-negative matrix factorization (NMF) as an example.
OS: Ubuntu 14.04
CPU: Core i7-3770 3.40GHz×8
Memory: 16GB
TensorFlow: ver. 0.6.0 (CPU mode)
Python: 2.7
NMF is an algorithm that approximates a non-negative matrix V by the product of two non-negative matrices W and H. As a clustering method it is used in a wide range of fields, such as feature extraction, dimensionality reduction, image processing, and natural language processing.
Input: $V \in \mathcal{R}^{m \times n}_{+},\ r \in \mathcal{N}^{+}$
Output: $\underset{W,H}{\arg\min} ||V-WH||_{F}^{2}\ \ \mathrm{s.t.}\ W \in \mathcal{R}_{+}^{m \times r},\ H \in \mathcal{R}_{+}^{r \times n}$
This time, we will use the multiplicative update (MU) rule, which is the easiest to implement, as the optimization method for NMF. The MU algorithm alternates between the following two update rules until convergence (a plain NumPy sketch is given after the reference link below).
$H_{a\mu} \leftarrow H_{a\mu}\frac{(W^TV)_{a\mu}}{(W^TWH)_{a\mu}},\ \ W_{ia} \leftarrow W_{ia}\frac{(VH^T)_{ia}}{(WHH^T)_{ia}}$
Reference link: Algorithms for Non-negative Matrix Factorization
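To make the updates concrete, here is a minimal plain-NumPy sketch of the same rules (my own illustration, not part of the original article; the function name nmf_mu and the small eps added to the denominators to avoid division by zero are my assumptions):

# -*- coding: utf-8 -*-
import numpy as np

def nmf_mu(V, rank, n_iter=200, eps=1e-9):
    """Multiplicative-update NMF in plain NumPy."""
    m, n = V.shape
    W = np.random.rand(m, rank)
    H = np.random.rand(rank, n)
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        H *= np.dot(W.T, V) / (np.dot(np.dot(W.T, W), H) + eps)
        # W <- W * (V H^T) / (W H H^T)
        W *= np.dot(V, H.T) / (np.dot(W, np.dot(H, H.T)) + eps)
    return W, H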
The class body is implemented as follows.
tfnmf.py
# -*- coding: utf-8 -*-
from __future__ import division
import numpy as np
import tensorflow as tf


class TFNMF(object):
    """Non-negative Matrix Factorization by TensorFlow"""

    def __init__(self, V, rank):
        # Convert the NumPy matrix to a TensorFlow Tensor
        V_ = tf.constant(V, dtype=tf.float32)
        shape = V.shape

        # Initialize with uniform random numbers whose mean is
        # sqrt(V.mean() / rank)
        scale = 2 * np.sqrt(V.mean() / rank)
        initializer = tf.random_uniform_initializer(maxval=scale)

        # Create the Variables for the matrices H and W
        self.H = H = tf.get_variable("H", [rank, shape[1]],
                                     initializer=initializer)
        self.W = W = tf.get_variable(name="W", shape=[shape[0], rank],
                                     initializer=initializer)

        # Save W for the convergence test
        W_old = tf.get_variable(name="W_old", shape=[shape[0], rank])
        self._save_W = W_old.assign(W)

        # MU algorithm
        # Update H
        Wt = tf.transpose(W)
        WV = tf.matmul(Wt, V_)
        WWH = tf.matmul(tf.matmul(Wt, W), H)
        WV_WWH = WV / WWH
        with tf.device('/cpu:0'):
            # Replace NaN elements caused by division by zero with 0
            WV_WWH = tf.select(tf.is_nan(WV_WWH),
                               tf.zeros_like(WV_WWH),
                               WV_WWH)
        H_new = H * WV_WWH
        self._update_H = H.assign(H_new)

        # Update W (H has already been updated)
        Ht = tf.transpose(H)
        VH = tf.matmul(V_, Ht)
        WHH = tf.matmul(W, tf.matmul(H, Ht))
        VH_WHH = VH / WHH
        with tf.device('/cpu:0'):
            # Replace NaN elements caused by division by zero with 0
            VH_WHH = tf.select(tf.is_nan(VH_WHH),
                               tf.zeros_like(VH_WHH),
                               VH_WHH)
        W_new = W * VH_WHH
        self._update_W = W.assign(W_new)

        # Total absolute change of the elements of W
        self._delta = tf.reduce_sum(tf.abs(W_old - W))

    def run(self, sess, max_iter=200):
        tf.initialize_all_variables().run()
        for i in range(max_iter):
            sess.run(self._save_W)
            H, _ = sess.run([self.H, self._update_H])
            W, _ = sess.run([self.W, self._update_W])
            delta = sess.run(self._delta)
            if delta < 0.001:
                break
        return W, H
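As an aside, the NaN masking with tf.select and tf.is_nan used above can be checked in isolation with a tiny sketch (my own example, written against the same 0.x API as this article; in later TensorFlow releases tf.select became tf.where):

import tensorflow as tf

a = tf.constant([1.0, 0.0])
b = tf.constant([2.0, 0.0])
ratio = a / b  # the second element is 0/0, i.e. NaN
safe = tf.select(tf.is_nan(ratio), tf.zeros_like(ratio), ratio)

with tf.Session() as sess:
    print(sess.run(safe))  # [0.5, 0.0] instead of [0.5, nan]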
Next, I measure the execution time while changing the number of CPU cores used. The input is a 10000 × 10000 random matrix, and the rank is 10.
The execution script is as follows.
main.py
# -*- coding: utf-8 -*-
from __future__ import print_function
import time
import numpy as np
import tensorflow as tf

from tfnmf import TFNMF


def main():
    V = np.random.rand(10000, 10000)
    rank = 10
    num_core = 8

    tfnmf = TFNMF(V, rank)
    config = tf.ConfigProto(inter_op_parallelism_threads=num_core,
                            intra_op_parallelism_threads=num_core)
    with tf.Session(config=config) as sess:
        start = time.time()
        W, H = tfnmf.run(sess)
        print("Computational Time: ", time.time() - start)

    # Compute the squared error
    W = np.mat(W)
    H = np.mat(H)
    error = np.power(V - W * H, 2).sum()
    print("Reconstruction Error: ", error)

if __name__ == '__main__':
    main()
This is the result when executed with 8 cores. Reconstruction Error represents the squared error, which is the objective function.
$ python main.py
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 8
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 8
Computational Time: 45.2025268078
Reconstruction Error: 8321195.31013
The following graph shows the execution time as the number of cores is varied from 1 to 8.
Eight cores are only about 1.4 times faster than one core. The overhead was larger than I expected, and adding more cores did not shorten the execution time dramatically. Calling sess.run() separately for every update of W and H is probably not a very efficient approach.
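As one possible (untested) variation of my own, not the article's code: _save_W writes W_old and _update_H writes H, and neither op reads the variable the other writes, so the two could share a single sess.run() call, saving one session round-trip per iteration:

    def run(self, sess, max_iter=200):
        tf.initialize_all_variables().run()
        for i in range(max_iter):
            # _save_W writes W_old, _update_H writes H; they can
            # safely be fetched in the same call
            sess.run([self._save_W, self._update_H])
            sess.run(self._update_W)
            delta = sess.run(self._delta)
            if delta < 0.001:
                break
        return sess.run([self.W, self.H])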
This time I used TensorFlow only for matrix operations, so I did not get much benefit from it, but being able to describe parallel computation this easily is wonderful. Next time, I will try to implement tensor decomposition.