Articles about TensorFlow offered few examples of anything other than neural networks and deep learning, so I implemented non-negative matrix factorization (NMF) as an example.
OS: Ubuntu 14.04
CPU: Core i7-3770 3.40GHz×8
Memory: 16GB
TensorFlow: ver. 0.6.0 (CPU mode)
Python: 2.7
NMF is an algorithm that approximates a non-negative matrix V by the product of two non-negative matrices W and H. As a clustering method it is used in a wide range of fields, such as feature extraction, dimensionality reduction, image processing, and natural language processing.
Input: $V \in \mathcal{R}^{m \times n}_{+},\ r \in \mathcal{N}^{+}$
Output: $\underset{W,H}{\arg\min} ||V-WH||_{F}^{2}\ \ \mathrm{s.t.}\ W \in \mathcal{R}_{+}^{m \times r},\ H \in \mathcal{R}_{+}^{r \times n}$
This time, we will use the multiplicative update (MU) rule, which is the easiest to implement, as the optimization method for NMF. The MU algorithm alternates between the following two update rules until convergence (a plain NumPy sketch is given after the reference link below).
$H_{a\mu} \leftarrow H_{a\mu}\frac{(W^TV)_{a\mu}}{(W^TWH)_{a\mu}},\ \ W_{ia} \leftarrow W_{ia}\frac{(VH^T)_{ia}}{(WHH^T)_{ia}}$
Reference link: Algorithms for Non-negative Matrix Factorization
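To make the updates concrete, here is a minimal plain-NumPy sketch of the same rules (my own illustration, not part of the original article; the function name nmf_mu and the small eps added to the denominators to avoid division by zero are my assumptions):

# -*- coding: utf-8 -*-
import numpy as np

def nmf_mu(V, rank, n_iter=200, eps=1e-9):
    """Multiplicative-update NMF in plain NumPy."""
    m, n = V.shape
    W = np.random.rand(m, rank)
    H = np.random.rand(rank, n)
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        H *= np.dot(W.T, V) / (np.dot(np.dot(W.T, W), H) + eps)
        # W <- W * (V H^T) / (W H H^T)
        W *= np.dot(V, H.T) / (np.dot(W, np.dot(H, H.T)) + eps)
    return W, H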
The class body is implemented as follows.
tfnmf.py
# -*- coding: utf-8 -*-
from __future__ import division
import numpy as np
import tensorflow as tf


class TFNMF(object):
    """Non-negative Matrix Factorization by TensorFlow"""

    def __init__(self, V, rank):
        # Convert the NumPy matrix to a TensorFlow Tensor
        V_ = tf.constant(V, dtype=tf.float32)
        shape = V.shape

        # Initialize with uniform random numbers whose mean is
        # sqrt(V.mean() / rank)
        scale = 2 * np.sqrt(V.mean() / rank)
        initializer = tf.random_uniform_initializer(maxval=scale)

        # Create the Variables for the matrices H and W
        self.H = H = tf.get_variable("H", [rank, shape[1]],
                                     initializer=initializer)
        self.W = W = tf.get_variable(name="W", shape=[shape[0], rank],
                                     initializer=initializer)

        # Save W for the convergence test
        W_old = tf.get_variable(name="W_old", shape=[shape[0], rank])
        self._save_W = W_old.assign(W)

        # MU algorithm
        # Update H
        Wt = tf.transpose(W)
        WV = tf.matmul(Wt, V_)
        WWH = tf.matmul(tf.matmul(Wt, W), H)
        WV_WWH = WV / WWH
        with tf.device('/cpu:0'):
            # Replace NaN elements caused by division by zero with 0
            WV_WWH = tf.select(tf.is_nan(WV_WWH),
                               tf.zeros_like(WV_WWH),
                               WV_WWH)
        H_new = H * WV_WWH
        self._update_H = H.assign(H_new)

        # Update W (H has already been updated)
        Ht = tf.transpose(H)
        VH = tf.matmul(V_, Ht)
        WHH = tf.matmul(W, tf.matmul(H, Ht))
        VH_WHH = VH / WHH
        with tf.device('/cpu:0'):
            # Replace NaN elements caused by division by zero with 0
            VH_WHH = tf.select(tf.is_nan(VH_WHH),
                               tf.zeros_like(VH_WHH),
                               VH_WHH)
        W_new = W * VH_WHH
        self._update_W = W.assign(W_new)

        # Total absolute change of the elements of W
        self._delta = tf.reduce_sum(tf.abs(W_old - W))

    def run(self, sess, max_iter=200):
        tf.initialize_all_variables().run()
        for i in range(max_iter):
            sess.run(self._save_W)
            H, _ = sess.run([self.H, self._update_H])
            W, _ = sess.run([self.W, self._update_W])
            delta = sess.run(self._delta)
            if delta < 0.001:
                break
        return W, H
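As an aside, the NaN masking with tf.select and tf.is_nan used above can be checked in isolation with a tiny sketch (my own example, written against the same 0.x API as this article; in later TensorFlow releases tf.select became tf.where):

import tensorflow as tf

a = tf.constant([1.0, 0.0])
b = tf.constant([2.0, 0.0])
ratio = a / b  # the second element is 0/0, i.e. NaN
safe = tf.select(tf.is_nan(ratio), tf.zeros_like(ratio), ratio)

with tf.Session() as sess:
    print(sess.run(safe))  # [0.5, 0.0] instead of [0.5, nan]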
Next, I measure the execution time while changing the number of CPU cores used. The input is a 10000 × 10000 random matrix, and the rank is 10.
The execution script is as follows.
main.py
# -*- coding: utf-8 -*-
from __future__ import print_function
import time
import numpy as np
import tensorflow as tf

from tfnmf import TFNMF


def main():
    V = np.random.rand(10000, 10000)
    rank = 10
    num_core = 8

    tfnmf = TFNMF(V, rank)
    config = tf.ConfigProto(inter_op_parallelism_threads=num_core,
                            intra_op_parallelism_threads=num_core)
    with tf.Session(config=config) as sess:
        start = time.time()
        W, H = tfnmf.run(sess)
        print("Computational Time: ", time.time() - start)

    # Compute the squared error
    W = np.mat(W)
    H = np.mat(H)
    error = np.power(V - W * H, 2).sum()
    print("Reconstruction Error: ", error)

if __name__ == '__main__':
    main()
This is the result when executed with 8 cores. Reconstruction Error represents the squared error, which is the objective function.
$ python main.py
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 8
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 8
Computational Time: 45.2025268078
Reconstruction Error: 8321195.31013
The following graph shows the execution time as the number of cores is varied from 1 to 8.
Eight cores are only about 1.4 times faster than one core. The overhead was larger than I expected, and adding more cores did not shorten the execution time dramatically. Calling sess.run() separately for every update of W and H is probably not a very efficient approach.
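As one possible (untested) variation of my own, not the article's code: _save_W writes W_old and _update_H writes H, and neither op reads the variable the other writes, so the two could share a single sess.run() call, saving one session round-trip per iteration:

    def run(self, sess, max_iter=200):
        tf.initialize_all_variables().run()
        for i in range(max_iter):
            # _save_W writes W_old, _update_H writes H; they can
            # safely be fetched in the same call
            sess.run([self._save_W, self._update_H])
            sess.run(self._update_W)
            delta = sess.run(self._delta)
            if delta < 0.001:
                break
        return sess.run([self.W, self.H])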
This time I used TensorFlow only for matrix operations, so I did not get much benefit from it, but being able to describe parallel computation this easily is wonderful. Next time, I will try to implement tensor decomposition.