Before getting into the detailed description of the code, here is what it can do.
![Screenshot from 2016-11-23 15_44_07.png](https://qiita-image-store.s3.amazonaws.com/0/134368/b426cc2e-e963-0897-f2d0-f83a5ec7a3d0.png)
On the right is a handwritten 9 captured by the camera; on the left is the estimate from the trained CNN, which sees the 9 and correctly returns 9. In this post I describe the code that lets a CNN like this run on ROS. For details on how to run it and on the preparation, please refer to the previous Qiita article.
To review the outline of the process: the node receives an Image message from the camera, preprocesses it with OpenCV (grayscale, binarize and invert, resize to 28x28), feeds it to the trained CNN, and publishes the estimated digit as an Int16 message.
The whole code looks like this:
tensorflow_in_ros_mnist.py
import rospy
from sensor_msgs.msg import Image
from std_msgs.msg import Int16
from cv_bridge import CvBridge
import cv2
import numpy as np
import tensorflow as tf


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1],
                        padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')


def makeCNN(x,keep_prob):
    # --- define CNN model
    W_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(conv2d(x, W_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)

    W_conv2 = weight_variable([3, 3, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)

    W_fc1 = weight_variable([7 * 7 * 64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    W_fc2 = weight_variable([1024, 10])
    b_fc2 = bias_variable([10])
    y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

    return y_conv


class RosTensorFlow():
    def __init__(self):
        self._cv_bridge = CvBridge()

        self._sub = rospy.Subscriber('image', Image, self.callback, queue_size=1)
        self._pub = rospy.Publisher('result', Int16, queue_size=1)

        self.x = tf.placeholder(tf.float32, [None,28,28,1], name="x")
        self.keep_prob = tf.placeholder("float")
        self.y_conv = makeCNN(self.x,self.keep_prob)

        self._saver = tf.train.Saver()
        self._session = tf.InteractiveSession()

        init_op = tf.initialize_all_variables()
        self._session.run(init_op)

        self._saver.restore(self._session, "model.ckpt")

    def callback(self, image_msg):
        cv_image = self._cv_bridge.imgmsg_to_cv2(image_msg, "bgr8")
        cv_image_gray = cv2.cvtColor(cv_image, cv2.COLOR_RGB2GRAY)
        ret,cv_image_binary = cv2.threshold(cv_image_gray,128,255,cv2.THRESH_BINARY_INV)
        cv_image_28 = cv2.resize(cv_image_binary,(28,28))
        np_image = np.reshape(cv_image_28,(1,28,28,1))
        predict_num = self._session.run(self.y_conv, feed_dict={self.x:np_image,self.keep_prob:1.0})
        answer = np.argmax(predict_num,1)
        rospy.loginfo('%d' % answer)
        self._pub.publish(answer)

    def main(self):
        rospy.spin()


if __name__ == '__main__':
    rospy.init_node('rostensorflow')
    tensor = RosTensorFlow()
    tensor.main()
**import part**
tensorflow_in_ros_mnist.py
import rospy
from sensor_msgs.msg import Image
from std_msgs.msg import Int16
from cv_bridge import CvBridge
import cv2
import numpy as np
import tensorflow as tf
Since I want to set up a ROS node in Python, I import rospy. I also import Image for reading images, Int16 for publishing the result, cv_bridge for converting ROS image messages to OpenCV images, and then OpenCV, NumPy, and TensorFlow for the machine learning part.
**CNN definition part**
tensorflow_in_ros_mnist.py
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1],
                        padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')


def makeCNN(x,keep_prob):
    # --- define CNN model
    W_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(conv2d(x, W_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)

    W_conv2 = weight_variable([3, 3, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)

    W_fc1 = weight_variable([7 * 7 * 64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    W_fc2 = weight_variable([1024, 10])
    b_fc2 = bias_variable([10])
    y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

    return y_conv
This function defines a CNN with the same configuration as the TensorFlow tutorial (Deep MNIST for Experts). The model performs convolution and pooling twice, then inserts a fully connected layer, and finally computes the probability for each digit.
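As a quick sanity check of this definition, the following minimal sketch (not from the original article, and assuming the same pre-1.0 TensorFlow API used above) runs a blank 28x28 image through makeCNN and confirms that the output has one probability per digit:

```python
# Minimal shape check for makeCNN (illustrative sketch, same old TF API as above).
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])
keep_prob = tf.placeholder("float")
y_conv = makeCNN(x, keep_prob)          # two conv/pool stages: 28x28 -> 14x14 -> 7x7

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    dummy = np.zeros((1, 28, 28, 1), dtype=np.float32)    # one blank image
    probs = sess.run(y_conv, feed_dict={x: dummy, keep_prob: 1.0})
    print(probs.shape)                  # (1, 10): probability for each digit 0-9
```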
**First half of the init part of the class**
tensorflow_in_ros_mnist.py
class RosTensorFlow():
    def __init__(self):
        self._cv_bridge = CvBridge()

        self._sub = rospy.Subscriber('image', Image, self.callback, queue_size=1)
        self._pub = rospy.Publisher('result', Int16, queue_size=1)
The first half is the ROS processing. Here a CvBridge instance is created, and the Subscriber and Publisher are defined. In this case the Subscriber receives Image messages and the Publisher delivers Int16 messages, basically as in the standard ROS examples. Because callback is passed as an argument when creating the Subscriber, the callback function is called every time an Image message is received.
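To try the node without a camera, a small test publisher like the hypothetical sketch below can send a single image to the 'image' topic (the file name nine.png is only an assumption for illustration):

```python
# Hypothetical test publisher: sends one image to the 'image' topic.
import cv2
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

rospy.init_node('test_image_publisher')
pub = rospy.Publisher('image', Image, queue_size=1)
bridge = CvBridge()

img = cv2.imread('nine.png')                    # assumed photo of a handwritten digit
rospy.sleep(1.0)                                # give the subscriber time to connect
pub.publish(bridge.cv2_to_imgmsg(img, "bgr8"))  # encoding matches what the node expects
```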
**Second half of the init part of the class**
tensorflow_in_ros_mnist.py
        self.x = tf.placeholder(tf.float32, [None,28,28,1], name="x")
        self.keep_prob = tf.placeholder("float")
        self.y_conv = makeCNN(self.x,self.keep_prob)

        self._saver = tf.train.Saver()
        self._session = tf.InteractiveSession()

        init_op = tf.initialize_all_variables()
        self._session.run(init_op)

        self._saver.restore(self._session, "model.ckpt")
The second half is the TensorFlow processing. First we define x, the placeholder that receives the image, and keep_prob, the placeholder that receives the dropout keep probability. A placeholder is like an entrance for data: at runtime, data keeps flowing in through it.
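If the placeholder idea is unfamiliar, here is a tiny illustration (my own sketch, same era of the TensorFlow API): the graph is defined once, and data is fed in through the placeholder each time the session runs.

```python
import tensorflow as tf

a = tf.placeholder(tf.float32)   # the "entrance" for data
b = a * 2.0                      # the graph is defined once, without data

with tf.Session() as sess:
    print(sess.run(b, feed_dict={a: 3.0}))   # data enters at runtime -> 6.0
```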
Next we define the CNN used here as y_conv. Conceptually, this defines the path the data takes: it enters through x and keep_prob, flows through the CNN, and comes out at the exit named y_conv.
After defining the path, we prepare to actually run data through it by creating a session. The CNN weights W and biases b are initialized once with tf.initialize_all_variables.
Then the trained parameters are loaded. To load them, create a tf.train.Saver and call saver.restore.
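For reference, model.ckpt is produced by the training script from the previous article; a rough sketch of the save side (the training loop itself is omitted and the names here are assumptions) looks like this:

```python
# Sketch of the training side that produces model.ckpt (training loop omitted).
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    # ... train the same makeCNN graph on MNIST batches here ...
    saver.save(sess, "model.ckpt")   # the checkpoint later loaded by saver.restore
```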
**callback part**
tensorflow_in_ros_mnist.py
    def callback(self, image_msg):
        cv_image = self._cv_bridge.imgmsg_to_cv2(image_msg, "bgr8")
        cv_image_gray = cv2.cvtColor(cv_image, cv2.COLOR_RGB2GRAY)
        ret,cv_image_binary = cv2.threshold(cv_image_gray,128,255,cv2.THRESH_BINARY_INV)
        cv_image_28 = cv2.resize(cv_image_binary,(28,28))
        np_image = np.reshape(cv_image_28,(1,28,28,1))
        predict_num = self._session.run(self.y_conv, feed_dict={self.x:np_image,self.keep_prob:1.0})
        answer = np.argmax(predict_num,1)
        rospy.loginfo('%d' % answer)
        self._pub.publish(answer)
This is called every time an image message arrives. The message is converted to an OpenCV image with cv_bridge, then grayscaled, binarized with black and white inverted, resized to 28x28, and fed into the CNN. The digit with the highest probability in the returned estimate predict_num is taken as answer and published.
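The preprocessing can also be checked on its own outside ROS; this sketch assumes a local image file nine.png (hypothetical) and only verifies that the result has the (1, 28, 28, 1) shape the CNN expects:

```python
# Standalone check of the callback's preprocessing steps (illustrative sketch).
import cv2
import numpy as np

cv_image = cv2.imread('nine.png')                    # hypothetical input image
gray = cv2.cvtColor(cv_image, cv2.COLOR_RGB2GRAY)    # same conversion as the callback
_, binary = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY_INV)
small = cv2.resize(binary, (28, 28))
np_image = np.reshape(small, (1, 28, 28, 1))
print(np_image.shape)                                # (1, 28, 28, 1)
```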
I will skip the explanation of the main function.
Now, as long as you have a trained TensorFlow model, you can handle it from ROS in the same way. With something like Faster R-CNN it should be possible to do object recognition and face recognition, so I would like to try that next.