Using the Facebook Messenger API, I implemented a bot that tells me which AV actresses look similar when I upload an image. The server that produces the bot's responses is written in Go, and for various reasons the image classification is written in Python (face detection uses OpenCV, and the convolutional neural network for classification uses TensorFlow). The interface between the two languages is gRPC, with the Go side calling into Python via RPC.
Messenger Bot Server

This is the worker process that receives webhooks from Facebook Messenger and produces the bot's responses. Gin is used for the web server. It's not particularly difficult, but note that as traffic increases, messages from multiple users may be batched into a single POST to the webhook; if you use this in production, you need to handle that case. Please forgive the lax error handling.
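The handlers below rely on a `model` package that this post doesn't show. Here is a minimal sketch of what it might look like, based on the Messenger Platform webhook format; all type and field names are my assumptions, with `Id` as `int64` to match `sendTextMessage` below.

```go
// Hypothetical model package (not shown in the original post): Go types
// for the Messenger webhook payload. Entry and Messaging are slices
// because Facebook batches events from multiple users into one POST.
package model

type Webhook struct {
	Object string  `json:"object"`
	Entry  []Entry `json:"entry"`
}

type Entry struct {
	Id        int64       `json:"id"`
	Time      int64       `json:"time"`
	Messaging []Messaging `json:"messaging"`
}

type Messaging struct {
	Sender    User    `json:"sender"`
	Recipient User    `json:"recipient"`
	Timestamp int64   `json:"timestamp"`
	Message   Message `json:"message"`
}

type User struct {
	Id int64 `json:"id"`
}

type Message struct {
	Mid         string       `json:"mid"`
	Text        string       `json:"text"`
	Attachments []Attachment `json:"attachments"`
}

type Attachment struct {
	Type    string  `json:"type"`
	Payload Payload `json:"payload"`
}

type Payload struct {
	Url string `json:"url"`
}
```

With these types in hand, the server itself looks like this: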
```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"

	"github.com/gin-gonic/gin"
	// assuming logrus for the structured logging used below
	log "github.com/sirupsen/logrus"

	// model and util are project-local packages; sketches of both
	// appear elsewhere in this post (the import path is a placeholder)
	"myapp/model"
	"myapp/util"
)

const (
	PORT               = ":3000"
	VERIFICATION_TOKEN = "{{YOUR_VERIFICATION_TOKEN}}"
	// The Send API is authenticated with the page access token,
	// not the webhook verification token
	PAGE_ACCESS_TOKEN = "{{YOUR_PAGE_ACCESS_TOKEN}}"
	ENDPOINT_URL      = "https://graph.facebook.com/v2.6/me/messages"
)
func main() {
router := gin.Default()
	router.GET("/messenger", verifyToken)
router.POST("/messenger", processMessages)
router.Run(PORT)
}
func verifyToken(c *gin.Context) {
	token := c.Query("hub.verify_token")
	challenge := c.Query("hub.challenge")
	if token == VERIFICATION_TOKEN {
		c.String(http.StatusOK, challenge+"\n")
	} else {
		log.WithFields(log.Fields{
			"received": token,
			"expected": VERIFICATION_TOKEN,
		}).Warn("Invalid token.")
		c.Status(http.StatusForbidden)
	}
}
func processMessages(c *gin.Context) {
	var webhook model.Webhook
	if c.BindJSON(&webhook) == nil {
		for _, e := range webhook.Entry {
			for _, m := range e.Messaging {
				respondToOneMessage(&m)
			}
		}
		c.JSON(http.StatusOK, gin.H{"status": "you are logged in"})
	}
}
func respondToOneMessage(m *model.Messaging) {
	sender := m.Sender.Id
	switch {
	// Received text: not handled in this excerpt
	case m.Message.Text != "":
	// Received an image
	case len(m.Message.Attachments) > 0 && m.Message.Attachments[0].Type == "image":
		url := m.Message.Attachments[0].Payload.Url
		path := util.SaveImg(url)
		rs, err := classifyImg(path)
		if err != nil {
			log.Error(err)
			return
		}
		txt := fmt.Sprintf("The person in the photo resembles %s with %.1f%% similarity.", rs.Result[0].Label, rs.Result[0].Accuracy*100)
		if err := sendTextMessage(sender, txt); err != nil {
			log.Error(err)
		}
	default:
		log.Error("Unexpected Message")
	}
}
func sendTextMessage(recipient int64, text string) error {
	endpoint := fmt.Sprintf("%s?%s=%s", ENDPOINT_URL, "access_token", PAGE_ACCESS_TOKEN)
	payload := `{"recipient":{"id":%d},"message":{"text":"%s"}}`
	body := fmt.Sprintf(payload, recipient, text)
	req, err := http.NewRequest(
		"POST",
		endpoint,
		strings.NewReader(body),
	)
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	log.Printf("requested")
	return nil
}
```
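The handler also calls `util.SaveImg`, which the original code doesn't show. A minimal sketch, assuming it simply downloads the attachment URL to a temporary file and returns the local path:

```go
// Hypothetical util package (not shown in the original post): download
// the image attachment to a temp file so the classifier can read it.
package util

import (
	"io"
	"net/http"
	"os"

	log "github.com/sirupsen/logrus"
)

func SaveImg(url string) string {
	resp, err := http.Get(url)
	if err != nil {
		log.Error(err)
		return ""
	}
	defer resp.Body.Close()

	f, err := os.CreateTemp("", "attachment-*.jpg")
	if err != nil {
		log.Error(err)
		return ""
	}
	defer f.Close()

	if _, err := io.Copy(f, resp.Body); err != nil {
		log.Error(err)
		return ""
	}
	return f.Name()
}
```

Note that passing a filepath over gRPC assumes the Go and Python processes share a filesystem, which appears to be the setup here.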
Given the path to an image, the face is detected, and the trained convolutional neural network judges the similarity of the face.

No matter how good the deep learning is, classifying the received image as-is with the CNN won't be very accurate, so the first step is to crop just the face region. I used OpenCV for the detection this time. The function takes a NumPy array as its argument and returns only the cropped face region. There was also one horror image in which, for some reason, a right ear was detected as a face. It seems like it might detect spirit photographs too, which scares me a little.
```python
import cv2

def face_detect(img):
face_cascade = cv2.CascadeClassifier('./haarcascade_frontalface_default.xml')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(
gray,
scaleFactor=1.1,
minNeighbors=5,
minSize=(30, 30),
flags = cv2.CASCADE_SCALE_IMAGE
)
if len(faces) > 0:
fc = faces[0]
x = fc[0]
y = fc[1]
w = fc[2]
h = fc[3]
return img[y:y+h, x:x+w]
else:
        return None
```
I expected this to be quite difficult, but that's all there is to it. I was surprised at how convenient it is. I'll study the underlying algorithm properly next time.
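For reference, a quick way to try the function on a single file (`sample.jpg` is a placeholder name; the Haar cascade XML is assumed to be in the working directory):

```python
import cv2

img = cv2.imread('sample.jpg')
face = face_detect(img)
if face is not None:
    cv2.imwrite('face.jpg', face)  # the cropped face region
else:
    print 'no face detected'
```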
Next, train the network weights using the collected and preprocessed images. The structure of the convolutional neural network is the same as in Deep MNIST for Experts: six layers. You can't really learn how to use TensorFlow from that tutorial alone, so I also recommend reading TensorFlow Mechanics 101 carefully. The modeling part is excerpted below.
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import cv2
import numpy as np
import tensorflow as tf
NUM_CLASSES = 5
IMAGE_SIZE = 28
class CNNetwork:
def inference(self, x_images, keep_prob):
def weight_variable(shape):
initial = tf.truncated_normal(shape, stddev=0.1)
return tf.Variable(initial)
def bias_variable(shape):
initial = tf.constant(0.1, shape=shape)
return tf.Variable(initial)
def conv2d(x, W):
return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
def max_pool_2x2(x):
return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1], padding='SAME')
with tf.name_scope('conv1') as scope:
W_conv1 = weight_variable([5, 5, 3, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(tf.nn.bias_add(conv2d(x_images, W_conv1), b_conv1))
with tf.name_scope('pool1') as scope:
h_pool1 = max_pool_2x2(h_conv1)
with tf.name_scope('conv2') as scope:
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(tf.nn.bias_add(conv2d(h_pool1, W_conv2), b_conv2))
with tf.name_scope('pool2') as scope:
h_pool2 = max_pool_2x2(h_conv2)
with tf.name_scope('fc1') as scope:
W_fc1 = weight_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.nn.bias_add(tf.matmul(h_pool2_flat, W_fc1), b_fc1))
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
with tf.name_scope('fc2') as scope:
W_fc2 = weight_variable([1024, NUM_CLASSES])
b_fc2 = bias_variable([NUM_CLASSES])
with tf.name_scope('softmax') as scope:
y_conv=tf.nn.softmax(tf.nn.bias_add(tf.matmul(h_fc1_drop, W_fc2), b_fc2))
        return y_conv
```
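The training loop itself is not excerpted here. A minimal sketch in the same TF 0.x style, following Deep MNIST for Experts (cross-entropy loss plus Adam); `train_images` and `train_labels` are assumptions standing in for the preprocessed face crops and their one-hot labels:

```python
# Minimal training-loop sketch (assumption: train_images/train_labels
# hold preprocessed 28x28 RGB crops and one-hot labels).
images_placeholder = tf.placeholder("float", shape=(None, IMAGE_SIZE, IMAGE_SIZE, 3))
labels_placeholder = tf.placeholder("float", shape=(None, NUM_CLASSES))
keep_prob = tf.placeholder("float")

net = CNNetwork()
y_conv = net.inference(images_placeholder, keep_prob)

# Cross-entropy loss and Adam, as in Deep MNIST for Experts
cross_entropy = -tf.reduce_sum(labels_placeholder * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
for step in range(1000):
    sess.run(train_step, feed_dict={
        images_placeholder: train_images,
        labels_placeholder: train_labels,
        keep_prob: 0.5,
    })
```

The weight-saving step described next slots in right after this loop.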
At training time, saving the resulting weights to a binary file as follows means they can be loaded again when the classification function is called over RPC.
```python
saver = tf.train.Saver()
save_path = saver.save(sess, "model.ckpt")
```
This is the classification function; it returns the output of the softmax function at the deepest layer of the network.
```python
def classify(self, image_path):
try:
img = cv2.imread(image_path)
img = face_detect(img)
img = cv2.resize(img, (IMAGE_SIZE, IMAGE_SIZE))
img = img.astype(np.float32)/255.0
images_placeholder = tf.placeholder("float", shape=(None, IMAGE_SIZE, IMAGE_SIZE, 3))
labels_placeholder = tf.placeholder("float", shape=(None, NUM_CLASSES))
keep_prob = tf.placeholder("float")
logits = self.inference(images_placeholder, keep_prob)
sess = tf.InteractiveSession()
saver = tf.train.Saver()
sess.run(tf.initialize_all_variables())
saver.restore(sess, "./model.ckpt")
pred = logits.eval(feed_dict={images_placeholder: [img],keep_prob: 1.0 })[0]
return pred
    except Exception as e:
        print 'message:' + str(e)
```
gRPC

Finally, the bot server implemented in Go calls the TensorFlow side via RPC. gRPC uses Protocol Buffers as its data format. Roughly speaking, Protocol Buffers are a general-purpose data definition for communication between programs: if you write a `.proto` definition file, a single command generates serialization/deserialization libraries for each language. First, create the proto file that defines the data structures, as shown below.
cnn.proto
```proto
syntax = "proto3";
package cnn;
service Classifier {
rpc classify (CnnRequest) returns (CnnResponse){}
}
message CnnRequest {
string filepath = 1;
}
message CnnResponse {
repeated Result result = 1;
}
message Result {
string label = 1;
double accuracy = 2;
}
```
After completing the definition, generate the library files for Go and Python.
```bash
# Go
protoc --go_out=plugins=grpc:./ cnn.proto

# Python
protoc --python_out=. --grpc_out=. --plugin=protoc-gen-grpc=`which grpc_python_plugin` cnn.proto
```
That's all it takes to generate the libraries for each language: `cnn.pb.go` and `cnn_pb2.py`.
Implement the gRPC server using the generated library.
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time
import cnn_pb2 as pb
import cnn
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
class Classifier(pb.BetaClassifierServicer):
def classify(self, request, context):
path = request.filepath
print path
n = cnn.CNNetwork()
accuracies = n.classify(path)
print accuracies
labels = ['Kaho Shibuya', 'AIKA', 'Aki Sasaki', 'Ai Uehara', 'Ayumi Shinoda']
nameWithAccuracy = []
        for i in range(len(labels)):
            nameWithAccuracy.append((accuracies[i], labels[i]))
nameWithAccuracy.sort(reverse=True)
        response = pb.CnnResponse()
        try:
            # Return the top three matches for now
            for i in range(3):
                label = nameWithAccuracy[i][1]
                accuracy = float(nameWithAccuracy[i][0])
                response.result.add(label=label, accuracy=accuracy)
        except Exception as e:
            print str(e)
        return response
def serve():
    server = pb.beta_create_Classifier_server(Classifier())
server.add_insecure_port('[::]:50051')
server.start()
try:
while True:
time.sleep(_ONE_DAY_IN_SECONDS)
except KeyboardInterrupt:
server.stop(0)
if __name__ == '__main__':
    serve()
```
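Before moving to the Go side, the server can be smoke-tested with a small Python client. This is a sketch assuming the same pre-1.0 beta gRPC API used by the generated `cnn_pb2.py` above; `./test.jpg` is a placeholder path:

```python
from grpc.beta import implementations
import cnn_pb2 as pb

channel = implementations.insecure_channel('localhost', 50051)
stub = pb.beta_create_Classifier_stub(channel)
# The beta-API stub call takes the request and a timeout in seconds
response = stub.classify(pb.CnnRequest(filepath='./test.jpg'), 10)
for r in response.result:
    print r.label, r.accuracy
```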
Next, implement the gRPC client on the Go side.
```go
// Excerpt
func classifyImg(filepath string) (*cnn.CnnResponse, error) {
	address := "localhost:50051"
	conn, err := grpc.Dial(address, grpc.WithInsecure())
	if err != nil {
		log.Errorf("did not connect: %v", err)
		return nil, err
	}
	defer conn.Close()

	c := cnn.NewClassifierClient(conn)
	result, err := c.Classify(context.Background(), &cnn.CnnRequest{Filepath: filepath})
	if err != nil {
		log.Errorf("couldn't classify: %v", err)
		return nil, err
	}
	return result, nil
}
```
On the technical side, building OpenCV on Amazon Linux took more effort than any of the programming. The classification accuracy of the convolutional neural network on test data was 79%. For photos taken from the front, like the capture at the beginning, accuracy is relatively high, but it couldn't identify photos with distorted expressions, like the famous photo of Shoei crying.
[Linear algebra for programming](http://www.amazon.co.jp/%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%9F%E3%83%B3%E3%82%B0%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E7%B7%9A%E5%BD%A2%E4%BB%A3%E6%95%B0-%E5%B9%B3%E5%B2%A1-%E5%92%8C%E5%B9%B8/dp/4274065782): I didn't know the basics of linear algebra in the first place, so I studied them from scratch with this book.
[Deep Learning (Machine Learning Professional Series)](https://www.amazon.co.jp/%E6%B7%B1%E5%B1%A4%E5%AD%A6%E7%BF%92-%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%83%97%E3%83%AD%E3%83%95%E3%82%A7%E3%83%83%E3%82%B7%E3%83%A7%E3%83%8A%E3%83%AB%E3%82%B7%E3%83%AA%E3%83%BC%E3%82%BA-%E5%B2%A1%E8%B0%B7%E8%B2%B4%E4%B9%8B-ebook/dp/B018K6C99A?ie=UTF8&btkr=1&ref_=dp-kindle-redirect): the formula derivations are written out in considerable detail, so I could just barely follow along.
"Identify the anime Yuruyuri production company with TensorFlow": I referred to this article, which explains the implementation of a convolutional neural network carefully.