Building a "bot that tells you which AV actresses have similar faces" with deep learning

Overview

Using the Facebook Messenger API, I implemented a bot that, when you upload an image, tells you which AV actresses look similar.

[Screenshot of the bot in Messenger (censored), 2016-05-05]

System configuration

For various reasons, the server that handles the bot's responses is written in Go, while image classification is written in Python (OpenCV for face detection and TensorFlow for the convolutional neural network that does the classification). The two languages communicate over gRPC: the Go server makes RPC calls to the Python side.

Implementation

Go side

This is the worker process that receives webhooks from Facebook Messenger and sends the bot's responses.

Messenger Bot Server

Gin is used as the web server. Nothing here is particularly difficult, but when traffic increases, messages from multiple users may apparently be batched into a single POST to the webhook, so be careful about that if you use this in production. Please forgive the sloppy error handling.

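package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "time"

    "github.com/gin-gonic/gin"
    log "github.com/sirupsen/logrus"
    // model, util and cnn are the project's own packages; their import paths are not shown in this excerpt.
)

const (
    PORT               = ":3000"
    VERIFICATION_TOKEN = "{{YOUR_VERIFICATION_TOKEN}}"
    // The Send API authenticates with the page access token, not the webhook verification token.
    PAGE_ACCESS_TOKEN = "{{YOUR_PAGE_ACCESS_TOKEN}}"
    ENDPOINT_URL      = "https://graph.facebook.com/v2.6/me/messages"
)

func main() {
    router := gin.Default()
    router.GET("/messenger", verifyToken)
    router.POST("/messenger", processMessages)
    router.Run(PORT)
}

// verifyToken answers Facebook's webhook verification request by echoing hub.challenge.
func verifyToken(c *gin.Context) {
    token := c.Query("hub.verify_token")
    challenge := c.Query("hub.challenge")

    if token == VERIFICATION_TOKEN {
        c.String(http.StatusOK, challenge)
        return
    }
    log.WithFields(log.Fields{
        "received": token,
        "expected": VERIFICATION_TOKEN,
    }).Warn("Invalid token.")
    c.String(http.StatusForbidden, "invalid verification token")
}

// processMessages handles one webhook POST. A single request may contain
// several entries, each with several messages (possibly from different users).
func processMessages(c *gin.Context) {
    var webhook model.Webhook
    if err := c.BindJSON(&webhook); err != nil {
        return // BindJSON has already responded with 400 Bad Request
    }
    for _, e := range webhook.Entry {
        for _, m := range e.Messaging {
            respondToOneMessage(&m)
        }
    }
    c.JSON(http.StatusOK, gin.H{"status": "ok"})
}

func respondToOneMessage(m *model.Messaging) {
    sender := m.Sender.Id

    switch {
    // Text messages are currently ignored.
    case m.Message.Text != "":

    // Image: classify the face and reply with the closest match.
    // Check the length first so a message without attachments cannot panic.
    case len(m.Message.Attachments) > 0 && m.Message.Attachments[0].Type == "image":
        url := m.Message.Attachments[0].Payload.Url
        path := util.SaveImg(url)
        rs, err := classifyImg(path)
        if err != nil {
            log.Error(err) // don't kill the whole server over one bad image
            return
        }
        if len(rs.Result) == 0 {
            log.Error("classifier returned no results")
            return
        }

        top := rs.Result[0]
        txt := fmt.Sprintf("The person in the photo is %.1f%% similar to %s.", top.Accuracy*100, top.Label)
        if err := sendTextMessage(sender, txt); err != nil {
            log.Error(err)
        }

    default:
        log.Error("Unexpected Message")
    }
}

func sendTextMessage(recipient int64, text string) error {
    endpoint := fmt.Sprintf("%s?access_token=%s", ENDPOINT_URL, PAGE_ACCESS_TOKEN)

    // Build the body with encoding/json so quotes and newlines in text are escaped.
    payload := map[string]interface{}{
        "recipient": map[string]interface{}{"id": recipient},
        "message":   map[string]string{"text": text},
    }
    body, err := json.Marshal(payload)
    if err != nil {
        return err
    }

    req, err := http.NewRequest("POST", endpoint, bytes.NewReader(body))
    if err != nil {
        return err
    }
    req.Header.Set("Content-Type", "application/json")

    client := &http.Client{Timeout: 3 * time.Second}
    resp, err := client.Do(req)
    if err != nil {
        return err // resp is nil on error, so resp.Body must not be touched
    }
    defer resp.Body.Close()
    log.Printf("requested: %s", resp.Status)

    return nil
}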
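To make the batching concrete, here is a minimal sketch in Python (the project's other language) of walking one webhook body and building a Send API request body. The field names (`entry`, `messaging`, `sender`, `message`, `attachments`, `payload.url`) follow the Messenger Platform webhook format as I understand it; `image_urls` and `send_body` are hypothetical helper names, not part of any SDK. Unlike the Go handler, which only looks at the first attachment, this collects every image.

```python
import json
from typing import Any, Dict, List, Tuple


def image_urls(webhook: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Collect (sender_id, image_url) pairs from one webhook POST.

    A single POST may batch several entries, each holding several messages,
    possibly from different users, so both levels must be iterated.
    """
    found = []
    for entry in webhook.get('entry', []):
        for event in entry.get('messaging', []):
            sender = event.get('sender', {}).get('id')
            message = event.get('message', {})
            # Messages without attachments (e.g. plain text) simply yield nothing here
            for att in message.get('attachments', []):
                if att.get('type') == 'image':
                    found.append((sender, att['payload']['url']))
    return found


def send_body(recipient_id: str, text: str) -> str:
    """Build a Send API request body; json.dumps escapes quotes and newlines in text."""
    return json.dumps({'recipient': {'id': recipient_id}, 'message': {'text': text}})
```

Building the body with a JSON encoder rather than string formatting is what keeps a reply containing `"` from producing invalid JSON.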

Python side

Given an image path, this side detects the face and uses the trained convolutional neural network to judge which actress the face resembles.

Face detection with OpenCV

Deep learning or not, feeding the received image to the CNN as-is would not give very accurate results, so the first step is to crop out just the face. This time I used OpenCV for detection. The function takes a NumPy array and returns only the cropped face region. In one spooky case the right ear was detected as a face for some reason; it makes me a little nervous that it might pick up ghosts in photos as well.

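import cv2

# Haar cascade for frontal faces (the XML file ships with OpenCV); load it once, not per call
face_cascade = cv2.CascadeClassifier('./haarcascade_frontalface_default.xml')

def face_detect(img):
    """Return the first detected face in a BGR image as a cropped array, or None."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(
        gray,
        scaleFactor=1.1,
        minNeighbors=5,
        minSize=(30, 30),
        flags=cv2.CASCADE_SCALE_IMAGE
    )
    if len(faces) == 0:
        return None
    # Each detection is (x, y, width, height): crop rows y..y+h and columns x..x+w
    x, y, w, h = faces[0]
    return img[y:y+h, x:x+w]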

I expected this to be quite difficult, but that's all there is to it. It was almost too convenient. I'll study the underlying algorithm properly next time.
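Because `detectMultiScale` can return several rectangles (including the occasional ear), one simple heuristic is to keep the largest rectangle rather than the first. The stdlib-only Python sketch below works on a nested-list image; `pick_largest` and `crop` are hypothetical helpers, and the largest-area rule is my assumption, not what the code above does. The crop mirrors NumPy's `img[y:y+h, x:x+w]`: rows are indexed by y and columns by x.

```python
from typing import List, Optional, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, w, h), the layout detectMultiScale returns


def pick_largest(faces: Sequence[Rect]) -> Optional[Rect]:
    """Return the detection with the largest area, or None if there are none."""
    if len(faces) == 0:
        return None
    return max(faces, key=lambda r: r[2] * r[3])


def crop(img: List[List[int]], rect: Rect) -> List[List[int]]:
    """Crop a row-major image to rows y..y+h-1 and columns x..x+w-1."""
    x, y, w, h = rect
    return [row[x:x + w] for row in img[y:y + h]]
```

A stray ear is usually much smaller than the real face, so area is a cheap tie-breaker, though it will still fail on group photos.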

CNN in TensorFlow

The network weights are trained on the collected and preprocessed images.

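The convolutional neural network has the same structure as the one in the TensorFlow tutorial Deep MNIST for Experts: six layers (convolution, pooling, convolution, pooling, a fully connected layer, and a fully connected softmax output layer).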

The tutorial alone is not enough to understand how to use TensorFlow, so I recommend reading TensorFlow Mechanics 101 carefully.

Here is an excerpt of the model definition.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import cv2
import numpy as np
import tensorflow as tf  # written against the TensorFlow 0.x/1.x graph API

NUM_CLASSES = 5   # number of actresses to distinguish
IMAGE_SIZE = 28   # faces are resized to 28x28 RGB before classification

class CNNetwork:

    def inference(self, x_images, keep_prob):

        def weight_variable(shape):
            # small random weights from a truncated normal distribution
            initial = tf.truncated_normal(shape, stddev=0.1)
            return tf.Variable(initial)

        def bias_variable(shape):
            # slightly positive biases to avoid dead ReLU units
            initial = tf.constant(0.1, shape=shape)
            return tf.Variable(initial)

        def conv2d(x, W):
            # stride 1 with SAME padding keeps height and width unchanged
            return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

        def max_pool_2x2(x):
            # 2x2 max pooling with stride 2 halves height and width
            return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                                  strides=[1, 2, 2, 1], padding='SAME')

        # conv1: 5x5 kernels, 3 -> 32 channels (28x28x32)
        with tf.name_scope('conv1'):
            W_conv1 = weight_variable([5, 5, 3, 32])
            b_conv1 = bias_variable([32])
            h_conv1 = tf.nn.relu(tf.nn.bias_add(conv2d(x_images, W_conv1), b_conv1))

        # pool1: 28x28x32 -> 14x14x32
        with tf.name_scope('pool1'):
            h_pool1 = max_pool_2x2(h_conv1)

        # conv2: 5x5 kernels, 32 -> 64 channels (14x14x64)
        with tf.name_scope('conv2'):
            W_conv2 = weight_variable([5, 5, 32, 64])
            b_conv2 = bias_variable([64])
            h_conv2 = tf.nn.relu(tf.nn.bias_add(conv2d(h_pool1, W_conv2), b_conv2))

        # pool2: 14x14x64 -> 7x7x64
        with tf.name_scope('pool2'):
            h_pool2 = max_pool_2x2(h_conv2)

        # fc1: flatten 7*7*64 -> 1024 units, with dropout
        with tf.name_scope('fc1'):
            W_fc1 = weight_variable([7*7*64, 1024])
            b_fc1 = bias_variable([1024])
            h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
            h_fc1 = tf.nn.relu(tf.nn.bias_add(tf.matmul(h_pool2_flat, W_fc1), b_fc1))
            h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

        # fc2: 1024 -> NUM_CLASSES scores
        with tf.name_scope('fc2'):
            W_fc2 = weight_variable([1024, NUM_CLASSES])
            b_fc2 = bias_variable([NUM_CLASSES])

        # softmax turns the scores into one probability per class
        with tf.name_scope('softmax'):
            y_conv = tf.nn.softmax(tf.nn.bias_add(tf.matmul(h_fc1_drop, W_fc2), b_fc2))

        return y_conv
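To see where the `7*7*64` in fc1 comes from, the sketch below traces the tensor shapes and counts the trainable parameters of the architecture above (28x28x3 input, two 5x5 convolution + 2x2 pooling stages, a 1024-unit fully connected layer, NUM_CLASSES = 5 outputs). It is plain Python arithmetic, not TensorFlow, and `trace_shapes` is a hypothetical helper. With SAME padding, TensorFlow's output length along an axis is ceil(input / stride).

```python
from typing import List, Tuple


def same_out(size: int, stride: int) -> int:
    """Output length of a SAME-padded conv/pool along one axis: ceil(size / stride)."""
    return (size + stride - 1) // stride


def trace_shapes(image_size: int = 28, channels: int = 3, num_classes: int = 5):
    shapes: List[Tuple[str, Tuple[int, ...]]] = []
    params = 0
    h = w = image_size
    c = channels
    for name, out_c in (('conv1', 32), ('conv2', 64)):
        # 5x5 convolution, stride 1: spatial size unchanged, weights + biases
        params += 5 * 5 * c * out_c + out_c
        c = out_c
        shapes.append((name, (h, w, c)))
        # 2x2 max pool, stride 2: spatial size halved (rounded up), no parameters
        h, w = same_out(h, 2), same_out(w, 2)
        shapes.append((name.replace('conv', 'pool'), (h, w, c)))
    flat = h * w * c                              # input width of fc1
    params += flat * 1024 + 1024                  # fc1
    shapes.append(('fc1', (1024,)))
    params += 1024 * num_classes + num_classes    # fc2 (feeds the softmax)
    shapes.append(('fc2', (num_classes,)))
    return shapes, flat, params
```

Almost all of the roughly 3.27 million parameters sit in fc1, which is typical for this tutorial-style network.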

During training, save the learned weights to a checkpoint file as follows, so that they can be loaded when the classification function is called over RPC.

saver = tf.train.Saver()
save_path = saver.save(sess, "model.ckpt")

The following classification function returns the output of the softmax at the network's final layer, i.e. one probability per class.

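def classify(self, image_path):
    """Return the softmax output (one probability per class) for the face in image_path."""
    img = cv2.imread(image_path)
    if img is None:
        raise IOError('could not read image: ' + image_path)
    img = face_detect(img)
    if img is None:
        raise ValueError('no face detected in ' + image_path)
    # Same preprocessing as training: 28x28, float32 in [0, 1]
    img = cv2.resize(img, (IMAGE_SIZE, IMAGE_SIZE))
    img = img.astype(np.float32) / 255.0

    # Build the model in a fresh graph: otherwise every call adds another copy of
    # the variables to the default graph and restoring the checkpoint fails.
    with tf.Graph().as_default():
        images_placeholder = tf.placeholder("float", shape=(None, IMAGE_SIZE, IMAGE_SIZE, 3))
        keep_prob = tf.placeholder("float")
        logits = self.inference(images_placeholder, keep_prob)

        saver = tf.train.Saver()
        with tf.Session() as sess:
            # restore() assigns every saved variable, so no separate initialization is needed
            saver.restore(sess, "./model.ckpt")
            # keep_prob = 1.0 disables dropout at inference time
            pred = sess.run(logits, feed_dict={images_placeholder: [img], keep_prob: 1.0})[0]
    return pred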
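For reference, the softmax at the end of the network maps the final layer's raw scores to non-negative values that sum to 1, which is why the array returned by `classify` can be ranked as per-actress probabilities. Below is a minimal stdlib Python version; subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result. This is an illustration, not TensorFlow's own implementation.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: exp(z_i - max) / sum_j exp(z_j - max)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Without the shift, scores around 1000 would overflow `math.exp`; with it, the largest term is always exp(0) = 1.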

gRPC

Finally, the bot server written in Go calls the TensorFlow classifier over RPC. gRPC uses Protocol Buffers as its data format; roughly speaking, Protocol Buffers is a language-neutral way of defining data exchanged between programs. Once you write a .proto definition file, a single command generates serialization/deserialization code for each language.

Data structure definition

First, create a proto file, cnn.proto, that defines the data structures as shown below.

syntax = "proto3";

package cnn;

service Classifier {
    rpc classify (CnnRequest) returns (CnnResponse){}
}

message CnnRequest {
    string filepath = 1;
}

message CnnResponse {
    repeated Result result = 1;
}

message Result {
    string label = 1;
    double accuracy = 2;
}
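To demystify what the generated code puts on the wire, here is a hand-rolled Python sketch of the Protocol Buffers binary encoding for these three messages. It follows the published encoding rules: each field starts with a varint key `(field_number << 3) | wire_type`; strings and embedded messages use wire type 2 with a length prefix; `double` uses wire type 1 (8 bytes, little-endian); proto3 omits singular fields left at their default value. All function names are hypothetical; in the real project the generated `cnn_pb2` classes do this for you.

```python
import struct
from typing import Iterable, Tuple


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, low group first, high bit = 'more bytes follow'."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)


def key(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def encode_string(field_number: int, s: str) -> bytes:
    data = s.encode('utf-8')
    if not data:
        return b''  # proto3: empty string is the default, so it is not written
    return key(field_number, 2) + encode_varint(len(data)) + data


def encode_double(field_number: int, x: float) -> bytes:
    raw = struct.pack('<d', x)
    if raw == bytes(8):
        return b''  # proto3: +0.0 is the default, so it is not written
    return key(field_number, 1) + raw


def encode_cnn_request(filepath: str) -> bytes:
    # message CnnRequest { string filepath = 1; }
    return encode_string(1, filepath)


def encode_result(label: str, accuracy: float) -> bytes:
    # message Result { string label = 1; double accuracy = 2; }
    return encode_string(1, label) + encode_double(2, accuracy)


def encode_cnn_response(results: Iterable[Tuple[str, float]]) -> bytes:
    # message CnnResponse { repeated Result result = 1; }
    # Each element is an embedded message: key, length, then the Result bytes.
    out = b''
    for label, accuracy in results:
        body = encode_result(label, accuracy)
        out += key(1, 2) + encode_varint(len(body)) + body
    return out
```

So a request for `a.jpg` is just seven bytes: the key `0x0a` (field 1, wire type 2), the length 5, and the UTF-8 path.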

With the definition in place, generate the library code for Go and Python:

# go
protoc --go_out=plugins=grpc:./ cnn.proto

# Python
protoc --python_out=. --grpc_out=. --plugin=protoc-gen-grpc=`which grpc_python_plugin` cnn.proto

That's all it takes to generate the per-language libraries, cnn.pb.go and cnn_pb2.py.

gRPC server construction

Implement the gRPC server using the generated library.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time
import cnn_pb2 as pb   # generated by protoc from cnn.proto
import cnn             # module containing CNNetwork (the code above)

_ONE_DAY_IN_SECONDS = 60 * 60 * 24

# Class labels, in the same order as the network's output units
LABELS = ['Kaho Shibuya', 'AIKA', 'Aki Sasaki', 'Ai Uehara', 'Ayumi Shinoda']

class Classifier(pb.BetaClassifierServicer):

    def classify(self, request, context):
        path = request.filepath
        print(path)
        n = cnn.CNNetwork()
        accuracies = n.classify(path)
        print(accuracies)

        # Pair each probability with its label and sort, highest first
        ranked = sorted(zip(accuracies, LABELS), reverse=True)

        response = pb.CnnResponse()
        # Return the top 3 people for now
        for accuracy, label in ranked[:3]:
            response.result.add(label=label, accuracy=float(accuracy))
        return response


def serve():
    server = pb.beta_create_Classifier_server(Classifier())
    server.add_insecure_port('[::]:50051')
    server.start()
    try:
        # The server runs in background threads; keep the main thread alive
        while True:
            time.sleep(_ONE_DAY_IN_SECONDS)
    except KeyboardInterrupt:
        server.stop(0)

if __name__ == '__main__':
    serve()

gRPC client

Next, implement the gRPC client in Go.

// Excerpt (needs the "context" and "google.golang.org/grpc" packages plus the generated cnn package)
func classifyImg(filepath string) (*cnn.CnnResponse, error) {
    address := "localhost:50051"

    conn, err := grpc.Dial(address, grpc.WithInsecure())
    if err != nil {
        // Return the error instead of log.Fatalf, which would exit the whole bot server
        return nil, fmt.Errorf("did not connect: %v", err)
    }
    defer conn.Close()
    c := cnn.NewClassifierClient(conn)

    result, err := c.Classify(context.Background(), &cnn.CnnRequest{Filepath: filepath})
    if err != nil {
        return nil, fmt.Errorf("couldn't classify: %v", err)
    }
    return result, nil
}

In conclusion

Impressions

Technically, building OpenCV on Amazon Linux took more effort than the programming itself. The convolutional neural network's classification accuracy on the test data was 79%. For photos taken from the front, like the screenshot at the beginning, accuracy is relatively high, but it could not identify photos with strong facial expressions, such as a crying Shoei.

References

[Linear algebra for programming](http://www.amazon.co.jp/%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%9F%E3%83%B3%E3%82%B0%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E7%B7%9A%E5%BD%A2%E4%BB%A3%E6%95%B0-%E5%B9%B3%E5%B2%A1-%E5%92%8C%E5%B9%B8/dp/4274065782) I didn't know even the basics of linear algebra, so I studied it from scratch.

[Deep Learning (Machine Learning Professional Series)](https://www.amazon.co.jp/%E6%B7%B1%E5%B1%A4%E5%AD%A6%E7%BF%92-%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%83%97%E3%83%AD%E3%83%95%E3%82%A7%E3%83%83%E3%82%B7%E3%83%A7%E3%83%8A%E3%83%AB%E3%82%B7%E3%83%AA%E3%83%BC%E3%82%BA-%E5%B2%A1%E8%B0%B7%E8%B2%B4%E4%B9%8B-ebook/dp/B018K6C99A?ie=UTF8&btkr=1&ref_=dp-kindle-redirect) The derivations are written out in considerable detail, so I was just barely able to follow them.

Identify the anime Yuruyuri production company with TensorFlow: for the convolutional neural network implementation, I referred to this carefully explained article.
