Hi, this is Hironsan.
Face recognition is a technology that detects a person's face in an image and identifies who it is. It can be used, for example, in a surveillance camera system to improve security, or in a robot so that it can recognize the faces of family members.
This time, we will build a convolutional neural network (CNN) with TensorFlow and make a face recognizer using an existing dataset.
For the theory behind CNNs, see the following.
For installing TensorFlow, please refer to the official website, where the procedure is explained in detail.
First, prepare the dataset. This time, we will use the following face image dataset.
This dataset contains 10 images for each of 40 people, 400 images in total. Each image is 64x64 pixels in grayscale.
After preparing the dataset, load the images (PyFaceRecognizer/example/input_data.py):
```python
import input_data

dataset = input_data.read_data_sets('data/olivettifaces.mat')
```
Here, `dataset` contains the training data, validation data, and test data. In addition, the images are resized to 32x32 when they are read.
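As a quick sanity check, you can inspect the shapes of the loaded arrays. This is a sketch that assumes `read_data_sets` follows the same interface as TensorFlow's MNIST `input_data` helper (train/validation/test splits, flattened images, one-hot labels); the attribute names are my assumption, not from the original code.

```python
# Sanity check (assumes the MNIST-style input_data interface;
# attribute names are an assumption, not shown in the article).
print(dataset.train.images.shape)       # e.g. (n_train, 1024): 32 x 32 images, flattened
print(dataset.train.labels.shape)       # e.g. (n_train, 40): one-hot labels for 40 people
print(dataset.validation.images.shape)
print(dataset.test.images.shape)
```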
Face recognition is performed using a convolutional neural network (CNN). The overall picture is as follows.
The conv, pool, and fc in the layer names denote convolution, pooling, and fully connected layers, respectively. ReLU in the function column is the rectified linear unit. The parameters of each layer are given in the following table.
| Layer (type / name) | patch | stride | Output map size | function |
|---|---|---|---|---|
| data | - | - | 32 x 32 x 1 | - |
| conv1 | 5 x 5 | 1 | 32 x 32 x 32 | ReLU |
| pool1 | 2 x 2 | 2 | 16 x 16 x 32 | - |
| conv2 | 5 x 5 | 1 | 16 x 16 x 64 | ReLU |
| pool2 | 2 x 2 | 2 | 8 x 8 x 64 | - |
| fc3 | - | - | 1 x 1 x 1024 | ReLU |
| fc4 | - | - | 1 x 1 x 40 | softmax |
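Note how the output map sizes arise: the convolutions preserve the spatial size (which implies SAME padding, as the 32x32 input stays 32x32 after conv1), so only the stride-2 pooling layers shrink the maps, 32 → 16 → 8. That is why fc3 receives 8 x 8 x 64 inputs. A quick check:

```python
# SAME padding keeps the spatial size through each convolution;
# each 2x2 / stride-2 max pooling halves it: 32 -> 16 -> 8.
size = 32
size //= 2                # after pool1: 16
size //= 2                # after pool2: 8
print(size * size * 64)   # 4096, the number of inputs to fc3
```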
Written in code, it looks like the following; it corresponds almost line-for-line to the table above (PyFaceRecognizer/example/run.py):
```python
def inference(input_placeholder, keep_prob):
    # conv1: the first two dimensions are the patch size,
    # the last two are the numbers of input and output channels
    W_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])

    # reshape the flat input into images: the second and third dimensions
    # are the image width and height, the last is the number of color channels
    x_image = tf.reshape(input_placeholder, [-1, 32, 32, 1])

    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)  # convolution
    h_pool1 = max_pool_2x2(h_conv1)                           # max pooling

    # conv2 + pool2
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)

    # fc3: flatten the 8 x 8 x 64 maps and feed them into a fully connected layer
    W_fc1 = weight_variable([8 * 8 * 64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 8 * 8 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    # dropout for regularization (keep_prob = 1.0 disables it)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    # fc4: softmax output over the 40 people
    W_fc2 = weight_variable([1024, 40])
    b_fc2 = bias_variable([40])
    y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
    return y_conv
```
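The helpers `weight_variable`, `bias_variable`, `conv2d`, and `max_pool_2x2` are not shown in the listing above. They are not part of the snippet here, but a minimal sketch in the style of the TensorFlow MNIST tutorial would look like this:

```python
def weight_variable(shape):
    # small Gaussian initialization to break symmetry
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # small positive bias to avoid dead ReLUs
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    # stride 1, SAME padding keeps the spatial size (32 -> 32, 16 -> 16)
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # 2x2 pooling with stride 2 halves the spatial size
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
```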
The model itself is written in inference. Next, we write the code to train the model: loss and training. loss computes the cross entropy, and training updates the parameters with the Adam optimizer. The code is as follows.
```python
def loss(output, supervisor_labels_placeholder):
    # cross entropy between the softmax output and the one-hot labels
    cross_entropy = tf.reduce_mean(
        -tf.reduce_sum(supervisor_labels_placeholder * tf.log(output),
                       reduction_indices=[1]))
    return cross_entropy

def training(loss):
    # Adam optimizer with a learning rate of 1e-4
    train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
    return train_step
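```

One caveat (my note, not from the original code): `tf.log(output)` produces NaN if the softmax output is exactly 0. A common safeguard is to clip the output before taking the log; a drop-in variant would be:

```python
def loss(output, supervisor_labels_placeholder):
    # clip so tf.log never sees exactly 0 (avoids NaN); 1e-10 is an arbitrary floor
    clipped = tf.clip_by_value(output, 1e-10, 1.0)
    cross_entropy = tf.reduce_mean(
        -tf.reduce_sum(supervisor_labels_placeholder * tf.log(clipped),
                       reduction_indices=[1]))
    return cross_entropy
```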
Face recognition is performed using the inference, loss, and training functions defined above. A log is printed every 100 iterations of the training loop. keep_prob is set to 1.0 at evaluation time so that dropout is disabled.
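The placeholders x, y_, and keep_prob used below are not defined in the snippets above; a minimal sketch based on the shapes the model expects (32 x 32 = 1024 inputs, 40 classes) would be:

```python
# Placeholder definitions (inferred from the model's shapes, not shown in the original):
x = tf.placeholder(tf.float32, [None, 32 * 32])   # flattened 32x32 grayscale images
y_ = tf.placeholder(tf.float32, [None, 40])       # one-hot labels for 40 people
keep_prob = tf.placeholder(tf.float32)            # dropout keep probability
```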
```python
with tf.Session() as sess:
    output = inference(x, keep_prob)
    loss_op = loss(output, y_)        # renamed so it does not shadow the loss() function
    training_op = training(loss_op)

    init = tf.initialize_all_variables()
    sess.run(init)

    for step in range(1000):
        batch = dataset.train.next_batch(40)
        # train with dropout (keep_prob = 0.5)
        sess.run(training_op, feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
        if step % 100 == 0:
            # log the cross entropy every 100 steps, with dropout disabled
            print(sess.run(loss_op, feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0}))

    # evaluate accuracy on the test set
    correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print('test accuracy %g' % accuracy.eval(feed_dict={x: dataset.test.images, y_: dataset.test.labels, keep_prob: 1.0}))
```
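While the session is still open, the same output tensor can also be used to identify an individual image. A hypothetical usage example (not part of the original script):

```python
# Hypothetical usage example: predict the identity (an index from 0 to 39)
# of the first test image. Must run while the session is still open.
pred = tf.argmax(output, 1)
person_id = sess.run(pred, feed_dict={x: dataset.test.images[:1], keep_prob: 1.0})
print(person_id)  # an array containing one predicted class index
```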
The execution result looks like the one below. You can see how the cross entropy is decreasing.
```
9.8893
1.68918
0.602403
0.261183
0.0490791
0.0525591
0.0133087
0.0121071
0.00673524
0.00580989
```
You can download the source code from the following repository and run it.
In this post, I tried face recognition on an existing face dataset using a convolutional neural network. Next, I would like to try face detection and face recognition on images captured in real time from a camera.