Hi, this is Hironsan.
Face recognition is a technology that detects a person's face in an image and identifies who it is. It can be used, for example, in a surveillance camera system to improve security, or in a robot so that it can recognize the faces of family members.
This time, we will build a convolutional neural network (CNN) with TensorFlow and make a face recognizer using an existing dataset.
For the theory behind CNNs, see the following.
For installing TensorFlow, please refer to the official website, where the procedure is explained in detail.
First, prepare the dataset. This time, we will use the following face image dataset.
This dataset contains 10 images for each of 40 people, 400 images in total. Each image is 64x64 pixels in grayscale.
After preparing the dataset, load the images (PyFaceRecognizer/example/input_data.py):
```python
import input_data

dataset = input_data.read_data_sets('data/olivettifaces.mat')
```
Here, `dataset` contains the training data, validation data, and test data. In addition, the images are resized to 32x32 when they are read.
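As a quick sanity check, you can inspect the shapes of the loaded arrays. This is a sketch that assumes `read_data_sets` follows the same interface as TensorFlow's MNIST `input_data` helper (train/validation/test splits, flattened images, one-hot labels); the attribute names are my assumption, not from the original code.

```python
# Sanity check (assumes the MNIST-style input_data interface;
# attribute names are an assumption, not shown in the article).
print(dataset.train.images.shape)       # e.g. (n_train, 1024): 32 x 32 images, flattened
print(dataset.train.labels.shape)       # e.g. (n_train, 40): one-hot labels for 40 people
print(dataset.validation.images.shape)
print(dataset.test.images.shape)
```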
Face recognition is performed using a convolutional neural network (CNN). The overall picture is as follows.
The conv, pool, and fc in the layer names denote convolution, pooling, and fully connected layers, respectively. ReLU in the function column is the rectified linear unit. The parameters of each layer are given in the following table.
| Layer (type / name) | patch | stride | Output map size | function |
|---|---|---|---|---|
| data | - | - | 32 x 32 x 1 | - |
| conv1 | 5 x 5 | 1 | 32 x 32 x 32 | ReLU |
| pool1 | 2 x 2 | 2 | 16 x 16 x 32 | - |
| conv2 | 5 x 5 | 1 | 16 x 16 x 64 | ReLU |
| pool2 | 2 x 2 | 2 | 8 x 8 x 64 | - |
| fc3 | - | - | 1 x 1 x 1024 | ReLU |
| fc4 | - | - | 1 x 1 x 40 | softmax |
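Note how the output map sizes arise: the convolutions preserve the spatial size (which implies SAME padding, as the 32x32 input stays 32x32 after conv1), so only the stride-2 pooling layers shrink the maps, 32 → 16 → 8. That is why fc3 receives 8 x 8 x 64 inputs. A quick check:

```python
# SAME padding keeps the spatial size through each convolution;
# each 2x2 / stride-2 max pooling halves it: 32 -> 16 -> 8.
size = 32
size //= 2                # after pool1: 16
size //= 2                # after pool2: 8
print(size * size * 64)   # 4096, the number of inputs to fc3
```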
Written in code, it looks like the following; it corresponds almost line-for-line to the table above (PyFaceRecognizer/example/run.py):
```python
def inference(input_placeholder, keep_prob):
    # conv1: the first two dimensions are the patch size,
    # the last two are the numbers of input and output channels
    W_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])

    # reshape the flat input into images: the second and third dimensions
    # are the image width and height, the last is the number of color channels
    x_image = tf.reshape(input_placeholder, [-1, 32, 32, 1])

    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)  # convolution
    h_pool1 = max_pool_2x2(h_conv1)                           # max pooling

    # conv2 + pool2
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)

    # fc3: flatten the 8 x 8 x 64 maps and feed them into a fully connected layer
    W_fc1 = weight_variable([8 * 8 * 64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 8 * 8 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    # dropout for regularization (keep_prob = 1.0 disables it)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    # fc4: softmax output over the 40 people
    W_fc2 = weight_variable([1024, 40])
    b_fc2 = bias_variable([40])
    y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
    return y_conv
```
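The helpers `weight_variable`, `bias_variable`, `conv2d`, and `max_pool_2x2` are not shown in the listing above. They are not part of the snippet here, but a minimal sketch in the style of the TensorFlow MNIST tutorial would look like this:

```python
def weight_variable(shape):
    # small Gaussian initialization to break symmetry
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # small positive bias to avoid dead ReLUs
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    # stride 1, SAME padding keeps the spatial size (32 -> 32, 16 -> 16)
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # 2x2 pooling with stride 2 halves the spatial size
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
```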
The model itself is written in inference. Next, we write the code to train the model: loss and training. loss computes the cross entropy, and training updates the parameters with the Adam optimizer. The code is as follows.
```python
def loss(output, supervisor_labels_placeholder):
    # cross entropy between the softmax output and the one-hot labels
    cross_entropy = tf.reduce_mean(
        -tf.reduce_sum(supervisor_labels_placeholder * tf.log(output),
                       reduction_indices=[1]))
    return cross_entropy

def training(loss):
    # Adam optimizer with a learning rate of 1e-4
    train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
    return train_step
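```

One caveat (my note, not from the original code): `tf.log(output)` produces NaN if the softmax output is exactly 0. A common safeguard is to clip the output before taking the log; a drop-in variant would be:

```python
def loss(output, supervisor_labels_placeholder):
    # clip so tf.log never sees exactly 0 (avoids NaN); 1e-10 is an arbitrary floor
    clipped = tf.clip_by_value(output, 1e-10, 1.0)
    cross_entropy = tf.reduce_mean(
        -tf.reduce_sum(supervisor_labels_placeholder * tf.log(clipped),
                       reduction_indices=[1]))
    return cross_entropy
```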
Face recognition is performed using the inference, loss, and training functions defined above. A log is printed every 100 iterations of the training loop. keep_prob is set to 1.0 at evaluation time so that dropout is disabled.
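The placeholders x, y_, and keep_prob used below are not defined in the snippets above; a minimal sketch based on the shapes the model expects (32 x 32 = 1024 inputs, 40 classes) would be:

```python
# Placeholder definitions (inferred from the model's shapes, not shown in the original):
x = tf.placeholder(tf.float32, [None, 32 * 32])   # flattened 32x32 grayscale images
y_ = tf.placeholder(tf.float32, [None, 40])       # one-hot labels for 40 people
keep_prob = tf.placeholder(tf.float32)            # dropout keep probability
```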
```python
with tf.Session() as sess:
    output = inference(x, keep_prob)
    loss_op = loss(output, y_)        # renamed so it does not shadow the loss() function
    training_op = training(loss_op)

    init = tf.initialize_all_variables()
    sess.run(init)

    for step in range(1000):
        batch = dataset.train.next_batch(40)
        # train with dropout (keep_prob = 0.5)
        sess.run(training_op, feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
        if step % 100 == 0:
            # log the cross entropy every 100 steps, with dropout disabled
            print(sess.run(loss_op, feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0}))

    # evaluate accuracy on the test set
    correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print('test accuracy %g' % accuracy.eval(feed_dict={x: dataset.test.images, y_: dataset.test.labels, keep_prob: 1.0}))
```
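While the session is still open, the same output tensor can also be used to identify an individual image. A hypothetical usage example (not part of the original script):

```python
# Hypothetical usage example: predict the identity (an index from 0 to 39)
# of the first test image. Must run while the session is still open.
pred = tf.argmax(output, 1)
person_id = sess.run(pred, feed_dict={x: dataset.test.images[:1], keep_prob: 1.0})
print(person_id)  # an array containing one predicted class index
```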
The execution result looks like the one below. You can see how the cross entropy is decreasing.
```
9.8893
1.68918
0.602403
0.261183
0.0490791
0.0525591
0.0133087
0.0121071
0.00673524
0.00580989
```
You can download the source code from the following repository and run it.
In this post, I tried face recognition on an existing face dataset using a convolutional neural network. Next, I would like to try face detection and face recognition on images captured in real time from a camera.