INTRODUCTION
The image above is from Miss Campus Ritsumeikan 2014. Everyone is very beautiful.
On the other hand, at first glance they all seem to have a similar face. Birds of a feather flock together, perhaps? Let's call this the __Ritsumeikan-like face__.
I also often hear phrases like "she seems so Aogaku" or "she looks like she goes to Gakushuin", and I suspect this is because there really are __Aogaku-like faces__ and __Gakushuin-like faces__.
So this time, I used deep learning to capture the facial tendencies of each university and built a model that predicts which university a beautiful woman most likely attends.
APPROACH
First, we need images of women from each university. The Miss Contest Portal Site (misscolle.com) has a systematic archive of photos from past Miss Contests at each university, so I used that.
photo_collector.py
```python
# -*- coding:utf-8 -*-
import os
import bs4
import time
import random
import urllib2
from itertools import chain
from urllib import urlretrieve

base_url = 'http://misscolle.com'


def fetch_page_urls():
    html = urllib2.urlopen('{}/versions'.format(base_url))
    soup = bs4.BeautifulSoup(html, 'html.parser')
    columns = soup.find_all('ul', class_='columns')
    atags = map(lambda column: column.find_all('a'), columns)

    with open('page_urls.txt', 'w') as f:
        for _ in chain.from_iterable(atags):
            path = _.get('href')
            if not path.startswith('http'):  # Relative path
                path = '{}{}'.format(base_url, path)
            if path[-1] == '/':  # Normalize
                path = path[:-1]
            f.write('{}\n'.format(path))


def fetch_photos():
    with open('page_urls.txt') as f:
        for url in f:
            # Make directories for saving images
            dirpath = 'photos/{}'.format(url.strip().split('/')[-1])
            os.makedirs(dirpath)

            html = urllib2.urlopen('{}/photo'.format(url.strip()))
            soup = bs4.BeautifulSoup(html, 'html.parser')
            photos = soup.find_all('li', class_='photo')
            paths = map(lambda path: path.find('a').get('href'), photos)

            for path in paths:
                filename = '_'.join(path.split('?')[0].split('/')[-2:])
                filepath = '{}/{}'.format(dirpath, filename)
                # Download image file
                urlretrieve('{}{}'.format(base_url, path), filepath)
                # Add random waiting time (4 - 6 sec)
                time.sleep(4 + random.randint(0, 2))


if __name__ == '__main__':
    fetch_page_urls()
    fetch_photos()
```
When executed, the images are saved in the following directory structure: for example, http://misscolle.com/img/contests/aoyama2015/1/1.jpg is mapped to photos/aoyama/2015/1_1.jpg.
```
.
photos/
├── aoyama
│   ├── 2008
│   │   ├── 1_1.jpg
│   │   ├── 1_2.jpg
│   │   ├── ...
│   │   ├── 2_1.jpg
│   │   ├── 2_2.jpg
│   │   ├── ...
│   │   └── 6_9.jpg
│   ├── 2009
│   │   ├── 1_1.jpg
│   │   ├── 1_2.jpg
│   │   ├── ...
```
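As a quick sanity check on that mapping, here is the filename derivation from fetch_photos() in isolation, using the example URL above:

```python
# The photo path from the page is turned into '<number>_<number>.jpg'
path = '/img/contests/aoyama2015/1/1.jpg'
filename = '_'.join(path.split('?')[0].split('/')[-2:])
print filename  # -> 1_1.jpg
```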
In the end, I collected a total of 10,725 images (about 2.5 GB) from 82 universities. Just looking at them makes me happy.
Next, the face regions are detected with OpenCV and cropped out of these images. The cascade classifier used was haarcascade_frontalface_alt2.
face_detecter.py
```python
# -*- coding:utf-8 -*-
import os
import cv2


def main():
    for srcpath, _, files in os.walk('photos'):
        if len(_):
            continue
        dstpath = srcpath.replace('photos', 'faces')
        os.makedirs(dstpath)
        for filename in files:
            if filename.startswith('.'):  # Pass .DS_Store
                continue
            try:
                detect_faces(srcpath, dstpath, filename)
            except Exception:  # Skip unreadable or broken images
                continue


def detect_faces(srcpath, dstpath, filename):
    cascade = cv2.CascadeClassifier('haarcascade_frontalface_alt2.xml')
    image = cv2.imread('{}/{}'.format(srcpath, filename))
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray_image)
    # Extract only when exactly one face is detected
    if len(faces) == 1:
        (x, y, w, h) = faces[0]
        image = image[y:y + h, x:x + w]
        image = cv2.resize(image, (100, 100))
        cv2.imwrite('{}/{}'.format(dstpath, filename), image)


if __name__ == '__main__':
    main()
```
When this is executed, the cropped face images are saved under faces/ with the same directory structure as before. These, too, make me happy to look at.
Finally, let's train a CNN. First, I screened the universities using the following criteria (a sketch of this screening follows the list of selected universities below):

- There is Miss Contest data for the past 5 years or more.
- There are more than 100 images in total.

Then, from the 20 universities that passed this screening, I selected the following 10 at my own discretion and classified them into 10 classes (the parenthesized keys correspond to label_dict in cnn.py below):

- Aoyama Gakuin University (aoyama)
- Jissen Women's University (jissen)
- Keio University (keio)
- Nihon University (phoenix)
- Tokyo University of Science (rika)
- Rikkyo University (rikkyo)
- Seikei University (seikei)
- Sophia University (sophia)
- The University of Tokyo (todai)
- Tokyo Woman's Christian University (tonjo)
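The screening itself can be done with a short script over the faces/ directory. Here is a minimal sketch, assuming the faces/<university>/<year>/ layout shown earlier (this script and its threshold checks are my reconstruction, not part of the original pipeline):

```python
# -*- coding:utf-8 -*-
# Screening sketch: keep universities with at least 5 contest years
# and more than 100 face images in total.
import os

stats = {}  # university -> (set of years, number of images)
for root, dirs, files in os.walk('faces'):
    parts = root.split(os.sep)
    if len(parts) == 3:  # faces/<university>/<year>
        univ = parts[1]
        years, count = stats.get(univ, (set(), 0))
        years.add(parts[2])
        count += len([f for f in files if not f.startswith('.')])
        stats[univ] = (years, count)

for univ, (years, count) in sorted(stats.items()):
    if len(years) >= 5 and count > 100:
        print '{}: {} years, {} images'.format(univ, len(years), count)
```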
For each university, the most recent year's data is used as test data and everything else as training data; the split is sketched below. This yielded 1,700 training images and 154 test images.
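The article does not show the splitting step itself, but load_data() in cnn.py below picks up files by whether 'train' or 'test' appears in the path, so the faces have to be rearranged into such a layout first. A minimal sketch, assuming a faces/train/<university>/<year>/ and faces/test/<university>/<year>/ layout (again my reconstruction):

```python
# -*- coding:utf-8 -*-
# Split sketch: move each university's most recent year to faces/test
# and all earlier years to faces/train.
import os
import shutil

for univ in os.listdir('faces'):
    univ_path = os.path.join('faces', univ)
    if univ in ('train', 'test') or not os.path.isdir(univ_path):
        continue
    years = sorted(d for d in os.listdir(univ_path) if not d.startswith('.'))
    for year in years:
        split = 'test' if year == years[-1] else 'train'
        dst_parent = os.path.join('faces', split, univ)
        if not os.path.exists(dst_parent):
            os.makedirs(dst_parent)
        shutil.move(os.path.join(univ_path, year), os.path.join(dst_parent, year))
```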
Now let's build the CNN with TensorFlow and train it.
cnn.py
```python
# -*- coding:utf-8 -*-
import os
import random
import numpy as np
import tensorflow as tf

label_dict = {
    'aoyama': 0, 'jissen': 1, 'keio': 2, 'phoenix': 3, 'rika': 4,
    'rikkyo': 5, 'seikei': 6, 'sophia': 7, 'todai': 8, 'tonjo': 9
}


def load_data(data_type):
    filenames, images, labels = [], [], []
    walk = filter(lambda _: not len(_[1]) and data_type in _[0], os.walk('faces'))
    for root, dirs, files in walk:
        filenames += ['{}/{}'.format(root, _) for _ in files if not _.startswith('.')]
    # Shuffle files
    random.shuffle(filenames)
    # Read, resize, and reshape images
    images = map(lambda _: tf.image.decode_jpeg(tf.read_file(_), channels=3), filenames)
    images = map(lambda _: tf.image.resize_images(_, [32, 32]), images)
    images = map(lambda _: tf.reshape(_, [-1]), images)
    # Build one-hot labels from the university name in the file path
    for filename in filenames:
        label = np.zeros(10)
        for k, v in label_dict.iteritems():
            if k in filename:
                label[v] = 1.
        labels.append(label)
    return images, labels


def get_batch_list(l, batch_size):
    # [1, 2, 3, 4, 5,...] -> [[1, 2, 3], [4, 5,..]]
    return [np.asarray(l[_:_ + batch_size]) for _ in range(0, len(l), batch_size)]


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


def inference(images_placeholder, keep_prob):
    # Convolution layer
    x_image = tf.reshape(images_placeholder, [-1, 32, 32, 3])
    W_conv1 = weight_variable([5, 5, 3, 32])
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
    # Pooling layer
    h_pool1 = max_pool_2x2(h_conv1)
    # Convolution layer
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    # Pooling layer
    h_pool2 = max_pool_2x2(h_conv2)
    # Fully connected layer
    W_fc1 = weight_variable([8 * 8 * 64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 8 * 8 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
    # Dropout
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
    # Fully connected layer
    W_fc2 = weight_variable([1024, 10])
    b_fc2 = bias_variable([10])
    return tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)


def main():
    with tf.Graph().as_default():
        train_images, train_labels = load_data('train')
        test_images, test_labels = load_data('test')
        x = tf.placeholder('float', shape=[None, 32 * 32 * 3])  # 32 * 32, 3 channels
        y_ = tf.placeholder('float', shape=[None, 10])  # 10 classes
        keep_prob = tf.placeholder('float')

        y_conv = inference(x, keep_prob)
        # Loss function (clipped to avoid log(0))
        cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y_conv, 1e-10, 1.0)))
        tf.summary.scalar('cross_entropy', cross_entropy)
        # Minimize cross entropy with the Adam optimizer
        train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
        # Accuracy
        correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'))
        tf.summary.scalar('accuracy', accuracy)

        saver = tf.train.Saver()
        sess = tf.InteractiveSession()
        sess.run(tf.global_variables_initializer())
        summary_op = tf.summary.merge_all()
        summary_writer = tf.summary.FileWriter('./logs', sess.graph)

        batched_train_images = get_batch_list(train_images, 25)
        batched_train_labels = get_batch_list(train_labels, 25)

        # Evaluate the image tensors into numpy arrays scaled to [0, 1]
        train_images = map(lambda _: sess.run(_).astype(np.float32) / 255.0, np.asarray(train_images))
        test_images = map(lambda _: sess.run(_).astype(np.float32) / 255.0, np.asarray(test_images))
        train_labels, test_labels = np.asarray(train_labels), np.asarray(test_labels)

        # Train
        for step, (images, labels) in enumerate(zip(batched_train_images, batched_train_labels)):
            images = map(lambda _: sess.run(_).astype(np.float32) / 255.0, images)
            sess.run(train_step, feed_dict={x: images, y_: labels, keep_prob: 0.5})
            train_accuracy = accuracy.eval(feed_dict={
                x: train_images, y_: train_labels, keep_prob: 1.0})
            print 'step {}, training accuracy {}'.format(step, train_accuracy)
            summary_str = sess.run(summary_op, feed_dict={
                x: train_images, y_: train_labels, keep_prob: 1.0})
            summary_writer.add_summary(summary_str, step)

        # Test trained model
        test_accuracy = accuracy.eval(feed_dict={
            x: test_images, y_: test_labels, keep_prob: 1.0})
        print 'test accuracy {}'.format(test_accuracy)

        # Save model
        save_path = saver.save(sess, "model.ckpt")


if __name__ == '__main__':
    main()
```
Due to the lack of training data, training accuracy hovered around 20% and learning did not progress much. Still, accuracy on the test data was 19.48% (the baseline is 10%, since there are 10 classes), meaning about one in five women is classified correctly, so it seems fair to say the model picked up some real tendency.
EXPERIMENTAL RESULTS
Using this trained model, let's determine which university's face Gacky most resembles, based on the image below. I replaced main in cnn.py with the following.
cnn.py
```python
def main():
    with tf.Graph().as_default():
        test_images, test_labels = load_data('experiment')
        x = tf.placeholder('float', shape=[None, 32 * 32 * 3])  # 32 * 32, 3 channels
        y_ = tf.placeholder('float', shape=[None, 10])  # 10 classes
        keep_prob = tf.placeholder('float')

        y_conv = inference(x, keep_prob)
        sess = tf.InteractiveSession()
        sess.run(tf.global_variables_initializer())
        saver = tf.train.Saver()
        saver.restore(sess, "./model.ckpt")

        # Evaluate the image tensor into a numpy array scaled to [0, 1]
        test_images = map(lambda _: sess.run(_).astype(np.float32) / 255.0, np.asarray(test_images))

        print y_conv.eval(feed_dict={x: [test_images[0]], keep_prob: 1.0})[0]
        print np.argmax(y_conv.eval(feed_dict={x: [test_images[0]], keep_prob: 1.0})[0])
```
The following is the execution result. The verdict: Gacky has an Aogaku-like face (34.83%). Somehow I can see it. Aoyama Gakuin was followed by Nihon University and Rikkyo.
```
# Corresponds to the following:
#
# label_dict = {
#     'aoyama': 0, 'jissen': 1, 'keio': 2, 'phoenix': 3, 'rika': 4,
#     'rikkyo': 5, 'seikei': 6, 'sophia': 7, 'todai': 8, 'tonjo': 9
# }
[ 0.34834844  0.0005422   0.00995418  0.21047674  0.13970862  0.15559362  0.03095848  0.09672297  0.00721581  0.00047894]

# argmax
0
```
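To make the raw vector easier to read, it can be paired with label_dict and sorted by probability; a small sketch (probs holds the vector printed above):

```python
# Sort universities by predicted probability, highest first
probs = y_conv.eval(feed_dict={x: [test_images[0]], keep_prob: 1.0})[0]
inv_label_dict = {v: k for k, v in label_dict.iteritems()}
for i in np.argsort(probs)[::-1]:
    print '{}: {:.2%}'.format(inv_label_dict[i], probs[i])
# -> aoyama: 34.83%, phoenix: 21.05%, rikkyo: 15.56%, ...
```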
RELATED WORK
- Identifying idols' faces by deep learning with TensorFlow (Sugyan memo)

This was presented in Singapore the other day; I thought it was amazing.

- Deep learning to judge from a face photo alone whether someone has large breasts (does it work or not) (Qiita)

Can it match the accuracy of ChiChi, an app I developed before (it did not pass Apple's review) that tells the cup size just by holding a smartphone up to the chest? I would like to have a contest.
FUTURE WORK
The results suggest that each university's faces do have a slight tendency, but the accuracy is still low, so I would like to improve it by collecting more data and tuning the model. The model I built this time is fairly simple; looking at the cifar10 code in the TensorFlow tutorials, there are plenty of elements that could still be tuned, one of which is sketched below.
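For example, the cifar10 example augments its training images with random distortions before feeding them to the network. A minimal TF1-style sketch of the same idea for the 32x32 face images here (the function distort_image and its parameter values follow the cifar10 tutorial, but wiring it into this pipeline is my own suggestion):

```python
import tensorflow as tf

def distort_image(image):
    # Randomly flip and perturb one [height, width, 3] training image,
    # in the spirit of the TensorFlow cifar10 tutorial
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=63)
    image = tf.image.random_contrast(image, lower=0.2, upper=1.8)
    return image
```

This would be applied to each decoded image tensor in load_data() (training side only), before the resize and reshape.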
As a horizontal extension, it would also be interesting to train on the images of beautiful women from each prefecture on Bijin Tokei and show which prefecture's face a given beautiful woman has.