As an example of applying deep learning to a task, I tried to build a beauty pageant support application.
Conveniently, there is now a University Pageant Portal Site.
There are many contestants, which is fun, but at the same time I wanted to see everyone at a glance, and I wanted to group similar faces together. (Maybe that's a questionable motive, but at least that's how I enjoy beauty pageants.)
So, this time I set the following goals.
Therefore, I considered the following workflow.
I'll do all this in Python.
Each step can be achieved using Python libraries.
I will omit part 1, because it is just the usual scraping code.
Most of this article is about using machine learning libraries.
So I switched to the Anaconda distribution, which works well with machine learning libraries. Reference article
I will proceed in an environment with Mac OS X 10.10 / Python 2.7.
All of the photographs I can obtain show not only the face but also the whole body and the background scenery. For example, it looks like this:
This time, the purpose is to bring a similar face to each other, so the whole body and background are unnecessary information.
Therefore, I will automatically cut out only the face part.
This can be done with OpenCV and its Python bindings, cv2.
You can install OpenCV with conda, Anaconda's package management tool:
$ conda install -c https://conda.binstar.org/menpo opencv
Let's check the installation
>>> import cv2
>>> cv2.__version__
'2.4.9.1'
>>> exit()
First, load the image
import cv2
image = cv2.imread(path_to_imagefile)
Next, configure the detection of the face area. A machine learning model is required to determine the facial region; OpenCV already ships with a face detection model trained on __cascade features__.
The face area judgment model is saved as an xml file, so specify the path.
If you can't find it, try searching for haarcascade_frontalface_alt.xml with the find command.
Specify the path as a constant.
CASCADE_PATH = "/Users/kensuke-mi/.pyenv/versions/anaconda-2.1.0/share/OpenCV/haarcascades/haarcascade_frontalface_alt.xml"
Before identifying the face area, convert the image to grayscale. (Grayscaling is not required to identify the face area; it is pre-processing for training with the deep NN.)
image_gray = cv2.cvtColor(image, cv2.cv.CV_BGR2GRAY)
Furthermore, equalize the histogram so that the pixel values of the image are evened out. For details, see the manual.
image_gray = cv2.equalizeHist(image_gray)
Finally, call the face area judge to find the area.
cascade = cv2.CascadeClassifier(CASCADE_PATH)
facerect = cascade.detectMultiScale(image_gray, scaleFactor=1.1, minNeighbors=3, minSize=(50, 50))
The coordinates of the face area are returned in facerect.
I put this series of steps into a function.
def detectFace(image):
    image_gray = cv2.cvtColor(image, cv2.cv.CV_BGR2GRAY)
    image_gray = cv2.equalizeHist(image_gray)
    cascade = cv2.CascadeClassifier(CASCADE_PATH)
    facerect = cascade.detectMultiScale(image_gray, scaleFactor=1.1, minNeighbors=3, minSize=(50, 50))
    return facerect
Since the area is now known, the face can be extracted simply by cropping that region from the `image` object. I made this a function as well.
import os

def extract_face(facerect_list, image, path_to_save):
    """Cut out the face part. Note: this assumes there is only one face in one photo.
    :param facerect_list:
    :param image:
    :param path_to_save:
    :return:
    """
    assert os.path.exists(os.path.dirname(path_to_save))
    for rect in facerect_list:
        x = rect[0]
        y = rect[1]
        w = rect[2]
        h = rect[3]
        # crop the face region: img[y: y + h, x: x + w]
        cv2.imwrite(path_to_save, image[y:y+h, x:x+w])
    return image[y:y+h, x:x+w]
Make the image sizes uniform.
This is so that the deep NN can learn efficiently. It can still learn even if the sizes differ, but the amount of computation grows, so we align the sizes here.
Store the `image` object as `im`, specify the size after resizing, and resize:
RESIZED_TUPLE = (100, 100)
resized_im = cv2.resize(im, RESIZED_TUPLE)
Please refer to GitHub for the full flow.
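As a rough sketch (not the actual repository code), the whole preprocessing flow could be tied together like this, reusing the detectFace and extract_face functions above; the helper name preprocess_image and the handling of photos with no detected face are my own assumptions:

import cv2

RESIZED_TUPLE = (100, 100)  # target size after resizing

def preprocess_image(path_to_imagefile, path_to_save):
    """Detect the face, crop it, resize it, and save the result."""
    image = cv2.imread(path_to_imagefile)
    facerect = detectFace(image)             # face area coordinates
    if len(facerect) == 0:
        return None                          # no face found in this photo
    face_image = extract_face(facerect, image, path_to_save)
    resized_im = cv2.resize(face_image, RESIZED_TUPLE)
    cv2.imwrite(path_to_save, resized_im)    # overwrite with the resized crop
    return resized_im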
I use an RBM (Restricted Boltzmann Machine), which is a type of deep neural network. This article explains how RBMs work very well, so it is worth referring to.
To use an RBM, I use the Pylearn2 library.
When it comes to deep learning, PFI's chainer is famous, but chainer does not support the training methods needed for RBMs (as of October 25, 2015).
If you want to use an autoencoder network instead of an RBM, I recommend chainer.
Basically, all you have to do is clone the Git repository and install it.
git clone git://github.com/lisa-lab/pylearn2.git
cd pylearn2
python setup.py build
sudo python setup.py install
After this, you often see instructions such as "add the path used by pylearn2", but in fact Pylearn2 works even if you do not set the path.
However, for the tutorial code to work, you do still have to set the path.
I made everything from git clone to running the tutorial code into a shell script, so please refer to it. (Sorry if it doesn't work.)
The general flow of pylearn2 is like this
The training data is prepared as a numpy ndarray and converted into an object for pylearn2.
Here, it is assumed that the data is already prepared as a numpy.ndarray.
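For reference, here is a minimal sketch of how that matrix could be assembled from the cropped, resized face images; the directory name ./face_images and the pixel normalization are assumptions on my part:

import glob
import os

import cv2
import numpy

# Flatten each grayscale face image into one row of the data matrix
image_paths = glob.glob(os.path.join('./face_images', '*.jpg'))  # hypothetical directory
rows = []
for path in image_paths:
    im = cv2.imread(path, 0)  # 0 = load as grayscale
    rows.append(im.flatten().astype(numpy.float32) / 255.0)  # scale pixels to [0, 1]
data_matrix = numpy.vstack(rows)  # shape: (number of photos, width * height)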
First, prepare a class that converts the data into pylearn2's format.
Here, we prepare a class called FacePicDataSet, meaning "face image data". This class inherits from pylearn2.datasets.DenseDesignMatrix.
from pylearn2.datasets import DenseDesignMatrix

class FacePicDataSet(DenseDesignMatrix):
    def __init__(self, data):
        self.data = data
        super(FacePicDataSet, self).__init__(X=data)
Next, create a FacePicDataSet object:
train = FacePicDataSet(data=data_matrix)
Next, save the numpy-format file and the dataset pickle file.
from pylearn2.utils import serial
train.use_design_loc(path_to_npy_file)
# save in pickle
serial.save(train_pkl_path, train)
Basically, you take the tutorial template and tweak it.
The places where the training script fills in variables are written in the %(variable name) format.
The key points are:
- `raw: &raw_train !pkl:` (the dataset pickle file)
- `nvis:` (the number of input dimensions; 10,000 dimensions for 100 * 100 images)
- `save_path:` (where the trained model is saved)
The full text is rather long, so please refer to my yaml file.
First, import the packages you will use. __Be sure to import the class of the dataset pickle file.__
import os
from pylearn2.testing import skip
from pylearn2.testing import no_debug_mode
from pylearn2.config import yaml_parse
from make_dataset_pylearn2 import FacePicDataSet
First, write a function for training
@no_debug_mode
def train_yaml(yaml_file):
    train = yaml_parse.load(yaml_file)
    train.main_loop()
Next, read the yaml file and fill in the variables:
yaml_file_path = path_to_yaml_file
save_path = path_to_save_file
input_pickle_path = path_to_dataset_pickle_file
yaml = open(yaml_file_path, 'r').read()
hyper_params = {'detector_layer_dim': 500,
                'monitoring_batches': 10,
                'train_stop': 50000,
                'max_epochs': 300,
                'save_path': save_path,
                'input_pickle_path': input_pickle_path}
yaml = yaml % (hyper_params)
Finally, run training
train_yaml(yaml)
A deep NN has a hidden layer, and this hidden layer corresponds to the extracted features.
While I was at it, I also created a script to extract the hidden layer.
All it does is create images from the numbers in the numpy.ndarray.
Please see here for a detailed explanation.
When it is run, one image is produced for each hidden-layer node, like this.
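A minimal sketch of that idea, assuming the trained model has already been loaded as model_object (as shown in the next section), that get_weights() returns one column per hidden node, and that the images are 100 * 100 pixels:

import numpy
import cv2

weights = model_object.get_weights()  # shape assumed: (input dimensions, hidden nodes)
for node_idx in range(weights.shape[1]):
    w = weights[:, node_idx]
    # Rescale the weight vector to 0-255 and reshape it back into an image
    w_img = (255.0 * (w - w.min()) / (w.max() - w.min())).reshape((100, 100))
    cv2.imwrite('hidden_node_%03d.png' % node_idx, w_img.astype(numpy.uint8))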
The original data is mapped using the learned features.
For example, this time, the original data has 711 photos, and each image is 150 * 150 = 22,500 dimensions.
Therefore, the original data forms a 711 * 22500 matrix.
On the other hand, the feature transformation matrix is (number of original dimensions) * (number of hidden-layer nodes).
This time there are 500 hidden-layer nodes, so it is a 22500 * 500 matrix.
Therefore, after the transformation, the matrix is (711 * 22500) * (22500 * 500) = 711 * 500.
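Just to sanity-check those shapes with numpy (toy arrays, not the real data):

import numpy

data_matrix = numpy.zeros((711, 22500))      # photos x pixel dimensions
feature_matrix = numpy.zeros((22500, 500))   # pixel dimensions x hidden nodes
print(numpy.dot(data_matrix, feature_matrix).shape)  # -> (711, 500)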
Read the 711 * 22500 matrix data.
It seems I should be able to read the training-data pickle file, but that did not work; I get an error related to the FacePicDataSet class.
So this time I will read the numpy file instead.
import numpy
datasource_ndarray = numpy.load(path_to_datasource_npy)
So, the original data has been read.
Then load the trained pickle object file.
import pickle
file_obj = open(path_to_trained_model_pickle, 'rb')
model_object = pickle.load(file_obj)
In addition, get the weight matrix of the hidden layer
feature_vector = model_object.get_weights()
Finally, map the data into the new space.
Since feature_vector is transposed, transpose it back with .T.
new_space_matrix = numpy.dot(a=data_matrix, b=feature_vector.T)
This gives you the transformed matrix new_space_matrix.
This operation is commonly referred to as `embedding`.
Compress the embedding data.
This time I want to make a two-dimensional scatter plot, and even a 500-dimensional space is still too large for that.
t-SNE and PCA are often used for this kind of dimensionality reduction (in my experience).
Both can be done easily with scikit-learn: see sklearn.manifold.TSNE and sklearn.decomposition.PCA.
I will omit the detailed explanation, so please see the scikit-learn examples and my code.
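For reference, a minimal sketch of both reductions with scikit-learn; new_space_matrix is the embedded matrix from above, and the random_state value is just an assumption for reproducibility:

from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Compress the 500-dimensional embeddings down to 2 dimensions
pca_positions = PCA(n_components=2).fit_transform(new_space_matrix)
tsne_positions = TSNE(n_components=2, random_state=0).fit_transform(new_space_matrix)
# Each row is now an (x, y) position for one contestant's photo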
Create an interactive scatter plot.
While I'm at it, I want a few extra features, since this is supposed to support beauty pageants.
You can do all of that with Bokeh!
If you are using it with Anaconda, installation is very easy.
The command on the Bokeh official page finishes it in no time.
However, be aware that it will replace the standard ipython notebook.
It's a good idea to be able to switch the global python environment with pyenv.
The nice thing about Bokeh is that you can build plots from an ipython notebook.
I created mine using ipython notebook.
The explanation jumps around a bit, but suppose you now have a dict object like the one below, with one key/value entry per contestant.
{
    string: {
        "major": string,
        "grade": int,
        "age": int,
        "member_name_rubi": string,
        "height": float,
        "member_index": int,
        "profile_url": string,
        "blog_url": string,
        "member_name": string,
        "university": string,
        "position_vector": [float, float],
        "photo_url": string
    }
}
To create a graph in Bokeh, the rough flow is to prepare a data source, create a graph object, and finally call show(graph object).
To create the Bokeh data source object, prepare multiple lists of data and pass them to ColumnDataSource.
Transform the dict object from earlier into a data structure like this:
{
    'X': [data goes here],
    'Y': [data goes here],
    'desc': [data goes here],
    'imgs': [data goes here],
    (and so on, one key per list of values)
}
X, Y, desc, and so on can be any key names you like. Obviously, all of the value lists must be the same length. Save the result under the object name items_for_graph.
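A minimal sketch of that conversion, assuming the contestant dict from earlier is called member_dict (a name I made up) and using the key names expected by the ColumnDataSource code below:

items_for_graph = {'X': [], 'Y': [], 'labels': [], 'images': [],
                   'universities': [], 'major': [], 'height': [],
                   'age': [], 'blog_links': [], 'profile_links': []}

for member_name, info in member_dict.items():
    items_for_graph['X'].append(info['position_vector'][0])
    items_for_graph['Y'].append(info['position_vector'][1])
    items_for_graph['labels'].append(info['member_name'])
    items_for_graph['images'].append(info['photo_url'])
    items_for_graph['universities'].append(info['university'])
    items_for_graph['major'].append(info['major'])
    items_for_graph['height'].append(info['height'])
    items_for_graph['age'].append(info['age'])
    items_for_graph['blog_links'].append(info['blog_url'])
    items_for_graph['profile_links'].append(info['profile_url'])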
First, create the data source object for Bokeh.
from bokeh.plotting import figure, output_file, show, ColumnDataSource

source = ColumnDataSource(
    data=dict(
        x=items_for_graph['X'],
        y=items_for_graph['Y'],
        desc=items_for_graph['labels'],
        imgs=items_for_graph['images'],
        univ=items_for_graph['universities'],
        major=items_for_graph['major'],
        height=items_for_graph['height'],
        age=items_for_graph['age'],
        blog=items_for_graph['blog_links'],
        profile=items_for_graph['profile_links'],
    )
)
Next, specify the graph size and the tools you want available. TOOLS specifies which interactions the graph supports. See the Bokeh Manual (http://bokeh.pydata.org/en/0.10.0/docs/user_guide/tools.html) for more information.
from bokeh.io import output_file, show, vform, vplot
from bokeh.models import HoverTool, OpenURL, TapTool
from bokeh.models import WheelZoomTool, PanTool, ResetTool
# Import bokeh sub-modules for making a scatter graph with tooltips
from bokeh.models.widgets import DataTable, DateFormatter, TableColumn
from bokeh.models import ColumnDataSource, OpenURL, TapTool
from bokeh.plotting import figure, output_file, show

GRAPH_HEIGHT = 1000
GRAPH_WIDTH = 800
TOOLS = [WheelZoomTool(), PanTool(), ResetTool(), TapTool()]
Then create a graph object. This time, I specified the circle method to make a scatter plot. If you change this, you can draw a line to display the time series. Again, take a look at the Bokeh Manual.
s1 = figure(plot_width=GRAPH_WIDTH, plot_height=GRAPH_HEIGHT, tools=TOOLS, title=graph_title_name)
s1.circle('x', 'y', size=10, source=source)
Finally, create the graph HTML with show(s1).
In Bokeh, you can actually write html tags yourself to create tooltips.
Therefore, you can write an html tag that displays an image and use it as a tooltip.
For more information here
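For example, a tooltip that shows the face photo might be written roughly like this (the HTML layout is my own sketch; @imgs, @desc, @univ, and @major refer to columns of the data source above):

from bokeh.models import HoverTool

# HTML template for the tooltip: show the face photo plus a short description
hover = HoverTool(tooltips="""
    <div>
        <img src="@imgs" height="100" width="100" alt="@desc"></img>
        <div><span style="font-size: 15px;">@desc</span></div>
        <div><span style="font-size: 10px;">@univ / @major</span></div>
    </div>
""")
s1.add_tools(hover)  # or include it in the TOOLS list when creating the figure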
You can easily embed a URL by using `OpenURL`.
Click to open the URL in the table.
For more information here
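A short sketch of that, assuming the profile URL is stored in the profile column of the data source:

from bokeh.models import OpenURL, TapTool

# When a point is clicked, open that contestant's profile page
taptool = s1.select(type=TapTool)
taptool.callback = OpenURL(url="@profile")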
Not all of the information can be displayed in the graph, so I put the detailed information in a table.
A scatter plot can then be shown on top with the table below it, like this.
See this part for details.
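A rough sketch of that layout with the bokeh 0.10-era API imported above; the column choice is my own, and vplot simply stacks the two elements vertically:

# Table listing the details that do not fit into the scatter plot
columns = [
    TableColumn(field="desc", title="Name"),
    TableColumn(field="univ", title="University"),
    TableColumn(field="major", title="Major"),
    TableColumn(field="age", title="Age"),
    TableColumn(field="height", title="Height"),
]
data_table = DataTable(source=source, columns=columns, width=GRAPH_WIDTH)

# Scatter plot on top, table below
layout = vplot(s1, vform(data_table))
show(layout)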
So, here is the graph I created. This graph is the result of reducing the RBM embeddings to two dimensions with PCA.
See also the t-SNE version.
Comparing the two, I feel that PCA plots similar faces closer together.
I feel that PCA does a better job of compressing the embeddings down to two dimensions.
So, this time I used deep learning to create something to support beauty pageants. I hope this serves as one example of how deep learning can be applied.
All of the code is published on GitHub, so please feel free to use it. I am also looking for people to develop interesting things together; if you have any interesting ideas, please feel free to contact me.