Part 1: http://qiita.com/kenmaz/items/4b60ea00b159b3e00100 Part 2: http://qiita.com/kenmaz/items/ef0a1308582fe0fc5ea1
This is a continuation of the story of a software engineer, completely new to machine learning and deep learning, who built an app that uses a convolutional neural network to identify the faces of the members of "Momoiro Clover Z."

Last time I ended up with a model with an accuracy of about 85%, but since I started from a state of total ignorance, I could not get any accuracy at all at the beginning. Here is a note of the trial-and-error process, in case it is useful to someone.
- Implemented a first version based on the MNIST "expert" tutorial sample
- => **Accuracy 53-56%**
- Issue 1: for example, face photos facing right all seemed to be classified as "Reni Takagi", and photos facing straight ahead as Arin
- Issue 2: to begin with, only about 50 training images had been prepared per member
- Inflated the training data by generating horizontally flipped, zoomed/shrunk, and rotated versions of each training image (a sketch of this kind of augmentation follows below)
- Ran the crawler again, added training data, and prepared about 200 images per member
- => **Accuracy 58%**
- Looking closely at the log, the accuracy on the test data stayed flat from the middle of training onward
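For illustration, here is a rough sketch of that kind of augmentation using OpenCV. The file paths and the flip/zoom/rotation parameters are assumptions for the example, not the actual values used in this project.

```python
import cv2

def augment(img):
    """Return flipped, slightly zoomed, and slightly rotated variants of a face image."""
    h, w = img.shape[:2]
    variants = [img]
    variants.append(cv2.flip(img, 1))                        # horizontal flip
    for scale in (0.9, 1.1):                                 # slight zoom out / in
        m = cv2.getRotationMatrix2D((w / 2, h / 2), 0, scale)
        variants.append(cv2.warpAffine(img, m, (w, h)))
    for angle in (-10, 10):                                  # small rotations
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        variants.append(cv2.warpAffine(img, m, (w, h)))
    return variants

# hypothetical paths, just to show the usage
src = cv2.imread('data/reni/0001.png')
for i, v in enumerate(augment(src)):
    cv2.imwrite('data/reni/0001_aug%d.png' % i, v)
```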
- Some of the inflated training data was being used unevenly
- Switched to randomly sampling the training data sent in each batch (see the sketch below)
- => **Accuracy 65-70%**
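A minimal sketch of random mini-batch sampling of the kind described above; the array and placeholder names are illustrative, not the ones used in mcz_main.py.

```python
import numpy as np

def next_batch(images, labels, batch_size=50):
    """Sample a random mini-batch from the whole (numpy array) training set."""
    idx = np.random.choice(len(images), batch_size, replace=False)
    return images[idx], labels[idx]

# usage inside a TF 1.x-style training loop (names are hypothetical):
# batch_x, batch_y = next_batch(train_images, train_labels)
# sess.run(train_op, feed_dict={x: batch_x, y_: batch_y})
```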
- Until now I had been stopping training after a few hundred steps on my MacBook Pro
- However, the sample code uses max_step = 100,000, so the model was probably simply undertrained
- Decided to set up an environment on EC2 and try out various instances
- Rented a c4.4xlarge at about $1 an hour and ran roughly 15,000 steps
- => **Accuracy 73%**
- Actually, even at 8,000 steps it was already about 73%
Looking at the accuracy for each member, it was as follows.
member | accuracy |
---|---|
Reni | 77% |
Momota | 44% |
Shiori | 73% |
Arin | 58% |
Kyouka | 88% |
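For reference, a per-member breakdown like the table above can be computed from arrays of true and predicted labels; this is a generic sketch, not the evaluation code actually used here.

```python
import numpy as np

MEMBERS = ['Reni', 'Momota', 'Shiori', 'Arin', 'Kyouka']

def per_class_accuracy(y_true, y_pred):
    """Print accuracy for each class given integer label arrays."""
    for cls, name in enumerate(MEMBERS):
        mask = (y_true == cls)
        if mask.any():
            acc = (y_pred[mask] == cls).mean()
            print('%s: %.0f%%' % (name, acc * 100))

# per_class_accuracy(np.array(test_labels), np.array(predictions))  # hypothetical arrays
```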
I half-jokingly thought, "Hmm, so the members with more distinctive faces really are easier to recognize... there is that theory that beautiful faces are closer to the average face...", but thinking about it calmly, it occurred to me that the variation in the training data itself was probably too large. (That would also explain why Kyouka's accuracy is relatively high.)

The training data I had used up to this point was a mix of results from regenerating it many times, and the programs that generated each batch differed slightly, which seems to have introduced variation into the training data.

I was also preparing to publish this as a web service at the same time, so I needed to sort out the face-detection part anyway. I adjusted it into stable face-detection logic that produces consistent results across members, which became the current code explained in Part 1. I regenerated all the training data with it and trained again.
=> **Accuracy 77%**
- Feeling that the training data was now stable, I added a layer to the model to make it deeper. That became the [current model code](https://github.com/kenmaz/momo_mind/blob/master/deeplearning/mcz_model.py) explained last time (a rough sketch of this kind of model follows below)
- => **Accuracy 85%**
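For readers who skipped Part 2, here is a rough sketch of the general shape of such a model in TF 1.x-style code: convolution + pooling stages followed by fully connected layers. The layer sizes and input size here are assumptions for illustration; the real definition is in mcz_model.py.

```python
import tensorflow as tf

def inference(images, num_classes=5, keep_prob=1.0):
    # images: [batch, 28, 28, 3] -- the input size is assumed for illustration
    w1 = tf.Variable(tf.truncated_normal([5, 5, 3, 32], stddev=0.1))
    b1 = tf.Variable(tf.zeros([32]))
    h1 = tf.nn.relu(tf.nn.conv2d(images, w1, [1, 1, 1, 1], 'SAME') + b1)
    p1 = tf.nn.max_pool(h1, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')

    w2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
    b2 = tf.Variable(tf.zeros([64]))
    h2 = tf.nn.relu(tf.nn.conv2d(p1, w2, [1, 1, 1, 1], 'SAME') + b2)
    p2 = tf.nn.max_pool(h2, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')

    # flatten, fully connected layer with dropout, then softmax over the members
    flat = tf.reshape(p2, [-1, 7 * 7 * 64])
    w3 = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1024], stddev=0.1))
    b3 = tf.Variable(tf.zeros([1024]))
    fc = tf.nn.dropout(tf.nn.relu(tf.matmul(flat, w3) + b3), keep_prob)

    w4 = tf.Variable(tf.truncated_normal([1024, num_classes], stddev=0.1))
    b4 = tf.Variable(tf.zeros([num_classes]))
    return tf.nn.softmax(tf.matmul(fc, w4) + b4)
```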
So, while making all sorts of silly mistakes, I worked through it by trial and error. **I realized from the bottom of my heart how important the quality of the training data and repeated training are.**

By the way, for cases like this where you need a high-performance machine on the spot, an environment like AWS is really convenient (if a bit expensive). Since I tried various instances during training, here is the rough performance of each.

When building a very deep (16-layer) neural network and running training:
Instance type | Approximate performance |
---|---|
t2.micro | N/A (could not run due to lack of resources) |
MacBook Pro at hand | 3.4 examples/sec; 34.967 sec/batch |
c4.4xlarge | 3.9 examples/sec; 31.142 sec/batch |
c4.8xlarge | 8.6 examples/sec; 14.025 sec/batch |
c4.8xlarge costs about $2 an hour but is fast; c4.4xlarge is about the same as the MacBook Pro. However, all of these numbers are with the CPU build of TensorFlow, so I would like to try the GPU build at some point and see how much faster it is.

Also, at first I tried to do this on my VPS, but on CentOS 6 the combination of the glibc version and numpy is a problem (CentOS 6 only ships glibc up to v2.12, while numpy requires glibc 2.15, i.e. CentOS 7), which was a pain. EC2 is convenient in that you can create an environment and throw it away when you are done. It is a bit expensive, though.
With tf.train.Saver, you can output snapshots of the repeatedly adjusted weight and bias values of a trained model to a checkpoint file, and load them back later.

This time, a model file is written every 1,000 steps. https://github.com/kenmaz/momo_mind/blob/master/deeplearning/mcz_main.py
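As a minimal sketch of that checkpointing pattern (TF 1.x-style API), with a toy variable standing in for the model and an assumed save path:

```python
import tensorflow as tf

# a toy variable standing in for the model's weights and biases
w = tf.Variable(tf.zeros([10]), name='w')
train_op = tf.assign_add(w, tf.ones([10]))   # stand-in for one training step

saver = tf.train.Saver()                     # covers all variables by default

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1, 3001):
        sess.run(train_op)
        if step % 1000 == 0:
            # writes checkpoint files such as model.ckpt-1000 in the current directory
            saver.save(sess, 'model.ckpt', global_step=step)
```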
To build a web service on top of the training results, you can write code that reads this model file, runs inference, and returns the result to the web side.

So I prepared code that, given an image file path, runs inference on the image and returns the member classification result. https://github.com/kenmaz/momo_mind/blob/master/deeplearning/mcz_eval.py
result = mcz_eval.execute([imgfile1], '.', ckpt_path)
Calling it like this reads the model file at ckpt_path and returns the result of inference on imgfile1.
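Internally, the usual pattern for this kind of evaluation code is to rebuild the graph, restore the checkpoint with saver.restore, and run the softmax output for the given image. The sketch below shows only that general pattern; the helper names, image size, and structure are assumptions, not the actual contents of mcz_eval.py.

```python
import cv2
import numpy as np
import tensorflow as tf

def load_image(path, size=28):
    """Read an image and shape it into a [1, size, size, 3] float batch."""
    img = cv2.imread(path)
    img = cv2.resize(img, (size, size)).astype(np.float32) / 255.0
    return img.reshape(1, size, size, 3)

def classify(image_path, ckpt_path, inference_fn):
    """Restore a checkpoint and return the predicted class index (generic sketch)."""
    x = tf.placeholder(tf.float32, shape=[1, 28, 28, 3])   # size assumed
    logits = inference_fn(x)             # rebuild the same graph used for training
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, ckpt_path)   # load the learned weights and biases
        probs = sess.run(tf.nn.softmax(logits), feed_dict={x: load_image(image_path)})
    return int(np.argmax(probs[0]))      # index of the most likely member
```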
Now that mcz_eval.py exists, all that remains is ordinary web programming. However, I had never written a web application in Python, so after some research I settled on the combination of Flask + uWSGI + Nginx and implemented it quietly with reference to pages on that setup.
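As an illustration, a minimal Flask endpoint wrapping mcz_eval.execute might look like the sketch below; the route, upload field name, checkpoint path, and response format are all assumptions, not the actual web service code.

```python
import os
import tempfile

from flask import Flask, request, jsonify
import mcz_eval  # the evaluation module from the repo

app = Flask(__name__)
CKPT_PATH = os.environ.get('MCZ_CKPT', './model.ckpt-15000')  # assumed path

@app.route('/classify', methods=['POST'])
def classify():
    # save the uploaded image to a temporary file, then run inference on it
    upload = request.files['image']
    fd, path = tempfile.mkstemp(suffix='.png')
    os.close(fd)
    upload.save(path)
    try:
        result = mcz_eval.execute([path], '.', CKPT_PATH)
        return jsonify(result=str(result))
    finally:
        os.remove(path)

if __name__ == '__main__':
    app.run()
```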
This turned out to be a rather rough article, but for now, that is the story of how something like this got built through trial and error.

- I want to improve the accuracy further (thank you, twango, for the advice)
- I would like to visualize what kind of features the network reacted to when producing its inference, e.g. "an arrangement of dimple-like pixels = Kanako Momota" or "eyes set relatively far apart = Kyouka". I actually tried this once during the Golden Week holidays, but it did not work; I want to try again.
- I want to try the GPU build
- I want to play with other kinds of neural networks, such as recursive neural networks
That's all!