While playing with image classification at home, I got curious about fine tuning, so in this article I compare the accuracy of various fine-tuning configurations of MobileNetV2.
・ I tried various fine-tuning configurations with MobileNetV2.
・ The task is 3-class classification: dog / cat / bird.
・ I collected the images on Flickr.
・ I compared accuracy across: no fine tuning; retraining from the 16th, 15th, 14th, 13th, 12th, 11th, 10th, 9th, 8th, 7th, 6th, 5th, 4th, 3rd, 2nd, or 1st layer onward; and retraining all layers.
・ This time, retraining from the 14th layer onward and retraining from the 11th layer onward tied for the best test accuracy (94.4%).
Binary classification feels like it can handle almost anything, and animal classification gives easy-to-understand results, so I decided to take the popular "dog" vs. "cat" task and add "bird" to make it a three-class classification.
That means collecting images. I wanted to gather them from the web, and after looking around, Flickr seemed like a good source, so I decided to collect them there. I referred to this article: How to scrape image data from Flickr with Python.
You can download up to 500 images of 115 × 115 size at once (when I greedily tried to download 1,000, I still only got 500). I collected 500 images each with "dog", "cat", and "bird" as search terms. After removing images where the subject was too small or that contained humans or other animals, I kept 450 images per class. Of those, 30 each went to test and validation, and the remaining 390 were used for training. The images look like this. My PC screen is covered with cute dogs.
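As a rough illustration, the collection step might look like the sketch below. This is an assumption on my part, not the referenced article's exact code: it uses the `flickrapi` package, environment variables `FLICKR_KEY` / `FLICKR_SECRET` for credentials, and Flickr's 150 × 150 "q" thumbnail (the standard square size closest to the one mentioned above).

```python
# Sketch: fetch square thumbnails for a keyword via the Flickr API.
# Assumes `pip install flickrapi` and your own API key/secret.
import os
import urllib.request


def photo_url(photo, size="q"):
    """Build the static-image URL for one photo dict returned by
    photos.search. Size "q" is the 150x150 square thumbnail."""
    return "https://live.staticflickr.com/{server}/{id}_{secret}_{size}.jpg".format(
        server=photo["server"], id=photo["id"],
        secret=photo["secret"], size=size)


def download_photos(keyword, out_dir, n=500):
    import flickrapi  # imported here so photo_url works without it
    flickr = flickrapi.FlickrAPI(os.environ["FLICKR_KEY"],
                                 os.environ["FLICKR_SECRET"],
                                 format="parsed-json")
    # per_page is capped at 500, which matches the limit I ran into.
    res = flickr.photos.search(text=keyword, per_page=n,
                               media="photos", sort="relevance")
    os.makedirs(out_dir, exist_ok=True)
    for photo in res["photos"]["photo"]:
        urllib.request.urlretrieve(
            photo_url(photo),
            os.path.join(out_dir, photo["id"] + ".jpg"))
```

Calling `download_photos("dog", "raw/dog")` would then pull the thumbnails for one class into a folder.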
Sort the images into folders with the following structure.
├── data
│   ├── test
│   │   ├── bird
│   │   ├── cat
│   │   └── dog
│   ├── train
│   │   ├── bird
│   │   ├── cat
│   │   └── dog
│   └── val
│       ├── bird
│       ├── cat
│       └── dog
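With this layout, Keras's `ImageDataGenerator.flow_from_directory` can infer the three classes from the subfolder names (indexed alphabetically: bird, cat, dog). A minimal sketch, assuming TensorFlow's bundled Keras and the `data/` paths above; the function names are mine:

```python
import os


def class_indices_from_dirs(split_dir):
    """Mirror flow_from_directory's behavior: class names are the
    subfolder names, indexed in alphabetical order."""
    names = sorted(d for d in os.listdir(split_dir)
                   if os.path.isdir(os.path.join(split_dir, d)))
    return {name: i for i, name in enumerate(names)}


def make_generators(data_dir="data", size=(96, 96), batch_size=8):
    # Imported here so the helper above works without TensorFlow installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
        os.path.join(data_dir, "train"), target_size=size,
        batch_size=batch_size, class_mode="categorical")
    val_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
        os.path.join(data_dir, "val"), target_size=size,
        batch_size=batch_size, class_mode="categorical")
    return train_gen, val_gen
```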
The code is here: https://github.com/kiii142/mobilenetv2_keras
For choosing which layers to fine-tune, I referred to this page, which I always consult: Transfer learning / fine tuning with TensorFlow, Keras (example of image classification).
Training without fine tuning started from a learning rate of 0.001 and ran for 50 epochs. The fine-tuned runs all started from a learning rate of 0.0001 and ran for 30 epochs. The batch size is 8 throughout, and images are resized to 96 × 96 (because the pretrained weights I used are for 96 × 96 inputs). For the optimizer I used RMSprop, following the paper. (I tried SGD at first, but RMSprop was more accurate. I didn't dig into this in detail this time, but it might be interesting to compare optimizers here.)
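Putting these settings together, the model construction might look like the sketch below. One caveat: mapping the article's "Nth layer" onto Keras's MobileNetV2 is my assumption; here I interpret it as freezing everything before inverted-residual block N (Keras names those layers `block_N_...`).

```python
def trainable_flags(layer_names, first_block):
    """Freeze every layer before inverted-residual block `first_block`;
    everything from that block onward stays trainable."""
    flags, unfrozen = [], False
    for name in layer_names:
        if name.startswith(f"block_{first_block}_"):
            unfrozen = True
        flags.append(unfrozen)
    return flags


def build_model(first_block, num_classes=3):
    from tensorflow.keras import layers, models, optimizers
    from tensorflow.keras.applications import MobileNetV2
    # 96x96 is one of the input sizes MobileNetV2 ships imagenet weights for.
    base = MobileNetV2(input_shape=(96, 96, 3), include_top=False,
                       weights="imagenet")
    names = [l.name for l in base.layers]
    for layer, flag in zip(base.layers, trainable_flags(names, first_block)):
        layer.trainable = flag
    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # 1e-4 for the fine-tuned runs (1e-3 was used when training from scratch).
    model.compile(optimizer=optimizers.RMSprop(learning_rate=1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

Each experiment below then amounts to `build_model(n)` for a different `n`, trained with `model.fit(train_gen, validation_data=val_gen, epochs=30)`.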
The results are below. For each run I show the learning curves and the confusion matrix on the test images. I'm embarrassed to say I forgot to label the axes of the learning curves: the vertical axis is the accuracy and loss values, and the horizontal axis is the number of epochs.
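For reference, the test-set confusion matrix and accuracy can be computed from the true and predicted class labels; a minimal NumPy sketch (the function name is mine):

```python
import numpy as np


def confusion_and_accuracy(y_true, y_pred, num_classes=3):
    """Rows = true class, columns = predicted class.
    Accuracy is the diagonal sum over the total count."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    accuracy = np.trace(cm) / cm.sum()
    return cm, accuracy
```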
MobileNetV2 weights set to None (training from scratch). Training proceeds reasonably, but both train and val accuracy hover around 60%. I'd like a bit more accuracy. Test accuracy: 68.9%.

MobileNetV2 weights set to imagenet, retraining from the 16th layer onward. The val accuracy is very high. Test accuracy: 93.3%.

Retraining from the 15th layer onward. Again, val accuracy is very high. Test accuracy: 90.0%.

Retraining from the 14th layer onward. Test accuracy: 94.4%. Looking good.

Retraining from the 13th layer onward. Test accuracy: 86.7%. A slight drop?

Retraining from the 12th layer onward. Test accuracy: 86.6%.

Retraining from the 11th layer onward. Test accuracy: 94.4%. Back up again.

Retraining from the 10th layer onward. Test accuracy: 88.9%.

Retraining from the 9th layer onward. Test accuracy: 92.2%.

Retraining from the 8th layer onward. Test accuracy: 92.2%.

Retraining from the 7th layer onward. Test accuracy: 85.6%.

Retraining from the 6th layer onward. Test accuracy: 87.8%.

Retraining from the 5th layer onward. Test accuracy: 90.0%.

Retraining from the 4th layer onward. Test accuracy: 86.7%.

Retraining from the 3rd layer onward. Test accuracy: 88.9%.

Retraining from the 2nd layer onward. Test accuracy: 83.3%.

Retraining from the 1st layer onward. Test accuracy: 90.0%.

Finally, this is the result with imagenet weights and no frozen layers at all (retraining everything). Test accuracy: 91.1%.
Looking only at test accuracy this time, retraining from the 14th layer onward and from the 11th layer onward tied for best (94.4%). My vague image of fine tuning was that you retrain only the last few layers to get accuracy, and that retraining further back would make things worse, but it turned out not to be that bad. Then again, simply training on top of the imagenet weights already improves accuracy considerably, so maybe there just isn't much difference to begin with.
To be honest, posting all of these results was quite tedious... Still, I now have a rough rule of thumb: when fine tuning, try retraining from around the 14th layer onward first. In that sense, I'm glad I did it. If I find some spare time on a day off, I'd like to turn this into a smartphone app, so I'll keep at it.