While playing with image classification at home, I got curious about fine tuning, so in this article I compare the accuracy of various fine-tuning configurations of MobileNetV2.
・ I tried various fine-tuning configurations with MobileNetV2.
・ The task is 3-class classification: dog / cat / bird.
・ I collected the images on Flickr.
・ I compared accuracy across: no fine tuning; retraining from the 16th, 15th, 14th, 13th, 12th, 11th, 10th, 9th, 8th, 7th, 6th, 5th, 4th, 3rd, 2nd, or 1st layer onward; and retraining all layers.
・ This time, retraining from the 14th layer onward and retraining from the 11th layer onward tied for the best test accuracy (94.4%).
Binary classification feels like it can handle almost anything, and animal classification gives easy-to-understand results, so I decided to take the popular "dog" vs. "cat" task and add "bird" to make it a three-class classification.
That means collecting images. I wanted to gather them from the web, and after looking around, Flickr seemed like a good source, so I decided to collect them there. I referred to this article: How to scrape image data from Flickr with Python.
You can download up to 500 images of 115 × 115 size at once (when I greedily tried to download 1,000, I still only got 500). I collected 500 images each with "dog", "cat", and "bird" as search terms. After removing images where the subject was too small or that contained humans or other animals, I kept 450 images per class. Of those, 30 each went to test and validation, and the remaining 390 were used for training. The images look like this. My PC screen is covered with cute dogs.
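As a rough illustration, the collection step might look like the sketch below. This is an assumption on my part, not the referenced article's exact code: it uses the `flickrapi` package, environment variables `FLICKR_KEY` / `FLICKR_SECRET` for credentials, and Flickr's 150 × 150 "q" thumbnail (the standard square size closest to the one mentioned above).

```python
# Sketch: fetch square thumbnails for a keyword via the Flickr API.
# Assumes `pip install flickrapi` and your own API key/secret.
import os
import urllib.request


def photo_url(photo, size="q"):
    """Build the static-image URL for one photo dict returned by
    photos.search. Size "q" is the 150x150 square thumbnail."""
    return "https://live.staticflickr.com/{server}/{id}_{secret}_{size}.jpg".format(
        server=photo["server"], id=photo["id"],
        secret=photo["secret"], size=size)


def download_photos(keyword, out_dir, n=500):
    import flickrapi  # imported here so photo_url works without it
    flickr = flickrapi.FlickrAPI(os.environ["FLICKR_KEY"],
                                 os.environ["FLICKR_SECRET"],
                                 format="parsed-json")
    # per_page is capped at 500, which matches the limit I ran into.
    res = flickr.photos.search(text=keyword, per_page=n,
                               media="photos", sort="relevance")
    os.makedirs(out_dir, exist_ok=True)
    for photo in res["photos"]["photo"]:
        urllib.request.urlretrieve(
            photo_url(photo),
            os.path.join(out_dir, photo["id"] + ".jpg"))
```

Calling `download_photos("dog", "raw/dog")` would then pull the thumbnails for one class into a folder.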
Sort the images into folders with the following structure.
├── data
│   ├── test
│   │   ├── bird
│   │   ├── cat
│   │   └── dog
│   ├── train
│   │   ├── bird
│   │   ├── cat
│   │   └── dog
│   └── val
│       ├── bird
│       ├── cat
│       └── dog
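With this layout, Keras's `ImageDataGenerator.flow_from_directory` can infer the three classes from the subfolder names (indexed alphabetically: bird, cat, dog). A minimal sketch, assuming TensorFlow's bundled Keras and the `data/` paths above; the function names are mine:

```python
import os


def class_indices_from_dirs(split_dir):
    """Mirror flow_from_directory's behavior: class names are the
    subfolder names, indexed in alphabetical order."""
    names = sorted(d for d in os.listdir(split_dir)
                   if os.path.isdir(os.path.join(split_dir, d)))
    return {name: i for i, name in enumerate(names)}


def make_generators(data_dir="data", size=(96, 96), batch_size=8):
    # Imported here so the helper above works without TensorFlow installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
        os.path.join(data_dir, "train"), target_size=size,
        batch_size=batch_size, class_mode="categorical")
    val_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
        os.path.join(data_dir, "val"), target_size=size,
        batch_size=batch_size, class_mode="categorical")
    return train_gen, val_gen
```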
The code is here: https://github.com/kiii142/mobilenetv2_keras
For choosing which layers to fine-tune, I referred to this page, which I always consult: Transfer learning / fine tuning with TensorFlow, Keras (example of image classification).
Training without fine tuning started from a learning rate of 0.001 and ran for 50 epochs. The fine-tuned runs all started from a learning rate of 0.0001 and ran for 30 epochs. The batch size is 8 throughout, and images are resized to 96 × 96 (because the pretrained weights I used are for 96 × 96 inputs). For the optimizer I used RMSprop, following the paper. (I tried SGD at first, but RMSprop was more accurate. I didn't dig into this in detail this time, but it might be interesting to compare optimizers here.)
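Putting these settings together, the model construction might look like the sketch below. One caveat: mapping the article's "Nth layer" onto Keras's MobileNetV2 is my assumption; here I interpret it as freezing everything before inverted-residual block N (Keras names those layers `block_N_...`).

```python
def trainable_flags(layer_names, first_block):
    """Freeze every layer before inverted-residual block `first_block`;
    everything from that block onward stays trainable."""
    flags, unfrozen = [], False
    for name in layer_names:
        if name.startswith(f"block_{first_block}_"):
            unfrozen = True
        flags.append(unfrozen)
    return flags


def build_model(first_block, num_classes=3):
    from tensorflow.keras import layers, models, optimizers
    from tensorflow.keras.applications import MobileNetV2
    # 96x96 is one of the input sizes MobileNetV2 ships imagenet weights for.
    base = MobileNetV2(input_shape=(96, 96, 3), include_top=False,
                       weights="imagenet")
    names = [l.name for l in base.layers]
    for layer, flag in zip(base.layers, trainable_flags(names, first_block)):
        layer.trainable = flag
    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # 1e-4 for the fine-tuned runs (1e-3 was used when training from scratch).
    model.compile(optimizer=optimizers.RMSprop(learning_rate=1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

Each experiment below then amounts to `build_model(n)` for a different `n`, trained with `model.fit(train_gen, validation_data=val_gen, epochs=30)`.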
The results are below. For each run I show the learning curves and the confusion matrix on the test images. I'm embarrassed to say I forgot to label the axes of the learning curves: the vertical axis is the accuracy and loss values, and the horizontal axis is the number of epochs.
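For reference, the test-set confusion matrix and accuracy can be computed from the true and predicted class labels; a minimal NumPy sketch (the function name is mine):

```python
import numpy as np


def confusion_and_accuracy(y_true, y_pred, num_classes=3):
    """Rows = true class, columns = predicted class.
    Accuracy is the diagonal sum over the total count."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    accuracy = np.trace(cm) / cm.sum()
    return cm, accuracy
```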
MobileNetV2 weights set to None (training from scratch). Training proceeds reasonably, but both train and val accuracy hover around 60%. I'd like a bit more accuracy. Test accuracy: 68.9%.

MobileNetV2 weights set to imagenet, retraining from the 16th layer onward. The val accuracy is very high. Test accuracy: 93.3%.

Retraining from the 15th layer onward. Again, val accuracy is very high. Test accuracy: 90.0%.

Retraining from the 14th layer onward. Test accuracy: 94.4%. Looking good.

Retraining from the 13th layer onward. Test accuracy: 86.7%. A slight drop?

Retraining from the 12th layer onward. Test accuracy: 86.6%.

Retraining from the 11th layer onward. Test accuracy: 94.4%. Back up again.

Retraining from the 10th layer onward. Test accuracy: 88.9%.

Retraining from the 9th layer onward. Test accuracy: 92.2%.

Retraining from the 8th layer onward. Test accuracy: 92.2%.

Retraining from the 7th layer onward. Test accuracy: 85.6%.

Retraining from the 6th layer onward. Test accuracy: 87.8%.

Retraining from the 5th layer onward. Test accuracy: 90.0%.

Retraining from the 4th layer onward. Test accuracy: 86.7%.

Retraining from the 3rd layer onward. Test accuracy: 88.9%.

Retraining from the 2nd layer onward. Test accuracy: 83.3%.

Retraining from the 1st layer onward. Test accuracy: 90.0%.

Finally, this is the result with imagenet weights and no frozen layers at all (retraining everything). Test accuracy: 91.1%.
Looking only at test accuracy this time, retraining from the 14th layer onward and from the 11th layer onward tied for best (94.4%). My vague image of fine tuning was that you retrain only the last few layers to get accuracy, and that retraining further back would make things worse, but it turned out not to be that bad. Then again, simply training on top of the imagenet weights already improves accuracy considerably, so maybe there just isn't much difference to begin with.
To be honest, posting all of these results was quite tedious... Still, I now have a rough rule of thumb: when fine tuning, try retraining from around the 14th layer onward first. In that sense, I'm glad I did it. If I find some spare time on a day off, I'd like to turn this into a smartphone app, so I'll keep at it.