```
[('Russian_Blue', 0.58100362140429651),
 ('British_Shorthair', 0.22552991563514049),
 ('Abyssinian', 0.057159848358045016),
 ('Bombay', 0.043851502320485049),
 ('Egyptian_Mau', 0.030686072815385441)]
```
In "Cat detection with OpenCV" I detected cat faces; this time I will use deep learning to identify cat breeds.
If you are interested, the technical details are written on my blog.
Here, a technique called a **Deep Convolutional Neural Network (DCNN)**, used for general object recognition, is applied to identifying cat breeds. This kind of problem, where the target domain is narrowed down (in this case, to cat breeds) and classification is performed within it, is called **Fine-Grained Visual Categorization (FGVC)**. It is difficult to achieve high accuracy because the classes are visually similar to one another.
There are several DCNN implementations, but here I use a library called Caffe (note that the library itself is open source under the BSD 2-Clause license, but the ImageNet model is restricted to non-commercial use). The output of an intermediate (hidden) layer of the DCNN is extracted as a 4096-dimensional feature vector, and an appropriate classifier is trained on those features to make predictions. For the classifier, I think the scikit-learn implementations are the easiest to use.
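As a rough sketch of the feature-extraction step (not the exact code in the repository), using Caffe's standard Python interface: the file paths are placeholders, and the choice of the `fc7` layer, whose output is 4096-dimensional in the reference CaffeNet/AlexNet models, is an assumption.

```python
import caffe
import numpy as np

# Placeholder paths: supply your own deploy prototxt, weights, and mean file.
MODEL_DEF = 'deploy.prototxt'
MODEL_WEIGHTS = 'bvlc_reference_caffenet.caffemodel'
MEAN_FILE = 'ilsvrc_2012_mean.npy'

net = caffe.Classifier(MODEL_DEF, MODEL_WEIGHTS,
                       mean=np.load(MEAN_FILE).mean(1).mean(1),
                       channel_swap=(2, 1, 0),  # RGB -> BGR, Caffe's convention
                       raw_scale=255,
                       image_dims=(256, 256))

def extract_feature(image_path):
    """Run a forward pass and return the 4096-dim fc7 activation."""
    img = caffe.io.load_image(image_path)
    net.predict([img], oversample=False)  # single center crop
    return net.blobs['fc7'].data[0].flatten().copy()

feature = extract_feature('cat.jpg')
print(feature.shape)  # (4096,)
```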
The source code is posted on GitHub, so please refer to it if you are interested. It implements the processing described here. (It's a roughly written command-line tool, not a library.)
:octocat: cat-fancier/classifier at master · wellflat/cat-fancier
Let's benchmark with the Oxford-IIIT Pet dataset, a collection of animal images published by the University of Oxford.
Since there are only 12 cat classes, this is a fairly light task. This time, 1,800 images are used for training and 600 for validation. That works out to 150 training images per class, which may seem small, but with around 12 classes it is enough for reasonable accuracy. Because the dataset is small, training finishes in a few tens of minutes even when running a grid search on a cheap VPS; a minimal training sketch follows below. Here, only the classification results for the RBF-kernel SVM are listed.
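A minimal sketch of the grid search and evaluation with scikit-learn (using the current `sklearn.model_selection` module path): the feature and label files are hypothetical placeholders, and the exact parameter ranges used in the article are not stated, so log-spaced grids are assumed.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Hypothetical files holding the (2400, 4096) fc7 features and breed labels,
# assumed to have been built with the extraction snippet above.
X = np.load('features.npy')
y = np.load('labels.npy')

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=1800, test_size=600, stratify=y, random_state=0)

# Log-spaced search grids for C and gamma (assumed ranges).
param_grid = {'C': np.logspace(-2, 2, 10), 'gamma': np.logspace(-6, -2, 10)}
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5, n_jobs=-1)
grid.fit(X_train, y_train)

print(grid.best_estimator_)
print(classification_report(y_test, grid.predict(X_test)))
```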
## SVM RBF Kernel
```
SVC(C=7.7426368268112693, cache_size=200, class_weight=None, coef0=0.0,
    degree=3, gamma=7.7426368268112782e-05, kernel='rbf', max_iter=-1,
    probability=False, random_state=None, shrinking=True, tol=0.001,
    verbose=False)
```
```
                   precision    recall  f1-score   support

       Abyssinian       0.84      0.91      0.88        47
           Bengal       0.84      0.83      0.84        46
           Birman       0.72      0.79      0.75        52
           Bombay       0.98      0.98      0.98        46
British_Shorthair       0.82      0.75      0.78        53
     Egyptian_Mau       0.87      0.87      0.87        61
       Maine_Coon       0.87      0.89      0.88        45
          Persian       0.85      0.91      0.88        45
          Ragdoll       0.76      0.76      0.76        41
     Russian_Blue       0.84      0.82      0.83        57
          Siamese       0.81      0.69      0.75        55
           Sphynx       0.94      0.96      0.95        52

      avg / total       0.85      0.84      0.84       600
```
With the RBF-kernel SVM, the accuracy was 84.5%. Accuracy is lower for some long-haired breeds such as the Ragdoll, but I think this is acceptable given only 1,800 training images. The blog also posts the results of other classifiers, but for large-scale data I think a linear SVM or logistic regression is more realistic because of prediction speed.
It should be noted that the neural network automatically finds (learns) the features that are effective for recognition, without any hand-crafted features. This time the DCNN was used only as a feature extractor, but with a technique called fine-tuning, which takes the parameters of a model trained on large-scale labeled data such as ImageNet as initial values and retrains the entire network on other labeled data, you may be able to classify with even higher accuracy. I tried various things at hand, but for this task the accuracy did not improve significantly relative to the time (and memory) required to build the model. The fine-tuning procedure itself poses no difficulty if you follow the tutorial on the official Caffe website.
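For reference, a minimal fine-tuning sketch using Caffe's Python solver interface: the solver prototxt and weights paths are placeholders, and in practice you would also edit the train/val prototxt to rename the final fully connected layer and set its `num_output` to 12 (the number of cat breeds).

```python
import caffe

caffe.set_mode_gpu()

# 'solver.prototxt' is a placeholder pointing at your edited network definition.
solver = caffe.SGDSolver('solver.prototxt')

# Initialize from ImageNet-pretrained weights: layers whose names match are
# copied, while the renamed final layer starts from random initialization.
solver.net.copy_from('bvlc_reference_caffenet.caffemodel')

solver.solve()  # run SGD for the number of iterations set in the solver
```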
Deep CNNs have come to be seen frequently in well-known competitions such as ILSVRC. In the future, I think the number of cases where deep learning is used at the product level, in web services and applications, will increase steadily. Once practical methods are established, the money will go into how to collect data.
```
[('Abyssinian', 0.621), ('Bengal', 0.144), ('Sphynx', 0.087)]
```
Abyssinian with probability 62.1%, Bengal 14.4%, Sphynx 8.7%.
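A sketch of how such a ranked output can be produced, reusing `grid`, `X_train`, `y_train`, and `extract_feature` from the earlier snippets. Note that the grid search above used `probability=False`, so `predict_proba` is not available on that estimator as-is; here the best parameters are re-fit with probability estimates enabled, and the input image name is hypothetical.

```python
from sklearn.svm import SVC

# Re-fit the best RBF parameters with Platt-scaled probability estimates.
best = grid.best_estimator_
clf = SVC(kernel='rbf', C=best.C, gamma=best.gamma, probability=True)
clf.fit(X_train, y_train)

feature = extract_feature('unknown_cat.jpg')  # hypothetical input image
probs = clf.predict_proba([feature])[0]

# Rank breeds by predicted probability and keep the top 3.
top3 = sorted(zip(clf.classes_, probs), key=lambda t: t[1], reverse=True)[:3]
print([(label, round(float(p), 3)) for label, p in top3])
# e.g. [('Abyssinian', 0.621), ('Bengal', 0.144), ('Sphynx', 0.087)]
```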