Load a Caffe model with Chainer and classify images. The Chainer samples also include image classification, but they only output the recognition rate, so you cannot tell which image was classified into which category. This article shows how to output the category name and score as the classification result. The source code is available here (a version of this article's code organized into a class). If the article is hard to follow, please clone it.
This time, we use bvlc_googlenet as the model. It classifies images into 1000 categories. The bvlc_googlenet page links to the caffemodel file, so download it from there.
To associate the category numbers in the classification result with category names, we generate a label file. The following script downloads ImageNet-related files: https://github.com/BVLC/caffe/blob/master/data/ilsvrc12/get_ilsvrc_aux.sh The label file is generated by processing synset_words.txt, which is included in the caffe_ilsvrc12.tar.gz archive that this script downloads.
synset_words.txt
n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
n01491361 tiger shark, Galeocerdo cuvieri
Run the following commands:
wget http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz
tar -xf caffe_ilsvrc12.tar.gz
sed -e 's/^[^ ]* //g' synset_words.txt > labels.txt
This creates the label file.
labels.txt
tench, Tinca tinca
goldfish, Carassius auratus
great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
tiger shark, Galeocerdo cuvieri
hammerhead, hammerhead shark
There are two lines labeled "crane", which is confusing, so change line 135 to "crane (bird)" and line 518 to "crane (machine)".
Use Pillow to read the image, resize and crop it, and then convert it to a NumPy array.
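If you prefer to do the label conversion in Python instead of sed, the sketch below does the same thing: it drops the leading WordNet ID from each line of synset_words.txt and replaces the two "crane" entries by 1-based line number. The function names synset_to_labels and write_labels and the CRANE_OVERRIDES table are hypothetical helpers, not part of Caffe or Chainer.

```python
def synset_to_labels(lines, overrides=None):
    """Strip the WordNet ID from each synset_words.txt line.

    overrides maps 1-based line numbers to replacement labels (hypothetical helper).
    """
    overrides = overrides or {}
    labels = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip('\n')
        # Same effect as sed 's/^[^ ]* //': drop everything up to and including the first space
        wnid, sep, label = line.partition(' ')
        if not sep:
            label = line  # no space on the line: sed would leave it unchanged
        labels.append(overrides.get(lineno, label))
    return labels


# 1-based line numbers of the two "crane" entries described in the text
CRANE_OVERRIDES = {135: 'crane (bird)', 518: 'crane (machine)'}


def write_labels(src='synset_words.txt', dst='labels.txt'):
    """Read synset_words.txt and write labels.txt with the crane entries renamed."""
    with open(src) as f:
        labels = synset_to_labels(f, CRANE_OVERRIDES)
    with open(dst, 'w') as f:
        f.write('\n'.join(labels) + '\n')
```

Calling write_labels() in the directory where caffe_ilsvrc12.tar.gz was extracted should produce the same labels.txt as the sed command plus the manual edit.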
import numpy as np
from PIL import Image
# Input image size expected by GoogLeNet
image_shape = (224, 224)
# Read the image and convert it to RGB
image = Image.open('sample.png').convert('RGB')
# Resize so the shorter side matches the input size, then center-crop
image_w, image_h = image_shape
w, h = image.size
if w > h:
    shape = (image_w * w // h, image_h)
else:
    shape = (image_w, image_h * h // w)
x = (shape[0] - image_w) // 2
y = (shape[1] - image_h) // 2
image = image.resize(shape)
image = image.crop((x, y, x + image_w, y + image_h))
pixels = np.asarray(image).astype(np.float32)
# pixels is 3D with axes [Y, X, RGB]
# The network input is 4D with axes [image index, BGR, Y, X], so rearrange the array
# Convert from RGB to BGR
pixels = pixels[:, :, ::-1]
# Reorder the axes to [BGR, Y, X]
pixels = pixels.transpose(2, 0, 1)
# Subtract the mean image (per-channel means in BGR order)
mean_image = np.ndarray((3, 224, 224), dtype=np.float32)
mean_image[0] = 103.939
mean_image[1] = 116.779
mean_image[2] = 123.68
pixels -= mean_image
# Add a batch axis to make it 4D
pixels = pixels.reshape((1,) + pixels.shape)
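The preprocessing above mixes two separate concerns: the resize/crop geometry and the array layout conversion. The sketch below pulls them apart into two hypothetical helpers, center_crop_box and to_caffe_input, so each step can be checked in isolation. It assumes a square 224×224 input and uses the same BGR mean values as the code above; subtracting a (3, 1, 1) mean array relies on NumPy broadcasting and gives the same result as the constant 224×224 mean planes.

```python
import numpy as np


def center_crop_box(size, target=(224, 224)):
    """Return (resized_size, crop_box) for scaling the shorter side to target and center-cropping."""
    target_w, target_h = target
    w, h = size
    if w > h:
        resized = (target_w * w // h, target_h)
    else:
        resized = (target_w, target_h * h // w)
    x = (resized[0] - target_w) // 2
    y = (resized[1] - target_h) // 2
    return resized, (x, y, x + target_w, y + target_h)


def to_caffe_input(rgb_hwc, mean_bgr=(103.939, 116.779, 123.68)):
    """Convert an H x W x RGB array into a 1 x BGR x H x W float32 batch."""
    pixels = np.array(rgb_hwc, dtype=np.float32)  # copy so the caller's array is untouched
    pixels = pixels[:, :, ::-1]                   # RGB -> BGR
    pixels = pixels.transpose(2, 0, 1)            # [Y, X, C] -> [C, Y, X]
    mean = np.array(mean_bgr, dtype=np.float32).reshape(3, 1, 1)
    pixels = pixels - mean                        # per-channel mean via broadcasting
    return pixels[np.newaxis]                     # add the batch axis -> [1, C, Y, X]
```

For example, a 640×480 image is resized to 298×224 and cropped with the box (37, 0, 261, 224); those two values are exactly what would be passed to image.resize and image.crop.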
Load the caffemodel and use the array you just generated as input data.
import chainer
import chainer.functions as F
from chainer.functions import caffe
#Load caffe model
func = caffe.CaffeFunction('bvlc_googlenet.caffemodel')
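# Get the output of layer 'loss3/classifier' and apply softmax
# (Chainer v1 API: volatile and train= were removed in Chainer v2)
x = chainer.Variable(pixels, volatile=True)
# Disabling loss1/ave_pool and loss2/ave_pool skips the auxiliary classifier branches
y, = func(inputs={'data': x}, outputs=['loss3/classifier'], disable=['loss1/ave_pool', 'loss2/ave_pool'], train=False)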
prediction = F.softmax(y)
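F.softmax turns the raw scores from 'loss3/classifier' into probabilities that are positive and sum to 1, which is why they can be printed as percentages. As an illustration only, here is the same formula in plain Python; softmax is a hypothetical helper, not Chainer's implementation. Subtracting the maximum score first does not change the result but keeps math.exp from overflowing.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw classifier scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # shift by the max to avoid overflow
    total = sum(exps)
    return [e / total for e in exps]
```

The largest raw score always gets the largest probability, so softmax changes the scale of the scores but not their ranking.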
Output the classification result.
# Read the labels, one category name per line
with open('labels.txt') as f:
    categories = [line.rstrip('\n') for line in f]
# Pair each score with its label and sort in descending order of score
result = zip(prediction.data.reshape((prediction.data.size,)), categories)
result = sorted(result, reverse=True)
# Show the top 10 results
for i, (score, label) in enumerate(result[:10]):
    print('{:>3d} {:>6.2f}% {}'.format(i + 1, score * 100, label))
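Sorting all 1000 (score, label) pairs works, but only the top 10 are printed. A sketch of an alternative, assuming the scores are already a flat sequence: heapq.nlargest picks the k best pairs directly, and key=lambda pair: pair[0] compares scores only, so labels are never compared when two scores tie. The helper names top_k and format_results are hypothetical.

```python
import heapq


def top_k(scores, labels, k=10):
    """Return the k (score, label) pairs with the highest scores, best first."""
    return heapq.nlargest(k, zip(scores, labels), key=lambda pair: pair[0])


def format_results(pairs):
    """Format (score, label) pairs like the output above: rank, percentage, label."""
    return ['{:>3d} {:>6.2f}% {}'.format(i + 1, score * 100, label)
            for i, (score, label) in enumerate(pairs)]
```

With the code above, top_k(prediction.data.ravel(), categories) should give the same ten entries as result[:10].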
When I classified a landscape photo taken in Asakusa, I got the following result. The top category was mosque. I had hoped it would recognize skyscrapers and towers, but those do not seem to be among the categories.
1 38.85% mosque
2 6.07% fire engine, fire truck
3 5.15% traffic light, traffic signal, stoplight
4 3.97% radio, wireless
5 3.25% cinema, movie theater, movie theatre, movie house, picture palace
6 2.14% pier
7 2.01% limousine, limo
8 1.92% stage
9 1.89% trolleybus, trolley coach, trackless trolley
10 1.61% crane (machine)
Several trained Caffe models are publicly available for anyone to use for image classification. This time only one image was input, but multiple images can be input at the same time. Loading the caffemodel takes time, so it is better to keep the model loaded and classify images one after another rather than reloading it for each image.
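To input several images at once, the preprocessed (3, 224, 224) arrays can be stacked along the first axis, the "image index" axis of the 4D input. The sketch below is a hypothetical helper, iter_batches, that groups a list of preprocessed arrays into batches with np.stack; the batch size of 8 is an arbitrary assumption.

```python
import numpy as np


def iter_batches(arrays, batch_size=8):
    """Group preprocessed (3, H, W) arrays into (N, 3, H, W) batches."""
    for start in range(0, len(arrays), batch_size):
        yield np.stack(arrays[start:start + batch_size])
```

Each batch can then be wrapped in chainer.Variable and passed to the already-loaded func in the same way as the single image; row i of the softmax output holds the scores for image i of that batch.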
Import Caffe model using Chainer and let it recognize images on Mac without CUDA