Introduction

Last time, I built a face classifier with a convolutional neural network (CNN), but that turned out to be only the very beginning of machine learning. I learned that the field is moving at a tremendous pace. It felt like discovering that the boss I had risked everything to defeat was actually the weakest of more than 10,000 enemies.

New algorithms seem to be published every year, and competitions are held on recognition accuracy; most of the progress is aimed at improving recognition accuracy and processing speed. This time, I tried running the tutorial for one of the latest of these algorithms, the Single Shot MultiBox Detector (SSD).

For a detailed account of how the field has progressed, the following page is very easy to understand: SSD: Single Shot MultiBox Detector (ECCV2016). In summary: CNN → R-CNN → Fast R-CNN → Faster R-CNN → SSD (where we are now).

Why this post took so long

I was stuck on this for about five days without finding any clue to a solution. Although the implementations themselves are all written in Python, there are a great many choices of library involved.

At first I planned to use a TensorFlow implementation, but some of that code was too hard for me to follow, and simply getting something to run without understanding it is not interesting. So I decided to run an implementation written in Keras, whose code is still fairly minimal.

The Keras reference code itself was fine, but it runs recognition directly on video, so it required OpenCV, FFmpeg, and GTK2. In the end, although I installed OpenCV and FFmpeg with Homebrew, OpenCV was not built with GTK2 support, and the video would not load.

I used brew edit to rewrite the formula and tried various build options, but in the end it seemed that Homebrew's OpenCV is configured so that it cannot be built with GTK. Article that convinced me to give up

Incidentally, I also tried building everything from source, but gave up partway through because building FFmpeg was as troublesome as ever.

The result of trial and error

Given all of the above, I decided to try an implementation that uses Chainer instead. It detects objects in still images rather than video: chainer-SSD

Environment

Git

brew install git

Python 3.6.1

brew install python3

PYTHONPATH setting

# Add Homebrew's site-packages to PYTHONPATH (e.g. in ~/.bash_profile)
if [ -d "$(brew --prefix)/lib/python3.6/site-packages" ]; then
  export PYTHONPATH="$(brew --prefix)/lib/python3.6/site-packages:$PYTHONPATH"
fi
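
Because the shell snippet hard-codes python3.6, it silently does nothing if Homebrew later installs a different Python version. A minimal sketch for checking the result from Python itself, assuming the Homebrew prefix is /usr/local (confirm with brew --prefix) and that you run it with the same python3 you plan to use for chainer-SSD. The helper names are my own.

```python
import os
import sys


def brew_site_packages(prefix="/usr/local", version=None):
    """Build $(brew --prefix)/lib/pythonX.Y/site-packages for a Python version."""
    major, minor = (version or sys.version_info)[:2]
    return os.path.join(prefix, "lib", f"python{major}.{minor}", "site-packages")


def pythonpath_entries(env=None):
    """Split PYTHONPATH into its non-empty directory entries."""
    env = os.environ if env is None else env
    return [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]


if __name__ == "__main__":
    # Assumption: Homebrew prefix is /usr/local; pass your own prefix if it differs
    site_dir = brew_site_packages()
    print(site_dir, "exists:", os.path.isdir(site_dir))
    print("listed in PYTHONPATH:", site_dir in pythonpath_entries())
    print("visible on sys.path:", site_dir in sys.path)
```

Deriving the version from sys.version_info keeps the check in step with whichever interpreter actually runs it.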

Cython

pip3 install cython

NumPy

pip3 install numpy

Chainer

pip3 install chainer

Matplotlib

pip3 install matplotlib
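
Before cloning, it may save time to confirm that python3 actually sees the packages installed above (a wrong PYTHONPATH is an easy way to lose them). A minimal sketch using the standard importlib module; the REQUIRED mapping is my own illustration, and note that Cython is imported as Cython, not cython.

```python
import importlib.util

# pip package name -> module name used in `import`
REQUIRED = {
    "cython": "Cython",
    "numpy": "numpy",
    "chainer": "chainer",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("still missing:", " ".join(missing))
    else:
        print("all packages found")
```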

git clone

cd {Workspace}
git clone https://github.com/ninhydrin/chainer-SSD.git

Preparation

cd {Workspace}/chainer-SSD/util
python3 setup.py build_ext -i

Some warnings will be printed, but as long as the build finishes, it should work without problems.
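
build_ext -i (short for --inplace) compiles the Cython sources and places the resulting extension modules next to them in util/, so the build worked if such files appear there. A rough check, assuming you run it from {Workspace}/chainer-SSD; I have not checked which module names the repository expects, so this only lists what was built.

```python
import importlib.machinery
import sys
from pathlib import Path


def extension_modules(filenames):
    """Keep names ending in a compiled-extension suffix (e.g. .cpython-36m-darwin.so)."""
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return sorted(name for name in filenames if name.endswith(suffixes))


if __name__ == "__main__":
    # Defaults to the util directory used in the build step above
    util_dir = Path(sys.argv[1] if len(sys.argv) > 1 else "util")
    if util_dir.is_dir():
        print(extension_modules(p.name for p in util_dir.iterdir()) or "no compiled modules found")
    else:
        print(util_dir, "not found")
```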

Run

The repository includes two sample images, so let's run the demo on one of them first.

cd {Workspace}/chainer-SSD
python3 demo.py img/dog.jpg
/usr/local/lib/python3.6/site-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
  warn("The default mode, 'constant', will be changed to 'reflect' in "

This UserWarning from scikit-image (it is a warning, not an error) is printed, but as long as the script runs to completion, it works without problems.
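
The message is a deprecation notice from scikit-image's skimage/transform/_warps.py. If it bothers you, one option, assuming you are willing to add a few lines near the top of demo.py, is a targeted filter from the standard warnings module, so that other warnings still appear. The helper name silence_skimage_mode_warning is my own; passing an explicit mode= to the skimage resize call would also silence it, but I have not checked where demo.py makes that call.

```python
import warnings


def silence_skimage_mode_warning():
    """Ignore only skimage's 'default mode will change' UserWarning."""
    warnings.filterwarnings(
        "ignore",
        # Matched as a regex against the start of the warning text
        message="The default mode, 'constant', will be changed to 'reflect'",
        category=UserWarning,
    )


if __name__ == "__main__":
    silence_skimage_mode_warning()
    # Code that calls into skimage.transform would run after this point
```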

Result

Dog, car and bicycle

Fisheye shot of a bicycle and a person

I tried various other images

Car and cat

People and birds

Raccoon

* Not recognized (?)

Dog in a wheelchair

Many people
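
To repeat the demo on a folder of your own images, a small loop can call demo.py once per file. A minimal sketch, assuming it is run from {Workspace}/chainer-SSD; the folder argument and helper name are hypothetical, and it reuses the current interpreter (sys.executable) instead of a literal python3. If demo.py opens a matplotlib window, each run will presumably wait until that window is closed.

```python
import subprocess
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def demo_commands(paths, script="demo.py"):
    """Build one `python3 demo.py <image>` command per image path, sorted by path."""
    images = sorted(Path(p) for p in paths if Path(p).suffix.lower() in IMAGE_SUFFIXES)
    return [[sys.executable, script, str(image)] for image in images]


if __name__ == "__main__":
    # Hypothetical folder of your own photos; defaults to the repository's img/ directory
    folder = Path(sys.argv[1] if len(sys.argv) > 1 else "img")
    if folder.is_dir() and Path("demo.py").is_file():
        for cmd in demo_commands(folder.iterdir()):
            subprocess.run(cmd, check=True)
    else:
        print("run from the chainer-SSD directory with an existing image folder")
```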

Impressions

- It felt as though detection was limited to about three objects per image.
- Objects that are too small in the image are not detected. (I think this is because the image is resized before detection.)
- Accuracy on objects seen from behind or from the side, and on blurred objects, does not seem very high yet. (Perhaps the model has not been trained enough.)
- For a single image, the speed did not feel especially slow. (About 5 seconds, by my own impression.)
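
The resizing guess is easy to put numbers on. Assuming this implementation follows the SSD300 configuration from the paper (a fixed 300x300 input; I have not confirmed the size chainer-SSD uses), every photo is squeezed to 300x300 before detection, so small objects lose most of their pixels. The function below is a hypothetical illustration of that arithmetic only.

```python
def resized_box_size(box_w, box_h, img_w, img_h, input_size=300):
    """Return an object's (width, height) in pixels after the whole image is
    resized to input_size x input_size, as a fixed-input detector would do."""
    return box_w * input_size / img_w, box_h * input_size / img_h


if __name__ == "__main__":
    # A 40x60 px object in a 4000x3000 photo shrinks to only 3x6 px
    print(resized_box_size(40, 60, 4000, 3000))
```

At a few pixels across, there is very little left for the network to recognize, which fits the impression above.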

Summary

- I would like to read the source code, train the model on my own image data, and try again.
- I want to try recognition directly from video or a camera. (If there is an easy way to build the dependencies on Mac or CentOS.)
- There is too little information available in Japanese, whether written in Japanese or translated. I would like a community where experienced people can teach, or where we can teach each other.

I tried running an object detection tutorial using the latest deep learning algorithm