A light introduction to object detection

This is the day-15 article of the Kinki University Advent Calendar 2019. In it, I briefly write about object detection, based on the survey paper "Object Detection in 20 Years: A Survey".

1. What is Object Detection?

Object detection is the task of detecting instances of objects of a given class in an image or video. In the past, high accuracy was hard to achieve because the task combines two problems: classifying what an object is and localizing where it is. In recent years, as with other computer vision (CV) tasks, deep learning has improved accuracy at a dizzying rate.

1.1 What is the latest object detection method?

You can look up state-of-the-art (SOTA) methods on the "Object Detection on COCO test-dev" leaderboard.

[Screenshot: Object Detection on COCO test-dev leaderboard, taken 2019-11-13]

At the time of writing (November 13, 2019), the SOTA was "Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale)" from "CBNet: A Novel Composite Backbone Network Architecture for Object Detection", with an mAP of 53.3 on COCO.

Because competition in the CV field is fierce, the top entry is typically overtaken within three to six months. M2Det, which was SOTA around January 2019, scores 44.2 mAP on COCO, so the best score rose by nearly 10 points in a single year. To learn about the latest deep learning methods for object detection, see the paper summary by hoya012 (https://github.com/hoya012/deep_learning_object_detection).
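Since the leaderboard numbers above are mAP scores, here is a minimal sketch of how such a score is built up for a single class. A detection counts as a true positive when its intersection over union (IoU) with a still-unmatched ground-truth box reaches a threshold, and average precision (AP) is the area under the resulting precision-recall curve. This is a simplified version under my own assumptions (one image, one class, greedy matching, all-point interpolation); the official COCO metric additionally averages AP over IoU thresholds from 0.50 to 0.95 and over all classes, and uses 101-point interpolation. The function names are illustrative, not from any library.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Tuple[float, Box]],
                      ground_truth: List[Box],
                      iou_threshold: float = 0.5) -> float:
    """Simplified AP for one class in one image; detections are (score, box)."""
    if not ground_truth:
        return 0.0
    # Process detections from most to least confident.
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    matched = [False] * len(ground_truth)
    tp_count = 0
    precisions, recalls = [], []
    for rank, (_, box) in enumerate(ordered, start=1):
        # Greedily pick the best-overlapping ground-truth box that is still free.
        best_iou, best_idx = 0.0, -1
        for j, gt in enumerate(ground_truth):
            if matched[j]:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, j
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched[best_idx] = True  # true positive
            tp_count += 1
        precisions.append(tp_count / rank)
        recalls.append(tp_count / len(ground_truth))
    # Make precision non-increasing from right to left (the precision "envelope").
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Integrate precision over recall; false positives add zero recall.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

For example, with two ground-truth boxes and three detections that rank as true positive, false positive, true positive, the function returns 5/6 (about 0.833). A duplicate detection of an already-matched object counts as a false positive, which is why detectors are penalized for reporting the same object twice.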

2. History of object detection

The history of object detection is roughly divided into two periods: "before the deep learning invasion" (2001 to 2013) and "after the deep learning invasion" (2014 onward). Neural networks have boomed in recent years thanks to more powerful hardware and the use of GPUs, and object detection has evolved along with them.

Figure: taken from "Object Detection in 20 Years: A Survey".

Before the deep learning invasion, building a detector meant designing the feature-extraction process by examining the image's pixel values. After it, the work shifted to designing and tuning the structure and mechanisms of the neural network. (Of course, it is still important to learn how feature extraction works.)

As the joke on social media goes, the old era needed "feature-extraction artists", while the deep learning world needs "hyperparameter-tuning artists" [citation needed].

The main technologies of each term are briefly explained below.

2.1 Before the deep learning invasion

Before deep learning, object detection was mostly done with sliding-window detection: a window is moved across the image, features are extracted at each window position, and a classifier judges whether the window contains the object. Typical methods are described below.
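The sliding-window idea can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from either detector described below: `score_fn` is a hypothetical stand-in for the "extract features, then classify" step, and the window size, stride, and threshold are arbitrary assumptions. Real detectors also repeat the scan over an image pyramid so that objects of different sizes fit the fixed window.

```python
import numpy as np
from typing import Callable, Iterator, List, Tuple


def sliding_windows(image: np.ndarray, win: Tuple[int, int], stride: int
                    ) -> Iterator[Tuple[int, int, np.ndarray]]:
    """Yield (x, y, patch) for every window position on a regular grid."""
    win_h, win_w = win
    h, w = image.shape[:2]
    for y in range(0, h - win_h + 1, stride):
        for x in range(0, w - win_w + 1, stride):
            yield x, y, image[y:y + win_h, x:x + win_w]


def detect(image: np.ndarray, score_fn: Callable[[np.ndarray], float],
           win: Tuple[int, int] = (24, 24), stride: int = 4,
           threshold: float = 0.5) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, w, h) for every window whose score reaches the threshold."""
    boxes = []
    for x, y, patch in sliding_windows(image, win, stride):
        # Hypothetical "features + classifier" step applied to one window.
        if score_fn(patch) >= threshold:
            boxes.append((x, y, win[1], win[0]))
    return boxes
```

For instance, on a 48x48 image with a bright 24x24 square at (8, 8), using the mean brightness as a toy score and a threshold of 1.0 returns only the window that covers the square exactly. The cost grows with the number of window positions times the cost of scoring each one, which is why fast feature computation mattered so much in this era.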

VJ Det (2001)

The Viola-Jones detector (VJ Det) is a real-time detector for human faces. It extracts Haar-like features, which respond to differences in brightness, and classifies each sliding-window position with a cascade of classifiers. Simply put, a Haar-like feature is the difference between the pixel sums of adjacent rectangular regions. The OpenCV tutorial "Face Detection using Haar Cascades" describes the face detection method and includes sample code for cascade detection with OpenCV.

Computing a feature therefore comes down to adding up pixel values and checking whether the brightness pattern matches.
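Viola and Jones made these sums cheap with an integral image (summed-area table): after one pass over the image, the sum of any rectangle takes just four lookups. Below is a minimal NumPy sketch of that idea plus one two-rectangle Haar-like feature (left half minus right half). The function names and the particular feature layout are illustrative choices of mine, not OpenCV's API.

```python
import numpy as np


def integral_image(img: np.ndarray) -> np.ndarray:
    """Summed-area table with an extra zero row/column: ii[y, x] == img[:y, :x].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii: np.ndarray, x: int, y: int, w: int, h: int) -> int:
    """Pixel sum of the rectangle with top-left (x, y), width w, height h (four lookups)."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])


def haar_two_rect_horizontal(ii: np.ndarray, x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle feature: sum(left half) - sum(right half); w is assumed even."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Because every feature costs a constant number of lookups regardless of its size, the detector can evaluate many features per window, and the cascade lets it reject most windows after only a few of them.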


Trained detector files (XML) can be downloaded here: https://github.com/opencv/opencv/tree/master/data/haarcascades

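If you don't want to build OpenCV yourself, you can install it from pip:

pip install opencv-contrib-python

(opencv-contrib-python already includes the main modules of opencv-python, and the two packages should not be installed in the same environment.)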

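The Viola-Jones detector itself can be used like this:

import cv2
img = cv2.imread("input.jpg")
if img is None:
    raise FileNotFoundError("could not read input.jpg")
# Cascade file downloaded from the link above (the pip package also ships it under cv2.data.haarcascades)
detector = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
if detector.empty():
    raise FileNotFoundError("could not load the cascade XML")
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # the cascade runs on grayscale images
# Scan at multiple scales; each detection is an (x, y, w, h) rectangle
faces = detector.detectMultiScale(gray_img, scaleFactor=1.1, minNeighbors=3, minSize=(30, 30))
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 4)  # red box (BGR), 4 px thick
cv2.imwrite("output.jpg", img)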

All you need to do is pass the XML file to cv2.CascadeClassifier.

Here is an example I actually made: it detects the president's face and applies a mosaic to it! (I'm a little worried this will become an international incident.)
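The mosaic itself is easy to add on top of the detection results. Here is a minimal NumPy-only sketch, assuming `boxes` holds the (x, y, w, h) rectangles returned by `detectMultiScale`; the function name `mosaic` and the default block size of 10 pixels are my own choices. It replaces every block inside each box with that block's mean color.

```python
import numpy as np
from typing import Iterable, Tuple


def mosaic(img: np.ndarray, boxes: Iterable[Tuple[int, int, int, int]],
           block: int = 10) -> np.ndarray:
    """Return a copy of img in which each (x, y, w, h) region is pixelated."""
    out = img.copy()
    for x, y, w, h in boxes:
        region = out[y:y + h, x:x + w]  # a view, so writes go straight into out
        for by in range(0, region.shape[0], block):
            for bx in range(0, region.shape[1], block):
                tile = region[by:by + block, bx:bx + block]
                # Average over the two spatial axes; keeps the channel axis for color images.
                tile[...] = tile.mean(axis=(0, 1)).astype(img.dtype)
    return out
```

For example, `mosaic(img, detector.detectMultiScale(gray_img))` returns a pixelated copy that can be written out with `cv2.imwrite`. A larger `block` gives a coarser, more anonymizing mosaic.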

HOG Det (2005)

The HOG detector extracts HOG (Histogram of Oriented Gradients) features, which describe the distribution of brightness gradient directions, and classifies each sliding-window position with an SVM. For human detection, HOG features, which capture contour information, are said to work better than Haar-like features, which capture brightness differences. OpenCV implements it as cv2.HOGDescriptor, and the code looks like this:

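import cv2
img = cv2.imread("input.jpg")
if img is None:
    raise FileNotFoundError("could not read input.jpg")
hog = cv2.HOGDescriptor()  # default 64x128 detection window
# Use OpenCV's pre-trained linear SVM for people detection
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
hogParams = {'winStride': (8, 8), 'padding': (32, 32), 'scale': 1.05}
# Returns the detected rectangles and their SVM confidence weights
human, r = hog.detectMultiScale(img, **hogParams)
for (x, y, w, h) in human:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 4)  # red box (BGR), 4 px thick
cv2.imshow("results human detect by def detector", img)
cv2.waitKey(0)  # wait for a key press
cv2.destroyAllWindows()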
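To make "distribution of gradient directions" concrete, here is a simplified NumPy sketch of the core of a HOG descriptor: per-pixel gradients with a [-1, 0, 1] filter, then a 9-bin histogram of unsigned orientations (0 to 180 degrees) for each 8x8 cell, with each pixel voting by its gradient magnitude. It deliberately omits parts of the full method, such as interpolation between neighboring bins and block normalization, so it will not reproduce the output of cv2.HOGDescriptor.

```python
import numpy as np


def cell_histograms(gray: np.ndarray, cell: int = 8, bins: int = 9) -> np.ndarray:
    """Return orientation histograms with shape (cells_y, cells_x, bins)."""
    g = gray.astype(np.float64)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    # Centered [-1, 0, 1] differences; border pixels keep a zero gradient.
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]
    gy[1:-1, :] = g[2:, :] - g[:-2, :]
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180): a gradient and its opposite share a bin.
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((angle / (180.0 / bins)).astype(int), bins - 1)

    cells_y, cells_x = g.shape[0] // cell, g.shape[1] // cell
    hist = np.zeros((cells_y, cells_x, bins))
    for cy in range(cells_y):
        for cx in range(cells_x):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            # Each pixel votes for its orientation bin, weighted by gradient magnitude.
            hist[cy, cx] = np.bincount(bin_idx[rows, cols].ravel(),
                                       weights=magnitude[rows, cols].ravel(),
                                       minlength=bins)
    return hist
```

On an image with a single vertical edge, all the votes land in the 0-degree bin (horizontal gradient), which is the kind of contour information that makes HOG well suited to people's outlines.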

2.2 After the deep learning invasion

With the arrival of deep learning, object detection accuracy improved dramatically. Key milestones include the appearance of CNNs, VGG, and ResNet. These were developed mainly for image classification, and object detection has tended to improve by building on them. Detection methods are divided by how they handle two processes, classification and localization: a one-shot detector performs both at the same time, while a two-shot detector first estimates candidate object locations and then classifies them. In general, one-shot detectors excel in detection speed and two-shot detectors excel in detection accuracy. (Honestly, I feel the difference is small with the latest methods.)

One-shot Detector

A one-shot detector performs image classification and localization at the same time. Most one-shot detectors can be grouped into YOLO-type and SSD-type methods. Among these speed-oriented detectors, SSD-type methods are often faster, while YOLO-type methods are often said to be more accurate.
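To illustrate "classification and localization at the same time", here is a simplified decoder for a YOLO (v1)-style output, where one forward pass yields, for every cell of an S x S grid, a box, an objectness score, and class scores together. The tensor layout (one box per cell; box center relative to the cell, width and height relative to the image), the function name, and the threshold are my assumptions for illustration. Actual YOLO and SSD versions differ in details such as multiple boxes per cell and anchor boxes.

```python
import numpy as np
from typing import List, Tuple


def decode_grid(pred: np.ndarray, img_w: int, img_h: int,
                score_threshold: float = 0.5
                ) -> List[Tuple[int, float, Tuple[float, float, float, float]]]:
    """Decode an (S, S, 5 + C) prediction into (class_id, score, (x1, y1, x2, y2)).

    Assumed per-cell layout: [x, y, w, h, objectness, class_0 .. class_{C-1}],
    where x, y are the box center relative to the cell (0..1) and w, h are
    relative to the whole image (0..1).
    """
    S = pred.shape[0]
    results = []
    for row in range(S):
        for col in range(S):
            x, y, w, h, obj = pred[row, col, :5]
            class_probs = pred[row, col, 5:]
            cls = int(np.argmax(class_probs))
            score = float(obj * class_probs[cls])  # class-specific confidence
            if score < score_threshold:
                continue
            cx = (col + x) / S * img_w  # box center in pixels
            cy = (row + y) / S * img_h
            bw, bh = w * img_w, h * img_h
            results.append((cls, score,
                            (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)))
    return results
```

The point of the sketch is that class and location come out of the same tensor in one pass; a two-shot detector would instead generate candidate regions first and classify each one in a second stage.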

Two-shot Detector

Two-shot detectors mainly use the line of techniques that started with R-CNN. Recent methods sometimes also perform pixel-wise segmentation (Mask R-CNN, for example), and my impression is that they often beat one-shot detectors in accuracy. (Of course, this depends on having a dataset labeled at the pixel level to support it.)

3. Deep learning libraries

Deep learning generally relies on GPUs for processing power, so most libraries have to work together with CUDA. (Of course, you can also run on the CPU only.) To be honest, matching all the versions involved is a real ordeal.
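When matching versions, it helps to first check which CUDA build your installed framework expects and whether it actually sees a GPU. A small sketch, assuming PyTorch and/or TensorFlow may or may not be installed (it simply skips whichever is missing). It relies on torch.version.cuda, torch.backends.cudnn.version(), torch.cuda.is_available(), tf.test.is_built_with_cuda(), and tf.config.list_physical_devices("GPU"), the last of which assumes TensorFlow 2.1 or later; the function name is my own.

```python
import importlib
from typing import Any, Dict


def framework_cuda_report() -> Dict[str, Dict[str, Any]]:
    """Collect version and GPU info for whichever of PyTorch / TensorFlow is installed."""
    report: Dict[str, Dict[str, Any]] = {}
    try:
        torch = importlib.import_module("torch")
        report["pytorch"] = {
            "version": torch.__version__,
            "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
            "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
            "gpu_available": torch.cuda.is_available(),
        }
    except ImportError:
        pass  # PyTorch is not installed
    try:
        tf = importlib.import_module("tensorflow")
        report["tensorflow"] = {
            "version": tf.__version__,
            "built_with_cuda": tf.test.is_built_with_cuda(),
            "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
        }
    except ImportError:
        pass  # TensorFlow is not installed
    return report


if __name__ == "__main__":
    for name, info in framework_cuda_report().items():
        print(name, info)
```

If `built_with_cuda` shows a CUDA version but no GPU is found, the usual suspect is a mismatch between that version and the installed driver.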

tensorflow

A library developed by Google. New versions come out at a furious pace, which makes matching the version with CUDA difficult. Please stop changing the API so mercilessly (said quietly).

pytorch

In my mind, this is the library locked in battle with tensorflow. I have the impression that many reproduction implementations of recent heavyweight ("tsuyotsuyo") papers use pytorch. It is especially popular with young people (my own biased impression).

chainer

A library from Preferred Networks (PFN), which recently announced that it is ending active development. I like it a lot, but I think it will be a tough choice for new users.

keras

A library that runs on top of a backend such as tensorflow. Its level of abstraction is very high, so even deep learning beginners can easily build a CNN for image classification. However, once you try anything more advanced than classification, you end up writing tensorflow code directly, and defining custom loss functions becomes troublesome.

4. Tools for the deep learning environment

Here are some keywords for tools that may help you set up a deep learning development environment. Personally, I recommend nvidia-docker + plain Python + pip, and so on.

Docker (kubernetes)

A great tool that lets you save an environment as a virtual container (pardon my limited vocabulary). Since version 19.03, Docker supports GPUs natively; before that, you had to install nvidia-docker to create containers that use the GPU. Its particular advantages are that matching versions with CUDA is easy and the library environment is easy to reproduce. The disadvantage is the cost of learning Docker...

Anaconda

A Python distribution that manages Python versions and libraries, aimed at scientific and technical computing. It tends to pollute your terminal environment and is often shunned for quasi-religious reasons.

pyenv

There are various Python environment management tools, and users split into denominations over them (honestly, I don't quite understand the difference between virtualenv and pyenv).

5. Closing remarks

Every task in the CV field is fiercely competitive, so the field is evolving at tremendous speed. As a topic for basic research it is a red-hot area, so I don't recommend jumping in there, but I think there is still plenty of room for applications that use these technologies. Reproducing a paper's implementation is especially educational: you have to write processing the paper does not describe, reach the processing speed the paper reports, and implement processing that no library provides in the first place. I recommend it when you have time. Earlier obstacles such as limited computing resources and long processing times are being solved, and these techniques are becoming very easy to use, so why not give it a try?

I ran out of steam partway through, so I may add more in the future.

References
