Main tasks of image processing (computer vision) and the architectures used for them

Purpose of this post

Create a guide for choosing an implementation method when solving problems related to image processing.

I am a beginner, so I would appreciate it if you could point out anything in the description that is excessive, missing, or mistaken.

Problem-solving flow

Step	Contents
Task definition	Decide which task the problem to be solved will be treated as
Architecture selection	Choose the main architecture based on the defined task
Evaluation metric selection	Choose an evaluation metric appropriate for the problem

Key tasks of image processing

When the problem you want to solve is image recognition, decide which of the following tasks it corresponds to, based on your requirements:

Image classification
Object detection
Semantic segmentation
Anomaly detection

Well-known architectures for each task

The features and usage of each architecture will be added in the future.

Image classification

AlexNet
VGG16
ResNet

Object detection

YOLOv2, YOLOv3
SSD

Semantic segmentation

U-Net
SegNet
PSPNet
GCN (Global Convolutional Network)
DeepLabv3+

Anomaly detection

Autoencoder-based models (no standard model has been established, because anomaly detection has no single fixed task definition)
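
As a rough illustration of how autoencoder-style anomaly detection is commonly set up, the sketch below scores each sample by its reconstruction error. To stay dependency-free it does not train a neural network: PCA computed with NumPy stands in for a linear autoencoder. That swap, the function names `fit_linear_autoencoder` and `anomaly_score`, the synthetic data, and the 99th-percentile threshold are all assumptions made for this example, not something taken from a specific architecture.

```python
import numpy as np


def fit_linear_autoencoder(normal_data, n_components):
    """Fit a PCA subspace on normal samples (rows = flattened images).

    PCA is used here as a stand-in for a trained linear autoencoder.
    """
    mean = normal_data.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(normal_data - mean, full_matrices=False)
    return mean, vt[:n_components]


def anomaly_score(samples, mean, components):
    """Mean squared reconstruction error for each sample."""
    centered = samples - mean
    codes = centered @ components.T      # "encode" into the low-dimensional space
    reconstructed = codes @ components   # "decode" back to the input space
    return ((centered - reconstructed) ** 2).mean(axis=1)


# Synthetic "normal" data that lies close to a 2-dimensional subspace.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 16))
normal = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 16))

mean, components = fit_linear_autoencoder(normal, n_components=2)

# Hypothetical decision rule: flag anything above the 99th percentile
# of the scores observed on normal data.
threshold = np.percentile(anomaly_score(normal, mean, components), 99)

unusual = 3.0 * rng.normal(size=(1, 16))  # does not follow the normal structure
is_anomaly = anomaly_score(unusual, mean, components) > threshold
```

The idea is that a model fitted only on normal data reconstructs normal inputs well and unusual inputs poorly, so a large reconstruction error is treated as a sign of an anomaly. How the threshold should be chosen depends on the problem.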

Reference: https://www.youtube.com/watch?v=vFpZrxaq5xU

Evaluation metrics for each task

Metrics for tasks other than semantic segmentation will be added in the future.

Semantic segmentation

Pixel-wise accuracy
Mean accuracy
Mean Intersection over Union (mean IoU)
Precision, Recall, F1 score
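
To make the metrics listed above concrete, here is a minimal NumPy sketch that derives all of them from a per-class confusion matrix. The function names `confusion_matrix` and `segmentation_metrics` and the tiny label maps are made up for this example. It also assumes that "mean accuracy" means the average of per-class recall, and that classes absent from both the ground truth and the prediction are skipped when averaging. Definitions vary between papers and libraries, so check the one used by your benchmark.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Encode each (true, predicted) pair as a single index and count them.
    index = y_true * num_classes + y_pred
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def segmentation_metrics(y_true, y_pred, num_classes):
    cm = confusion_matrix(y_true, y_pred, num_classes).astype(float)
    tp = np.diag(cm)              # pixels correctly assigned to each class
    fp = cm.sum(axis=0) - tp      # predicted as the class, but actually another
    fn = cm.sum(axis=1) - tp      # belong to the class, but predicted as another

    # 0/0 yields NaN for classes that never appear; nanmean skips them.
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        iou = tp / (tp + fp + fn)
        f1 = 2 * tp / (2 * tp + fp + fn)

    return {
        "pixel_accuracy": tp.sum() / cm.sum(),
        "mean_accuracy": np.nanmean(recall),
        "mean_iou": np.nanmean(iou),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Tiny 2x3 label maps with three classes (0, 1, 2).
y_true = np.array([[0, 0, 1],
                   [1, 2, 2]])
y_pred = np.array([[0, 1, 1],
                   [1, 2, 0]])

metrics = segmentation_metrics(y_true, y_pred, num_classes=3)
```

In this example 4 of the 6 pixels are correct, so the pixel-wise accuracy is 4/6. The per-class IoU values are 1/3, 2/3, and 1/2, which gives a mean IoU of 0.5.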
