Main tasks of image processing (computer vision) and the architectures used for them
Purpose of this post
Create a guide for choosing an implementation method when solving image processing problems.
- I am a beginner, so I would appreciate it if you could point out anything excessive, missing, or mistaken in the description.
Problem-solving flow
| Item | Contents |
| --- | --- |
| Task definition | Define which task the problem to be solved will be treated as |
| Architecture decision | Determine the main architecture from the defined tasks |
| Determination of evaluation index | Determine the appropriate evaluation index for the problem |
Key tasks of image processing
When the problem you want to solve is an image recognition problem, decide which of the following tasks it corresponds to, based on your requirements:
- Image classification
- Object detection
- Semantic segmentation
- Anomaly detection
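The task definition step can be sketched as a small decision function. This is only a rough illustration: the `Requirements` fields and the order of the checks are assumptions made for this example, not a rule from any library or from the referenced material.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical summary of what the solution must output."""
    needs_location: bool         # must the output say where objects are?
    needs_pixel_mask: bool       # is a class label required for every pixel?
    abnormal_samples_rare: bool  # are labelled abnormal samples scarce or absent?


def choose_task(req: Requirements) -> str:
    """Map requirements to one of the four tasks listed above."""
    if req.abnormal_samples_rare:
        # Mostly normal data is available, so the goal is to flag deviations.
        return "Anomaly detection"
    if req.needs_pixel_mask:
        return "Semantic segmentation"
    if req.needs_location:
        # Bounding boxes are enough; no per-pixel labels are needed.
        return "Object detection"
    # One label per image is sufficient.
    return "Image classification"


print(choose_task(Requirements(needs_location=True,
                               needs_pixel_mask=False,
                               abnormal_samples_rare=False)))
```

Real projects often mix these requirements, so the result is best treated as a starting point for the architecture decision rather than a final answer.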
Famous architecture for each task
- Details on the features and usage of each architecture will be added in the future.
Image classification
Object detection
Semantic segmentation
- U-Net
- SegNet
- PSPNet
- GCN (Global Convolutional Network)
- DeepLabv3+
Anomaly detection
- Autoencoder-based models (there is no fixed standard model because the task is not specifically defined)
Reference: https://www.youtube.com/watch?v=vFpZrxaq5xU
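A common way to use an autoencoder for anomaly detection is to score each image by its reconstruction error and flag images whose error is unusually large. The sketch below shows only that scoring step and assumes the reconstructions already exist. The function names, the mean squared error score, and the `mean + k * std` threshold are assumptions chosen for illustration, not something prescribed by the reference.

```python
import statistics


def reconstruction_error(original, reconstructed):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def fit_threshold(normal_errors, k=3.0):
    """Threshold = mean + k * std of the errors measured on normal images."""
    mean = statistics.fmean(normal_errors)
    std = statistics.pstdev(normal_errors)
    return mean + k * std


def is_anomaly(error, threshold):
    """An image is flagged when it is reconstructed worse than the threshold."""
    return error > threshold


# Errors measured on (hypothetical) normal validation images.
threshold = fit_threshold([0.010, 0.012, 0.009, 0.011])
print(is_anomaly(reconstruction_error([0.0, 1.0], [0.9, 0.1]), threshold))
```

The value of `k` trades missed anomalies against false alarms, so it would normally be tuned on validation data.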
Evaluation index for each task
- Evaluation indexes for tasks other than semantic segmentation will be added in the future.
Semantic segmentation
- Pixel-wise Accuracy
- Mean Accuracy
- Mean Intersection over Union (Mean IoU)
- Precision, Recall, F1 score
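All of the indexes listed above can be derived from a per-class confusion matrix over pixels. The following is a minimal pure-Python sketch, assuming labels are given as flat lists of integer class IDs. The function names are hypothetical, and two conventions are assumed: an undefined ratio is reported as 0.0, and the mean values are taken only over classes that appear in the ground truth.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts pixels whose true class is t and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def _ratio(numerator, denominator):
    # Assumed convention: an undefined ratio (0 / 0) is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(cm):
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = [cm[i][i] for i in range(n)]                            # correctly labelled pixels
    gt = [sum(cm[i]) for i in range(n)]                          # pixels truly of class i
    pred = [sum(cm[j][i] for j in range(n)) for i in range(n)]   # pixels predicted as class i
    present = [i for i in range(n) if gt[i] > 0]                 # classes in the ground truth

    recall = [_ratio(tp[i], gt[i]) for i in range(n)]            # per-class accuracy
    precision = [_ratio(tp[i], pred[i]) for i in range(n)]
    f1 = [_ratio(2 * precision[i] * recall[i], precision[i] + recall[i])
          for i in range(n)]
    iou = [_ratio(tp[i], gt[i] + pred[i] - tp[i]) for i in range(n)]

    return {
        "pixel_accuracy": _ratio(sum(tp), total),
        "mean_accuracy": _ratio(sum(recall[i] for i in present), len(present)),
        "mean_iou": _ratio(sum(iou[i] for i in present), len(present)),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Four pixels, two classes; one class-0 pixel is mislabelled as class 1.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
metrics = segmentation_metrics(cm)
print(metrics["pixel_accuracy"])  # 0.75
```

Pixel-wise accuracy can look high when one class such as the background dominates the image, which is one reason Mean IoU is usually reported alongside it.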