Basic understanding of stereo depth estimation (Deep Learning)

I've summarized the flow of depth (disparity) estimation with a stereo camera.

See the previous article for general stereo matching: https://qiita.com/minh33/items/55717aa1ace9d7f7e7dd

Depth estimation process

1. Feature Extraction: extract features from the left and right images using the same network with shared weights (W x H x C), e.g. intensity, shape, etc.
2. Cost Volume: create a disparity channel by shifting pixels (D x W x H x C). D is an arbitrary value; you choose the maximum disparity (in pixels) to consider. If it is too large, computation becomes heavy; if it is too small, close objects cannot be matched.
3. 3D Feature Matching: convolve the obtained features (D x W x H x C) into matching scores (D x W x H x 1), learning to produce large values where the left and right features are similar.
4. Disparity Regression: convert (D x W x H) to (1 x W x H) to obtain the final disparity.
5. Loss Calculation: compute the loss using LiDAR as ground truth, or by warping the left and right images onto each other. Recently, training with images alone is often seen.
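As a concrete illustration of steps 2 and 3, here is a minimal NumPy sketch that builds a cost volume by shifting the right feature map against the left and scoring each shift with a feature dot product. The papers below learn this scoring with a 3D CNN instead; the function name and the dot-product score are illustrative assumptions.

```python
import numpy as np

def build_cost_volume(feat_left, feat_right, max_disp):
    """Build a (D, H, W) matching-score volume from (C, H, W) features.

    The score here is a simple feature dot product; in the learned
    methods described below this step is a 3D convolutional network.
    """
    C, H, W = feat_left.shape
    cost = np.zeros((max_disp, H, W), dtype=np.float32)
    for d in range(max_disp):
        if d == 0:
            cost[0] = (feat_left * feat_right).sum(axis=0)
        else:
            # Compare left pixel x with right pixel x - d;
            # columns with no overlap are left at zero.
            cost[d, :, d:] = (feat_left[:, :, d:] * feat_right[:, :, :-d]).sum(axis=0)
    return cost
```

Taking the argmax over the D axis of this volume already gives a coarse integer disparity map; the regression step below refines it to sub-pixel accuracy.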


Computing the Stereo Matching Cost with a Convolutional Neural Network (2015)

Accuracy is improved by replacing raw intensity values with multi-channel features obtained by convolving the left and right images with the same network.
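A toy illustration of the shared-weight (siamese) branch: the same filter bank is applied to both images, so each pixel is described by a multi-channel feature vector rather than a single intensity. The explicit loop-based correlation and the fixed filter bank are assumptions for clarity; in the paper the filters are learned.

```python
import numpy as np

def extract_features(img, filters):
    """Apply a 3x3 filter bank to a (H, W) grayscale image.

    filters: (C, 3, 3). Returns (C, H-2, W-2) multi-channel features,
    a toy stand-in for one branch of the siamese CNN.
    """
    C = filters.shape[0]
    H, W = img.shape
    out = np.zeros((C, H - 2, W - 2), dtype=np.float32)
    for c in range(C):
        for i in range(3):
            for j in range(3):
                # Sliding-window correlation, accumulated tap by tap.
                out[c] += filters[c, i, j] * img[i:i + H - 2, j:j + W - 2]
    return out
```

The key point is that the same `filters` array is used for both `extract_features(left, filters)` and `extract_features(right, filters)`, mirroring the shared weights between the two branches.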

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition (2015)

Matching nearby objects (which have large disparities) requires referring to a wider range of pixels. The solution is to combine fine and coarse resolutions of the feature map.
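The multi-scale idea can be sketched as follows: average-pool the feature map over coarse blocks, upsample back to full resolution, and concatenate with the original map, so every pixel carries both local detail and wide context. The pool sizes and nearest-neighbour upsampling are illustrative choices, not the paper's exact configuration.

```python
import numpy as np

def spp_features(feat, pool_sizes=(2, 4)):
    """Spatial pyramid pooling sketch over a (C, H, W) feature map.

    Assumes H and W are divisible by every pool size. Returns
    (C * (1 + len(pool_sizes)), H, W): fine map plus coarse branches.
    """
    C, H, W = feat.shape
    branches = [feat]
    for p in pool_sizes:
        # Average-pool over p x p blocks (coarse context).
        pooled = feat.reshape(C, H // p, p, W // p, p).mean(axis=(2, 4))
        # Nearest-neighbour upsample back to (H, W).
        up = pooled.repeat(p, axis=1).repeat(p, axis=2)
        branches.append(up)
    return np.concatenate(branches, axis=0)
```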

End-to-End Learning of Geometry and Context for Deep Stereo Regression (2017)

As before, the left and right images are convolved with shared weights to generate feature maps (W x H x C). A feature volume (D x W x H x C) is then created by shifting the left image's pixels horizontally, from 0 up to maxDisparity (an arbitrary value), relative to the right image; the shift is purely horizontal (width direction). By performing 3D convolutions and 3D deconvolutions at 1/2, 1/4, 1/8, 1/16, and 1/32 resolution, both coarse and fine features are learned. The output here is (D x H x W). The final disparity is obtained by multiplying each matching probability by its disparity value and taking the weighted average. With this soft argmin, disparity can be computed with sub-pixel accuracy.
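The soft argmin step can be written in a few lines of NumPy. This is a minimal sketch assuming a (D, H, W) cost volume where lower values mean better matches, as in GC-Net:

```python
import numpy as np

def soft_argmin(cost_volume):
    """Disparity regression: expected disparity under a softmax.

    cost_volume: (D, H, W) matching costs (lower = better match).
    Softmax over the negated costs gives a probability per disparity;
    the expectation sum_d d * p_d yields a sub-pixel (H, W) map.
    """
    score = -cost_volume
    score = score - score.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(score)
    prob /= prob.sum(axis=0, keepdims=True)
    disp = np.arange(cost_volume.shape[0], dtype=np.float32)
    return (prob * disp[:, None, None]).sum(axis=0)
```

Unlike a hard argmin, this expectation is differentiable, which is what allows the whole network to be trained end-to-end; when the cost straddles two neighbouring disparities, the output falls between them, giving the sub-pixel accuracy mentioned above.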



Self-Supervised Learning for Stereo Matching with Self-Improving Ability (2017)

Until now, the loss has been computed against disparity or depth obtained from LiDAR. However, LiDAR is much sparser than the image, and training should also work in systems without LiDAR, so instead the left image is synthesized by shifting the right image's pixels by the estimated disparity. The loss is then defined by comparing the generated left image with the original left image using SAD (intensity or RGB difference) or SSIM (structural similarity). If the disparity is estimated correctly, the warped image will be almost identical to the opposite image.
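A minimal sketch of this warping-based loss, assuming integer disparities and only the SAD photometric term (real implementations use differentiable sub-pixel sampling and add SSIM; the border handling here is a crude illustrative choice):

```python
import numpy as np

def warp_right_to_left(right, disp):
    """Reconstruct the left image by sampling the right image at x - d.

    right: (H, W) image, disp: (H, W) disparities (truncated to int).
    Out-of-range samples are clamped to column 0.
    """
    H, W = right.shape
    xs = np.arange(W)[None, :] - disp.astype(int)
    xs = np.clip(xs, 0, W - 1)
    rows = np.arange(H)[:, None]
    return right[rows, xs]

def photometric_loss(left, left_rec):
    """SAD photometric loss: mean absolute intensity difference."""
    return float(np.abs(left - left_rec).mean())
```

When the predicted disparity is correct, `warp_right_to_left` reproduces the left image and the loss is near zero, so minimizing it pushes the network toward the true disparity without any LiDAR supervision.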

