I may not explain everything perfectly, but I hope you can get the general idea. I have also written summaries of stereo depth, so if you are interested, please see https://qiita.com/minh33/items/55717aa1ace9d7f7e7dd and https://qiita.com/minh33/items/6b8d37ce08f85d3a3479
For monocular
By moving a car with a camera mounted on it, you can warp the current frame to the previous frame. First, the network estimates the depth of the image at t = t. Since depth is known, a 3D point cloud can be computed. Self-pose estimation is used to find how far the camera (car) has moved; VSLAM, odometry, GPS, IMU, etc. can be used for this. By transforming the 3D point cloud with the per-frame change in x, y, z, roll, pitch, and yaw computed above, we obtain the 3D point cloud at t = t-1. Projecting it back to an image view lets you warp the image at t = t to the image at t = t-1. However, there are also disadvantages: nothing can be learned unless the camera moves, and if an object in the scene is itself moving, it will be misaligned even after warping. (A sketch of this view synthesis step follows below.)
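As a rough illustration of this pipeline, here is a minimal PyTorch-style sketch (not the implementation of any specific paper). In practice this step is commonly done as inverse warping: each target pixel is back-projected with its estimated depth, transformed by the relative camera pose into the other frame, re-projected, and the other frame's image is sampled there. The function name `view_synthesis`, the tensor shapes, and the pose/intrinsics arguments are assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def view_synthesis(I_s, D_t, K, T_t_to_s):
    """Synthesize the target-view image by sampling the source image I_s,
    using the estimated target depth D_t and the relative pose T_t_to_s.
    I_s: (B,3,H,W) source image, D_t: (B,1,H,W) depth,
    K: (B,3,3) intrinsics, T_t_to_s: (B,4,4) pose from target to source frame."""
    B, _, H, W = I_s.shape
    device = I_s.device

    # Pixel grid in homogeneous coordinates
    ys, xs = torch.meshgrid(torch.arange(H, device=device),
                            torch.arange(W, device=device), indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).float().view(1, 3, -1)   # (1,3,HW)

    # Back-project pixels to 3D using the estimated depth: X = D * K^-1 * p
    cam_points = torch.inverse(K) @ pix.expand(B, -1, -1)             # (B,3,HW)
    cam_points = cam_points * D_t.view(B, 1, -1)

    # Transform the 3D point cloud into the other (t-1) camera frame
    cam_points_h = torch.cat(
        [cam_points, torch.ones(B, 1, H * W, device=device)], dim=1)  # (B,4,HW)
    src_points = (T_t_to_s @ cam_points_h)[:, :3]                     # (B,3,HW)

    # Project back onto the source image plane
    proj = K @ src_points
    uv = proj[:, :2] / (proj[:, 2:3] + 1e-7)

    # Normalize to [-1, 1] for grid_sample and bilinearly sample I_s
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)
    return F.grid_sample(I_s, grid, padding_mode="border", align_corners=True)
```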
・ It => target image (t = t)
・ Is => source image (t = t-1)
・ Dt => target depth (ground truth distance measured with LiDAR)
・ D^t => estimated target depth
・ I^t => estimated target image
・ View Synthesis => image reconstruction
・ Photometric Loss => comparison of the estimated image and the actual image
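With the notation above, the Photometric Loss can be written very compactly. The sketch below uses a plain L1 difference as an assumption for illustration; real implementations usually combine L1 with SSIM (see the losses further below). `view_synthesis` refers to the illustrative function sketched earlier.

```python
import torch

def photometric_loss(I_t, I_t_hat):
    """L1 Photometric Loss between the target image It and the
    reconstructed image I^t (both of shape (B, 3, H, W))."""
    return torch.mean(torch.abs(I_t - I_t_hat))

# Illustrative usage: synthesize I^t from the source image Is, then compare with It
# I_t_hat = view_synthesis(I_s, D_t_hat, K, T_t_to_s)
# loss = photometric_loss(I_t, I_t_hat)
```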
This is used to calculate the Loss in No. 3: you can convert the estimated Depth to Disparity and warp the right image onto the left image. By the way, some people may wonder whether this is binocular rather than mono depth; distance estimation itself is done with a single camera, and the opposite lens is only used as the ground truth for training. (A sketch of this disparity-based warping follows below.)
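Here is a minimal sketch of that right-to-left warping. It assumes the network predicts the left disparity in units normalized by image width, and the sign of the horizontal shift depends on the stereo rig convention, so treat it as illustrative rather than a reference implementation.

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(I_right, disp_left):
    """Reconstruct the left image by sampling the right image at positions
    shifted horizontally by the left disparity map.
    I_right: (B,3,H,W) right image, disp_left: (B,1,H,W) disparity
    normalized by image width (disparity = focal * baseline / depth)."""
    B, _, H, W = I_right.shape
    device = I_right.device
    xs = torch.linspace(-1, 1, W, device=device).view(1, 1, W).expand(B, H, W)
    ys = torch.linspace(-1, 1, H, device=device).view(1, H, 1).expand(B, H, W)

    # Shift the sampling coordinates by the disparity (scaled to the [-1, 1] range);
    # the sign of the shift depends on which camera is left/right in your setup.
    xs_shifted = xs - 2.0 * disp_left.squeeze(1)
    grid = torch.stack([xs_shifted, ys], dim=-1)                      # (B,H,W,2)
    return F.grid_sample(I_right, grid, padding_mode="border", align_corners=True)
```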
This paper is probably the most famous one in monodepth.
・ Reconstruction Loss: A reconstructed left image can be created by warping the right image using the estimated left Disparity. Calculate the SAD and SSIM between this reconstruction and the input left image. Do the same in the opposite direction.
・ LR Consistency Loss: Warp the right Disparity map into the left Disparity map and calculate the absolute difference between the two disparities. Do the same in the opposite direction.
・ Smoothness Loss: Since the Disparity (Depth) of nearby pixels should be almost the same if they belong to the same object, the smoothness of the Disparity is penalized, for example with a Laplacian smoothness term. This is applied to both the left and the right Disparity maps. (A combined sketch of these three losses follows this list.)
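Below is a compact, illustrative sketch of the three loss terms. The SSIM window size, the alpha weighting, and the use of a first-order edge-aware gradient penalty (instead of a Laplacian term) are assumptions made for this example, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def ssim(x, y):
    """Simplified SSIM over 3x3 windows, returned as a dissimilarity in [0, 1]."""
    C1, C2 = 0.01 ** 2, 0.03 ** 2
    mu_x = F.avg_pool2d(x, 3, 1, padding=1)
    mu_y = F.avg_pool2d(y, 3, 1, padding=1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, padding=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, padding=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, padding=1) - mu_x * mu_y
    ssim_n = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    ssim_d = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return torch.clamp((1 - ssim_n / ssim_d) / 2, 0, 1)

def reconstruction_loss(I, I_rec, alpha=0.85):
    """Weighted sum of SSIM and L1 (SAD) between the input and reconstructed image."""
    return torch.mean(alpha * ssim(I, I_rec) + (1 - alpha) * torch.abs(I - I_rec))

def lr_consistency_loss(disp_left, disp_right_warped_to_left):
    """Absolute difference between the left disparity and the right disparity
    warped into the left view (the full loss also does the opposite direction)."""
    return torch.mean(torch.abs(disp_left - disp_right_warped_to_left))

def smoothness_loss(disp, image):
    """Edge-aware smoothness: penalize disparity gradients except at image edges."""
    d_dx = torch.abs(disp[:, :, :, :-1] - disp[:, :, :, 1:])
    d_dy = torch.abs(disp[:, :, :-1, :] - disp[:, :, 1:, :])
    i_dx = torch.mean(torch.abs(image[:, :, :, :-1] - image[:, :, :, 1:]), 1, keepdim=True)
    i_dy = torch.mean(torch.abs(image[:, :, :-1, :] - image[:, :, 1:, :]), 1, keepdim=True)
    return torch.mean(d_dx * torch.exp(-i_dx)) + torch.mean(d_dy * torch.exp(-i_dy))
```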