Deep learning for 3D scene reconstruction and segmentation from stereo images

Vladimir V. Kniaz; Vladimir A. Knyaz; Evgeny V. Ippolitov; Mikhail M. Novikov; Lev Grodzitsky; Petr V. Moshkantsev

doi:10.1117/12.2592648

20 June 2021


Proceedings Volume 11785, Multimodal Sensing and Artificial Intelligence: Technologies and Applications II; 117850I (2021) https://doi.org/10.1117/12.2592648
Event: SPIE Optical Metrology, 2021, Online Only

Abstract

Simultaneous 3D scene reconstruction and semantic segmentation are required in many applications such as autonomous driving, robotics, and optical metrology. Classic 3D reconstruction methods usually perform these operations in two steps. First, a 3D scanner or laser scanner acquires a point cloud. Second, semantic segmentation of the point cloud is performed. Recently, a new kind of 3D model representation was proposed that uses trapezium-shaped voxels aligned with the camera’s frustum and pixels [1]. Frustum voxel models proved to be effective for 3D scene reconstruction and segmentation from monocular images [2]. Still, many existing 3D scanning systems readily provide stereo cameras, and the performance of frustum voxel model-based methods for stereo input remains an open question. This paper evaluates the 3D reconstruction quality of a volumetric neural network with monocular and stereo input. We use the SSZ [2] volumetric neural network as a starting point for our research and develop a modified version, which we term Stereo-SSZ, that receives a stereo pair as input. We compare the performance of the original SSZ model and our Stereo-SSZ model on real and synthetic 3D shape datasets. Specifically, we generate a stereo version of the SemanticVoxels [2] dataset and capture stereo pairs of multiple real objects using a structured light scanner. The results of our experiments are encouraging and demonstrate that the model with stereo input outperforms the original monocular SSZ network. In particular, the frustum voxel models generated by our Stereo-SSZ model have lower surface distance errors and show fine details in the reconstructed 3D models.
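To make the idea of voxels aligned with the camera's frustum and pixels more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes an ideal pinhole camera and uniformly spaced depth slices; the names `PinholeCamera`, `frustum_voxel_center`, and `frustum_voxel_width`, and the parameters `z_near`, `z_far`, and `n_slices`, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinholeCamera:
    """Hypothetical pinhole intrinsics in pixels; not taken from the paper."""
    focal: float  # focal length
    cx: float     # principal point, x
    cy: float     # principal point, y


def frustum_voxel_center(cam, u, v, k, z_near, z_far, n_slices):
    """Return the camera-space (X, Y, Z) center of frustum voxel (u, v, k).

    (u, v) is the pixel column and row the voxel is aligned with, and k is
    the depth-slice index. Uniform depth spacing between z_near and z_far
    is an assumption of this sketch.
    """
    if not 0 <= k < n_slices:
        raise ValueError("depth slice index out of range")
    dz = (z_far - z_near) / n_slices
    z = z_near + (k + 0.5) * dz
    # Back-project the pixel center through the pinhole model at depth z.
    x = (u + 0.5 - cam.cx) * z / cam.focal
    y = (v + 0.5 - cam.cy) * z / cam.focal
    return (x, y, z)


def frustum_voxel_width(cam, z):
    """Lateral footprint of one pixel at depth z; it grows linearly with z."""
    return z / cam.focal
```

Because the lateral footprint grows with depth, each voxel in this sketch is a truncated pyramid rather than a cube, which is what the trapezium-shaped description refers to.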


Citation

Vladimir V. Kniaz, Vladimir A. Knyaz, Evgeny V. Ippolitov, Mikhail M. Novikov, Lev Grodzitsky, and Petr V. Moshkantsev "Deep learning for 3D scene reconstruction and segmentation from stereo images", Proc. SPIE 11785, Multimodal Sensing and Artificial Intelligence: Technologies and Applications II, 117850I (20 June 2021); https://doi.org/10.1117/12.2592648
