Simultaneous 3D scene reconstruction and semantic segmentation are required in many applications, such as autonomous driving, robotics, and optical metrology. Classic 3D reconstruction methods usually perform these operations in two steps. First, a 3D scanner or a laser scanner acquires a point cloud. Second, the point cloud is semantically segmented. Recently, a new 3D model representation was proposed that uses trapezium-shaped voxels aligned with the camera's frustum and pixels [1]. Frustum voxel models proved effective for 3D scene reconstruction and semantic segmentation from monocular images [2]. However, many existing 3D scanning systems already provide stereo cameras, and the performance of frustum voxel model-based methods with stereo input remains an open question. This paper focuses on evaluating the 3D reconstruction quality of a volumetric neural network with monocular and stereo inputs. We use the SSZ [2] volumetric neural network as a starting point for our research and develop a modified version, termed Stereo-SSZ, that receives a stereo pair as input. We compare the performance of the original SSZ model and our Stereo-SSZ model on several real and synthetic 3D shape datasets. Specifically, we generate a stereo version of the SemanticVoxels [2] dataset and capture stereo pairs of multiple real objects using a structured light scanner. The results of our experiments are encouraging and show that the model with stereo input outperforms the original monocular SSZ network. In particular, the frustum voxel models generated by our Stereo-SSZ model have lower surface distance errors and preserve fine details in the reconstructed 3D models.
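To make the frustum voxel idea more concrete, the following minimal sketch computes the camera-space centres of a frustum voxel grid in which every voxel column is aligned with one image pixel. It assumes an ideal pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` and depth bins spaced uniformly between hypothetical near and far planes `z_near` and `z_far`; this parameterisation and the function name `frustum_voxel_centers` are illustrative assumptions, not the exact formulation used in [1] or [2].

```python
import numpy as np


def frustum_voxel_centers(width, height, depth_bins, fx, fy, cx, cy, z_near, z_far):
    """Return camera-space voxel centres with shape (depth_bins, height, width, 3).

    Assumptions (hypothetical, for illustration only): an ideal pinhole camera
    and depth bins spaced uniformly between z_near and z_far.
    """
    # Pixel centres: each frustum voxel column follows one pixel's viewing ray.
    u = np.arange(width) + 0.5
    v = np.arange(height) + 0.5
    # Depth bin centres, assuming uniform spacing along the optical axis.
    edges = np.linspace(z_near, z_far, depth_bins + 1)
    z = 0.5 * (edges[:-1] + edges[1:])
    # Grids indexed as (depth, row, column).
    zz, vv, uu = np.meshgrid(z, v, u, indexing="ij")
    # Back-project pixel coordinates to 3D with the pinhole model.
    x = (uu - cx) * zz / fx
    y = (vv - cy) * zz / fy
    return np.stack([x, y, zz], axis=-1)


centres = frustum_voxel_centers(4, 2, 3, fx=2.0, fy=2.0, cx=2.0, cy=1.0,
                                z_near=1.0, z_far=4.0)
print(centres.shape)  # (3, 2, 4, 3)
```

Because all voxels in a column share one pixel's viewing ray, the lateral spacing between neighbouring voxels grows linearly with depth (here `z / fx`), which is what gives each cell its trapezium-shaped cross-section.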