Image-based 3D scene reconstruction is a key task for many machine vision applications, such as scene understanding, object pose estimation, and autonomous navigation. A set of reliable and accurate methods for multi-view 3D scene reconstruction has been developed over the last decades. A significant drawback of these techniques, however, is the need to acquire a large number of images in the processed sequence to obtain an acceptable 3D scene representation. Recently, convolutional neural network (CNN) models have achieved the best quality in object recognition, image segmentation, image translation, and other challenging computer vision problems. This paper proposes a CNN architecture and a technique for training data preparation that together provide a prediction of a voxel model of a 3D scene containing several objects. For CNN training and evaluation, a special dataset was collected and annotated. It contains image sequences of several scenes with the corresponding depth images and 3D models of these scenes. The image sequence serves as the primary data for 3D scene reconstruction by the Structure from Motion (SfM) technique. SfM processing yields surface 3D models of all objects in the scene, along with the camera position and orientation for every image in the sequence. The surface 3D model is then transformed into a voxel 3D model and segmented into separate objects. A conditional generative adversarial network architecture was developed for 3D reconstruction from a single image. Its generative part translates an input color image into an output voxel model. The discriminative part distinguishes correct output (close to the real voxel model) from false output (a wrong voxel model). Both parts are trained simultaneously on the prepared dataset. Evaluation on the test part of the prepared dataset demonstrated the ability to predict 3D models of previously unobserved complex scenes containing several objects.
The proposed neural network architecture provides high generalization ability and improved resolution of predicted voxel 3D models.
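The abstract describes converting a surface 3D model into a voxel 3D model but does not say how this is done. The following is a minimal sketch of one common approach, assuming the surface is available as a set of sampled 3D points. The function name `voxelize`, the `resolution` parameter, and the uniform-scaling choice are illustrative assumptions, not details taken from the paper; segmentation into separate objects is not shown.

```python
import numpy as np


def voxelize(points, resolution=32):
    """Map an (N, 3) array of surface points to a boolean occupancy grid.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    points = np.asarray(points, dtype=np.float64)
    lower = points.min(axis=0)
    # Use one scale factor for all axes so the scene keeps its aspect ratio.
    extent = (points.max(axis=0) - lower).max()
    if extent == 0:
        extent = 1.0  # degenerate case: all points coincide
    indices = np.floor((points - lower) / extent * resolution).astype(int)
    # Points on the upper boundary would land at index `resolution`; clip them.
    indices = np.clip(indices, 0, resolution - 1)
    grid = np.zeros((resolution, resolution, resolution), dtype=bool)
    grid[indices[:, 0], indices[:, 1], indices[:, 2]] = True
    return grid


# Example: two opposite corners of a unit cube on a 4 x 4 x 4 grid.
corners = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
occupancy = voxelize(corners, resolution=4)
```

A grid built this way marks only voxels that contain at least one sampled surface point, so a sparsely sampled surface leaves gaps; denser sampling or a fill step would be needed for a solid model.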
Vladimir Knyaz
"Machine learning for scene 3D reconstruction using a single image", Proc. SPIE 11353, Optics, Photonics and Digital Technologies for Imaging Applications VI, 1135321 (6 April 2020); https://doi.org/10.1117/12.2556122