Paper Title
Deep Depth Estimation from Visual-Inertial SLAM
Paper Authors
Paper Abstract
This paper addresses the problem of learning to complete a scene's depth from sparse depth points and images of indoor scenes. Specifically, we study the case in which the sparse depth is computed from a visual-inertial simultaneous localization and mapping (VI-SLAM) system. The resulting point cloud has low density, is noisy, and has a non-uniform spatial distribution, as compared to the input from active depth sensors, e.g., LiDAR or Kinect. Since VI-SLAM produces point clouds only over textured areas, we compensate for the missing depth of low-texture surfaces by leveraging their planar structures and their surface normals, which serve as an important intermediate representation. A pre-trained surface-normal network, however, suffers from large performance degradation when the viewing direction (especially the roll angle) of the test image differs significantly from that of the training images. To address this limitation, we use the gravity estimate available from VI-SLAM to warp the input image to the orientation prevailing in the training dataset. This results in a significant performance gain for the surface-normal estimates, and thus for the dense depth estimates. Finally, we show that our method outperforms other state-of-the-art approaches on both the training (ScanNet and NYUv2) and testing (collected with Azure Kinect) datasets.
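As a rough illustration of the gravity-based warping step described in the abstract, the sketch below (not the authors' implementation) estimates the camera's roll from a gravity direction expressed in the camera frame and rotates the image so the projected gravity points straight down, as it does in most training images. The function name `gravity_aligned_warp`, the OpenCV dependency, and the x-right/y-down/z-forward camera convention are assumptions made for this example.

```python
import numpy as np
import cv2  # used only for the rotation; any image library would work

def gravity_aligned_warp(image, gravity_cam):
    """Remove the camera's roll relative to gravity (illustrative sketch).

    image:       HxW[xC] array.
    gravity_cam: 3-vector, gravity direction in the camera frame (e.g.,
                 taken from a VI-SLAM state estimate); assumed camera
                 convention is x-right, y-down, z-forward.
    Returns the warped image and the 2x3 affine matrix used.
    """
    # Project gravity onto the image plane; with zero roll it lies along +y.
    gx, gy = float(gravity_cam[0]), float(gravity_cam[1])
    roll = np.arctan2(gx, gy)  # angle of projected gravity from the +y axis

    h, w = image.shape[:2]
    center = (w / 2.0, h / 2.0)
    # Rotating by -roll maps the projected gravity back onto +y ("down").
    # cv2.getRotationMatrix2D expects the angle in degrees.
    M = cv2.getRotationMatrix2D(center, -np.degrees(roll), 1.0)
    warped = cv2.warpAffine(image, M, (w, h))
    return warped, M

# Example: a camera rolled so that gravity is tilted 30 degrees in its frame.
img = np.zeros((480, 640, 3), dtype=np.uint8)
g = np.array([np.sin(np.radians(30.0)), np.cos(np.radians(30.0)), 0.0])
aligned, M = gravity_aligned_warp(img, g)
```

Note that in a full pipeline along the lines the abstract describes, the normals predicted on the aligned image would presumably have to be mapped back through the inverse rotation before being used for depth completion; that step is omitted here for brevity.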