Paper title
On Deep Learning Techniques to Boost Monocular Depth Estimation for Autonomous Navigation
Paper abstract
Inferring the depth of images is a fundamental inverse problem in Computer Vision: a single 2D image can be produced by infinitely many possible real scenes, so its depth cannot be recovered unambiguously. Benefiting from the ability of Convolutional Neural Networks (CNNs) to exploit structural features and spatial image information, Single Image Depth Estimation (SIDE) is often highlighted in scientific and technological innovation, as it offers a low implementation cost and robustness to environmental conditions. In the context of autonomous vehicles, state-of-the-art CNNs optimize the SIDE task by producing high-quality depth maps, which are essential for autonomous navigation in different locations. However, such networks are usually supervised by sparse and noisy depth data from Light Detection and Ranging (LiDAR) laser scans, and they run at a high computational cost that requires high-performance Graphics Processing Units (GPUs). Therefore, we propose a new lightweight and fast supervised CNN architecture, combined with novel feature extraction models designed for real-world autonomous navigation. We also introduce an efficient surface normals module, together with a simple geometric 2.5D loss function, to solve SIDE problems. In addition, we incorporate multiple Deep Learning techniques into training, such as densification algorithms and additional semantic, surface normals and depth information. The method introduced in this work focuses on robotic applications in indoor and outdoor environments, and its results are evaluated on the competitive, publicly available NYU Depth V2 and KITTI Depth datasets.
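The abstract does not say how the surface normals module or the 2.5D loss are built. Purely as an illustration, the sketch below shows one common way to tie a depth map to surface normals: back-project every pixel with a pinhole camera model (the intrinsics `fx`, `fy`, `cx`, `cy` are assumed to be known), take the cross product of the 3D differences between neighbouring pixels, and penalise one minus the cosine similarity to reference normals. The function names `normals_from_depth` and `normal_loss` are hypothetical, and this generic normal-consistency loss is a stand-in, not the authors' implementation.

```python
import math


def backproject(depth, fx, fy, cx, cy):
    """Lift each pixel (u, v) with depth z to a 3D camera-frame point using
    the pinhole model: X = (u - cx) * z / fx, Y = (v - cy) * z / fy, Z = z."""
    return [[((u - cx) * z / fx, (v - cy) * z / fy, z)
             for u, z in enumerate(row)]
            for v, row in enumerate(depth)]


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(a, eps=1e-8):
    norm = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2) + eps
    return (a[0] / norm, a[1] / norm, a[2] / norm)


def normals_from_depth(depth, fx, fy, cx, cy):
    """Unit surface normal per pixel (hypothetical helper; depth is H x W, H, W >= 2).

    Uses forward differences of neighbouring 3D points; the last row/column
    reuse the difference of the previous pair so the output stays H x W.
    cross(dv, du) makes normals of visible surfaces point towards the camera
    (negative Z)."""
    pts = backproject(depth, fx, fy, cx, cy)
    h, w = len(depth), len(depth[0])
    normals = []
    for v in range(h):
        v0 = min(v, h - 2)  # row pair (v0, v0 + 1) for the vertical difference
        row = []
        for u in range(w):
            u0 = min(u, w - 2)  # column pair (u0, u0 + 1) for the horizontal difference
            du = _sub(pts[v][u0 + 1], pts[v][u0])
            dv = _sub(pts[v0 + 1][u], pts[v0][u])
            row.append(_normalize(_cross(dv, du)))
        normals.append(row)
    return normals


def normal_loss(pred_depth, ref_normals, fx, fy, cx, cy, valid=None):
    """Mean (1 - cosine similarity) between normals derived from pred_depth and
    unit reference normals; pixels with valid[v][u] == False are skipped."""
    pred = normals_from_depth(pred_depth, fx, fy, cx, cy)
    total, count = 0.0, 0
    for v, row in enumerate(pred):
        for u, n in enumerate(row):
            if valid is not None and not valid[v][u]:
                continue
            r = ref_normals[v][u]
            total += 1.0 - (n[0] * r[0] + n[1] * r[1] + n[2] * r[2])
            count += 1
    return total / count if count else 0.0
```

For a fronto-parallel plane at constant depth, every normal comes out as approximately (0, 0, -1), and the loss against reference normals of (0, 0, -1) is close to zero. Back-projecting to 3D before differencing, rather than differencing raw depth values in image space, keeps the normals consistent with perspective projection; the optional `valid` mask is one way to skip pixels without reliable reference normals, such as gaps in sparse LiDAR data.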