Voxel Map for Visual SLAM

Paper Title

Voxel Map for Visual SLAM

Authors

Muglikar, Manasi; Zhang, Zichao; Scaramuzza, Davide

Abstract

In modern visual SLAM systems, it is standard practice to retrieve potential candidate map points from overlapping keyframes for further feature matching or direct tracking. In this work, we argue that keyframes are not the optimal choice for this task, due to several inherent limitations, such as weak geometric reasoning and poor scalability. We propose a voxel-map representation to efficiently retrieve map points for visual SLAM. In particular, we organize the map points in a regular voxel grid. Visible points from a camera pose are queried by sampling the camera frustum in a raycasting manner, which can be done in constant time using an efficient voxel hashing method. Compared with keyframes, the points retrieved using our method are geometrically guaranteed to fall in the camera field-of-view, and occluded points can be identified and removed to a certain extent. This method also naturally scales up to large scenes and complicated multi-camera configurations. Experimental results show that our voxel-map representation is as efficient as a keyframe map with 5 keyframes and provides significantly higher localization accuracy (average 46% improvement in RMSE) on the EuRoC dataset. The proposed voxel-map representation is a general approach to a fundamental functionality in visual SLAM and is widely applicable.
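
The query described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `VoxelMap` and `frustum_rays`, the voxel size, the field-of-view values, and the fixed-step ray marching are all assumptions made for illustration, and the camera is assumed to sit at the query origin looking along +z with no rotation. The sketch stores points in a hash table keyed by integer voxel coordinates, walks each sampled ray outward, and stops at the first occupied voxel so that voxels behind it are treated as occluded.

```python
import math
from collections import defaultdict


class VoxelMap:
    """Map points stored in a hash table keyed by integer voxel coordinates."""

    def __init__(self, voxel_size=0.5):
        self.voxel_size = voxel_size
        self.voxels = defaultdict(list)  # (i, j, k) -> list of 3D points

    def key(self, p):
        # Integer voxel index of a 3D position.
        s = self.voxel_size
        return (math.floor(p[0] / s), math.floor(p[1] / s), math.floor(p[2] / s))

    def insert(self, p):
        self.voxels[self.key(p)].append(tuple(p))

    def query_visible(self, origin, rays, max_depth=10.0):
        """Walk each ray outward and collect the points of the first occupied voxel."""
        step = 0.5 * self.voxel_size  # assumed sampling step along the ray
        seen, points = set(), []
        for d in rays:
            t = 0.0
            while t <= max_depth:
                k = self.key((origin[0] + t * d[0],
                              origin[1] + t * d[1],
                              origin[2] + t * d[2]))
                if k in self.voxels:  # hash lookup, O(1) on average
                    if k not in seen:
                        seen.add(k)
                        points.extend(self.voxels[k])
                    break  # voxels further along this ray count as occluded
                t += step
        return points


def frustum_rays(h_fov_deg=90.0, v_fov_deg=60.0, n_h=9, n_v=7):
    """Unit ray directions sampled across the frustum (camera looks along +z)."""
    rays = []
    for a in range(n_h):
        for b in range(n_v):
            x = math.tan(math.radians(h_fov_deg) * (a / (n_h - 1) - 0.5))
            y = math.tan(math.radians(v_fov_deg) * (b / (n_v - 1) - 0.5))
            n = math.sqrt(x * x + y * y + 1.0)
            rays.append((x / n, y / n, 1.0 / n))
    return rays


vmap = VoxelMap(voxel_size=0.5)
vmap.insert((0.1, 0.1, 2.0))   # in front of the camera
vmap.insert((0.1, 0.1, 4.0))   # directly behind the first point
vmap.insert((0.1, 0.1, -3.0))  # behind the camera
visible = vmap.query_visible((0.0, 0.0, 0.0), frustum_rays())
```

In this toy setup the query cost depends on the number of rays and samples per ray rather than on the total number of map points, which is the property the abstract attributes to voxel hashing. A fixed marching step can skip thin voxels at grazing angles; an exact grid traversal would avoid that at the cost of more code.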
