Paper Title
Augmenting Visual Place Recognition with Structural Cues
Paper Authors
Paper Abstract
In this paper, we propose to augment image-based place recognition with structural cues. Specifically, these structural cues are obtained using structure-from-motion, such that no additional sensors are needed for place recognition. This is achieved by augmenting the 2D convolutional neural network (CNN) typically used for image-based place recognition with a 3D CNN that takes as input a voxel grid derived from the structure-from-motion point cloud. We evaluate different methods for fusing the 2D and 3D features and obtain best performance with global average pooling and simple concatenation. On the Oxford RobotCar dataset, the resulting descriptor exhibits superior recognition performance compared to descriptors extracted from only one of the input modalities, including state-of-the-art image-based descriptors. Especially at low descriptor dimensionalities, we outperform state-of-the-art descriptors by up to 90%.
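The fusion step named in the abstract (global average pooling of the 2D and 3D features followed by simple concatenation) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the binary occupancy voxelization, the final L2 normalization, and the toy feature maps standing in for CNN outputs are all assumptions made for illustration.

```python
import math


def voxelize(points, origin, voxel_size, dims):
    """Build a binary occupancy grid (nested lists) from 3D points.

    Assumption: a voxel is marked 1.0 if at least one point falls inside it;
    the abstract does not specify how the voxel grid is filled.
    """
    nx, ny, nz = dims
    grid = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for p in points:
        idx = [math.floor((p[a] - origin[a]) / voxel_size) for a in range(3)]
        # Points outside the grid bounds are ignored.
        if all(0 <= idx[a] < dims[a] for a in range(3)):
            grid[idx[0]][idx[1]][idx[2]] = 1.0
    return grid


def global_average_pool(feature_map):
    """Reduce each channel (a flat list of activations) to its mean."""
    return [sum(channel) / len(channel) for channel in feature_map]


def l2_normalize(vector):
    """Scale a vector to unit length (an assumed post-processing step)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def fuse(image_feature_map, structure_feature_map):
    """Pool both feature maps globally, then concatenate the results."""
    pooled_2d = global_average_pool(image_feature_map)
    pooled_3d = global_average_pool(structure_feature_map)
    return l2_normalize(pooled_2d + pooled_3d)


# Toy point cloud: two points inside a 2x2x2 grid, one far outside it.
points = [(0.1, 0.1, 0.1), (0.9, 0.1, 0.1), (5.0, 5.0, 5.0)]
grid = voxelize(points, origin=(0.0, 0.0, 0.0), voxel_size=0.5, dims=(2, 2, 2))

# Hypothetical feature maps standing in for the 2D and 3D CNN outputs.
image_feature_map = [[1.0, 3.0], [2.0, 2.0]]      # 2 channels
structure_feature_map = [[0.0, 2.0, 0.0, 2.0]]    # 1 channel
descriptor = fuse(image_feature_map, structure_feature_map)
```

In this sketch the descriptor length equals the sum of the two channel counts, so the descriptor dimensionality is controlled by how many channels each branch emits rather than by the spatial size of either input.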