Paper Title

Stereo RGB and Deeper LIDAR Based Network for 3D Object Detection

Authors

Qingdong He, Zhengning Wang, Hao Zeng, Yijun Liu, Shuaicheng Liu, Bing Zeng

Abstract


3D object detection has become an emerging task in autonomous driving scenarios. Previous works process 3D point clouds using either projection-based or voxel-based models. However, both approaches have drawbacks. Voxel-based methods lack semantic information, while projection-based methods suffer from substantial spatial information loss when projecting to different views. In this paper, we propose the Stereo RGB and Deeper LIDAR (SRDL) framework, which utilizes semantic and spatial information simultaneously so that the performance of the network for 3D object detection improves naturally. Specifically, the network generates candidate boxes from stereo pairs and combines different region-wise features using a deep fusion scheme. The stereo strategy offers more information for prediction compared with prior works. Then, several local and global feature extractors are stacked in the segmentation module to capture richer deep semantic geometric features from point clouds. After aligning the interior points with the fused features, the proposed network refines the prediction in a more accurate manner and encodes the whole box with a novel compact method. Experimental results on the challenging KITTI detection benchmark demonstrate the effectiveness of utilizing both stereo images and point clouds for 3D object detection.
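The abstract does not spell out the deep fusion scheme that combines the region-wise features from the stereo RGB branch and the LiDAR branch. As a rough illustration only, the sketch below shows one common deep-fusion pattern (mixing the two branches at every layer rather than only at the input or the output); all function names, layer counts, and feature shapes here are hypothetical and are not taken from the paper.

```python
import numpy as np

def relu(x):
    # Simple element-wise non-linearity used between layers.
    return np.maximum(x, 0.0)

def deep_fusion(feat_rgb, feat_lidar, weights_rgb, weights_lidar):
    """Illustrative deep fusion of two feature branches.

    After each layer, the branch outputs are averaged element-wise and
    the joint result is fed back into both branches, so the modalities
    interact at every depth instead of only once (early or late fusion).
    All shapes/weights are hypothetical placeholders.
    """
    h_rgb, h_lidar = feat_rgb, feat_lidar
    for W_r, W_l in zip(weights_rgb, weights_lidar):
        h_rgb = relu(h_rgb @ W_r)      # stereo-RGB branch layer
        h_lidar = relu(h_lidar @ W_l)  # LiDAR branch layer
        joint = 0.5 * (h_rgb + h_lidar)  # element-wise mean fusion
        h_rgb, h_lidar = joint, joint    # share the joint feature
    return joint

# Toy usage with random region-wise features (4 regions, 8 channels).
rng = np.random.default_rng(0)
out = deep_fusion(
    rng.standard_normal((4, 8)), rng.standard_normal((4, 8)),
    [rng.standard_normal((8, 8)) for _ in range(3)],
    [rng.standard_normal((8, 8)) for _ in range(3)],
)
```

In the actual SRDL network the fused quantities are convolutional feature maps and the fusion operator may differ; this sketch only conveys the "fuse at every layer" idea that distinguishes deep fusion from early or late fusion.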
