Paper Title
MGTANet: Encoding Sequential LiDAR Points Using Long Short-Term Motion-Guided Temporal Attention for 3D Object Detection
Paper Authors
Paper Abstract
Most scanning LiDAR sensors generate a sequence of point clouds in real time. While conventional 3D object detectors use a set of unordered LiDAR points acquired over a fixed time interval, recent studies have revealed that substantial performance improvements can be achieved by exploiting the spatio-temporal context present in a sequence of LiDAR point sets. In this paper, we propose a novel 3D object detection architecture, which can encode LiDAR point cloud sequences acquired by multiple successive scans. The encoding process of the point cloud sequence is performed on two different time scales. We first design a short-term motion-aware voxel encoding that captures the short-term temporal changes of point clouds driven by the motion of objects in each voxel. We also propose long-term motion-guided bird's eye view (BEV) feature enhancement that adaptively aligns and aggregates the BEV feature maps obtained by the short-term voxel encoding, utilizing the dynamic motion context inferred from the sequence of the feature maps. The experiments conducted on the public nuScenes benchmark demonstrate that the proposed 3D object detector offers significant improvements in performance compared to the baseline methods and that it sets state-of-the-art performance for certain 3D object detection categories. Code is available at https://github.com/HYjhkoh/MGTANet.git.
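The long-term enhancement described above weights and fuses a sequence of BEV feature maps. The following is a minimal illustrative sketch of that general idea (plain per-location softmax attention over time, implemented with NumPy); it is an assumption for exposition only, and omits the motion-guided alignment and learned projections that MGTANet actually uses. All function and variable names here are hypothetical, not from the paper's code.

```python
import numpy as np

def temporal_attention_aggregate(bev_seq, current):
    """Fuse past BEV feature maps into the current frame via
    per-location temporal attention (illustrative sketch only).

    bev_seq : (T, C, H, W) array of past BEV feature maps
    current : (C, H, W) array, the current-frame BEV feature map
    returns : (C, H, W) enhanced current-frame feature map
    """
    # Similarity between each past frame and the current frame,
    # computed independently at every BEV location (H, W).
    scores = np.einsum('tchw,chw->thw', bev_seq, current)
    scores = scores / np.sqrt(bev_seq.shape[1])  # scale by feature dim C

    # Softmax over the time axis -> attention weights per location.
    weights = np.exp(scores - scores.max(axis=0, keepdims=True))
    weights = weights / weights.sum(axis=0, keepdims=True)

    # Weighted sum of past frames, added residually to the current frame.
    fused = np.einsum('thw,tchw->chw', weights, bev_seq)
    return current + fused
```

A real implementation would additionally warp each past map to compensate for ego-motion and object motion before attending, which is the role of the motion-guided alignment in the proposed architecture.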