Paper Title
3D Multi-Object Tracking Using Graph Neural Networks with Cross-Edge Modality Attention
Paper Authors
Paper Abstract
Online 3D multi-object tracking (MOT) has witnessed significant research interest in recent years, largely driven by demand from the autonomous systems community. However, 3D offline MOT has been explored comparatively little. Labeling 3D trajectory scene data at large scale without relying on high-cost human experts remains an open research question. In this work, we propose Batch3DMOT, which follows the tracking-by-detection paradigm and represents real-world scenes as directed, acyclic, and category-disjoint tracking graphs that are attributed using various modalities such as camera, LiDAR, and radar. We present a multi-modal graph neural network that uses a cross-edge attention mechanism to mitigate modality intermittence, which translates into sparsity in the graph domain. Additionally, we present attention-weighted convolutions over frame-wise k-NN neighborhoods as a suitable means of allowing information exchange across disconnected graph components. We evaluate our approach using various sensor modalities and model configurations on the challenging nuScenes and KITTI datasets. Extensive experiments demonstrate that our proposed approach yields an overall improvement of 3.3% in the AMOTA score on nuScenes, thereby setting a new state of the art for 3D tracking and further enhancing false positive filtering.
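To illustrate the idea of an attention-weighted convolution over a frame-wise k-NN neighborhood, the following is a minimal NumPy sketch, not the paper's implementation: it assumes Euclidean k-NN over node features within one frame and simple dot-product attention, and the function name `knn_attention_conv` is hypothetical.

```python
import numpy as np

def knn_attention_conv(X, k=2):
    """Sketch of an attention-weighted k-NN convolution (illustrative only).

    For each node in X (n x d feature matrix of one frame), find its k
    nearest neighbors by Euclidean distance, compute softmax attention
    over dot-product scores, and aggregate neighbor features with those
    attention weights.
    """
    n, d = X.shape
    # Pairwise Euclidean distances between all node features
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # exclude self from neighbors
    nbrs = np.argsort(dists, axis=1)[:, :k]  # indices of k nearest neighbors

    out = np.empty_like(X)
    for i in range(n):
        scores = X[nbrs[i]] @ X[i]           # dot-product attention scores
        w = np.exp(scores - scores.max())    # numerically stable softmax
        w /= w.sum()
        out[i] = w @ X[nbrs[i]]              # attention-weighted aggregation
    return out
```

In the paper's setting, such a convolution would let features propagate between graph components that share no tracking edges, since the k-NN neighborhood is built per frame independently of graph connectivity.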