Paper Title


LongShortNet: Exploring Temporal and Semantic Features Fusion in Streaming Perception

Authors

Chenyang Li, Zhi-Qi Cheng, Jun-Yan He, Pengyu Li, Bin Luo, Hanyuan Chen, Yifeng Geng, Jin-Peng Lan, Xuansong Xie

Abstract


Streaming perception is a critical task in autonomous driving that requires balancing the latency and accuracy of the autopilot system. However, current methods for streaming perception are limited, as they rely only on the current frame and one adjacent frame to learn movement patterns. This restricts their ability to model complex scenes and often results in poor detection performance. To address this limitation, we propose LongShortNet, a novel dual-path network that captures long-term temporal motion and integrates it with short-term spatial semantics for real-time perception. LongShortNet is notable as the first work to extend long-term temporal modeling to streaming perception, enabling spatiotemporal feature fusion. We evaluate LongShortNet on the challenging Argoverse-HD dataset and demonstrate that it outperforms existing state-of-the-art methods with almost no additional computational cost.
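The dual-path idea described in the abstract can be illustrated with a minimal sketch: one path aggregates feature maps over a longer window of past frames (long-term temporal motion), while the other keeps only the latest frame (short-term spatial semantics), and the two are fused along the channel axis. This is an illustrative toy using NumPy, not the authors' implementation; the function name `dual_path_fusion`, the mean aggregation, the window size, and the concatenation-based fusion are all assumptions made here for clarity.

```python
import numpy as np

def dual_path_fusion(frames, long_window=4):
    """Toy dual-path fusion (illustrative, not the paper's architecture).

    frames: array of shape (T, C, H, W) -- per-frame feature maps.
    Long path:  mean over the last `long_window` frames (temporal motion).
    Short path: the latest frame only (spatial semantics).
    Fusion:     channel-wise concatenation -> shape (2C, H, W).
    """
    frames = np.asarray(frames)
    long_feat = frames[-long_window:].mean(axis=0)   # long-term temporal path
    short_feat = frames[-1]                          # short-term semantic path
    return np.concatenate([long_feat, short_feat], axis=0)

# Toy usage: 8 frames of 16-channel 4x4 feature maps
fused = dual_path_fusion(np.random.rand(8, 16, 4, 4))
print(fused.shape)  # (32, 4, 4)
```

Because the long path only adds a cheap aggregation over already-computed features, this kind of design is consistent with the abstract's claim of almost no additional computational cost.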
