Paper Title
RCL: Recurrent Continuous Localization for Temporal Action Detection
Authors
Abstract
Temporal representation is the cornerstone of modern action detection techniques. State-of-the-art methods mostly rely on a dense anchoring scheme, in which anchors are sampled uniformly over the temporal domain on a discretized grid and their accurate boundaries are then regressed. In this paper, we revisit this foundational stage and introduce Recurrent Continuous Localization (RCL), which learns a fully continuous anchoring representation. Specifically, the proposed representation builds upon an explicit model conditioned on video embeddings and temporal coordinates, which ensures the capability of detecting segments of arbitrary length. To optimize the continuous representation, we develop an effective scale-invariant sampling strategy and recurrently refine the prediction in subsequent iterations. Our continuous anchoring scheme is fully differentiable, allowing it to be seamlessly integrated into existing detectors, e.g., BMN and G-TAD. Extensive experiments on two benchmarks demonstrate that our continuous representation consistently surpasses its discretized counterparts by ~2% mAP. As a result, RCL achieves 52.92% mAP@0.5 on THUMOS14 and 37.65% mAP on ActivityNet v1.3, outperforming all existing single-model detectors.
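The abstract contrasts a discretized anchor grid with continuous anchors that can be queried at any (start, end) coordinate and refined over several iterations. What follows is a minimal pure-Python sketch of that contrast under stated assumptions. It is not the RCL implementation. All function names are hypothetical. The learned model is replaced by mean-pooling of linearly interpolated frame features plus normalized boundary coordinates. The log-uniform duration sampler is only one assumed reading of "scale-invariant sampling". `predict_offsets` stands in for a trained regression head. Linear interpolation is piecewise linear in the query time, so gradients can reach the boundary coordinates. Snapping to integer grid indices does not allow this.

```python
import math
import random

def discrete_anchors(num_frames, max_duration):
    """Dense grid indexed by (start, duration) on integer frame positions."""
    return [(s, s + d) for d in range(1, max_duration + 1)
            for s in range(num_frames - d + 1)]

def interpolate(features, t):
    """Linearly interpolate per-frame features at a continuous time t (frame units)."""
    t = min(max(t, 0.0), len(features) - 1)
    lo = int(math.floor(t))
    hi = min(lo + 1, len(features) - 1)
    w = t - lo
    return [(1 - w) * a + w * b for a, b in zip(features[lo], features[hi])]

def anchor_embedding(features, start, end, num_points=8):
    """Hypothetical continuous anchor representation: features pooled from
    num_points continuous positions in [start, end], plus normalized coordinates."""
    span = max(len(features) - 1, 1)
    points = [start + (end - start) * i / (num_points - 1) for i in range(num_points)]
    samples = [interpolate(features, p) for p in points]
    pooled = [sum(vals) / num_points for vals in zip(*samples)]
    return pooled + [start / span, end / span]

def sample_scale_invariant(num_frames, n, min_len=1.0, rng=None):
    """Assumed scale-invariant sampler: log-uniform durations, so every octave
    of segment length receives roughly the same number of training anchors."""
    rng = rng if rng is not None else random.Random()
    max_len = float(num_frames - 1)
    anchors = []
    for _ in range(n):
        dur = math.exp(rng.uniform(math.log(min_len), math.log(max_len)))
        dur = min(max(dur, min_len), max_len)  # guard against float round-off
        start = rng.uniform(0.0, max_len - dur)
        anchors.append((start, start + dur))
    return anchors

def refine(features, anchor, predict_offsets, iterations=3):
    """Recurrent refinement: re-embed the current segment each iteration and
    apply the predicted boundary offsets, clamped to the video extent."""
    start, end = anchor
    last = float(len(features) - 1)
    for _ in range(iterations):
        ds, de = predict_offsets(anchor_embedding(features, start, end))
        start = min(max(start + ds, 0.0), last)
        end = min(max(end + de, start), last)
    return start, end
```

For example, a toy `predict_offsets` that recovers the current boundaries from the last two embedding entries and moves each one halfway toward a fixed target segment makes `refine` approach that target more closely with each iteration. A trained head would predict these offsets from the pooled video features instead.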