Paper Title
Weakly-Supervised Temporal Action Localization by Progressive Complementary Learning
Paper Authors
Paper Abstract
Weakly-Supervised Temporal Action Localization (WSTAL) aims to localize and classify action instances in long untrimmed videos using only video-level category labels. Because snippet-level supervision indicating action boundaries is unavailable, previous methods typically assign pseudo labels to unlabeled snippets. However, since some action instances of different categories are visually similar, it is non-trivial to label a snippet with exactly the (usually) one action category it belongs to, and incorrect pseudo labels impair localization performance. To address this problem, we propose a novel method from a category exclusion perspective, named Progressive Complementary Learning (ProCL), which gradually enhances the snippet-level supervision. Our method is inspired by the fact that video-level labels precisely indicate the categories that all snippets surely do not belong to, a fact ignored by previous works. Accordingly, we first exclude these surely non-existent categories with a complementary learning loss. We then introduce background-aware pseudo complementary labeling to exclude more categories for snippets of less ambiguity. Furthermore, for the remaining ambiguous snippets, we attempt to reduce the ambiguity by distinguishing foreground actions from the background. Extensive experimental results show that our method achieves new state-of-the-art performance on two popular benchmarks, THUMOS14 and ActivityNet1.3.
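The first exclusion step can be illustrated with a minimal sketch. The abstract does not give the exact loss, so everything below is an assumption made for illustration: the function name `complementary_loss`, the softmax over categories, and the `-log(1 - p)` penalty, which is a common form of complementary-label loss and not necessarily the one used in ProCL. The idea shown is only that every category absent from the video-level label is a sure negative for every snippet in that video.

```python
import numpy as np

def complementary_loss(snippet_logits, video_labels, eps=1e-8):
    """Hypothetical complementary-label loss (illustrative, not the paper's).

    snippet_logits: (T, C) array of per-snippet category scores.
    video_labels:   (C,) multi-hot array of video-level labels.
    """
    # Numerically stable softmax over categories for each snippet
    # (assumption: the paper may use a different normalization).
    shifted = snippet_logits - snippet_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)

    # Complementary labels: categories absent from the video-level label
    # cannot be the category of any snippet in this video.
    absent = 1.0 - video_labels.astype(float)
    if absent.sum() == 0:
        return 0.0  # nothing can be excluded

    # Penalize probability assigned to excluded categories: -log(1 - p_c).
    per_entry = -np.log(1.0 - probs + eps) * absent[None, :]
    return float(per_entry.sum() / (absent.sum() * snippet_logits.shape[0]))

# Usage: a video labeled with category 0 only, two snippets, three categories.
labels = np.array([1, 0, 0])
logits = np.array([[5.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
loss = complementary_loss(logits, labels)
```

In this sketch the loss falls as snippets move probability away from the excluded categories. It never forces a snippet into a single positive category, which matches the exclusion perspective described in the abstract.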