Paper Title

Weakly Supervised Temporal Action Localization Using Deep Metric Learning

Paper Authors

Ashraful Islam, Richard J. Radke

Paper Abstract

Temporal action localization is an important step towards video understanding. Most current action localization methods depend on untrimmed videos with full temporal annotations of action instances. However, it is expensive and time-consuming to annotate both action labels and temporal boundaries of videos. To this end, we propose a weakly supervised temporal action localization method that only requires video-level action instances as supervision during training. We propose a classification module to generate action labels for each segment in the video, and a deep metric learning module to learn the similarity between different action instances. We jointly optimize a balanced binary cross-entropy loss and a metric loss using a standard backpropagation algorithm. Extensive experiments demonstrate the effectiveness of both of these components in temporal localization. We evaluate our algorithm on two challenging untrimmed video datasets: THUMOS14 and ActivityNet1.2. Our approach improves the current state-of-the-art result for THUMOS14 by 6.5% mAP at IoU threshold 0.5, and achieves competitive performance for ActivityNet1.2.
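
To make the joint objective concrete, here is a minimal, hypothetical PyTorch sketch, not the authors' code: it assumes 1024-dimensional per-segment features (e.g., I3D-style), plain linear classification and embedding heads, mean pooling for video-level scores, and a triplet loss standing in for the paper's metric loss. All names and hyperparameters below are illustrative assumptions; the paper's actual modules, pooling, and loss weighting may differ.

```python
# Hypothetical sketch of jointly optimizing a balanced binary cross-entropy
# loss and a metric (here: triplet) loss, as described in the abstract.
import torch
import torch.nn as nn

def balanced_bce(logits, labels, eps=1e-8):
    """BCE with positive/negative terms re-weighted so each class of terms
    contributes equally (one plausible 'balanced' formulation)."""
    probs = torch.sigmoid(logits)
    n_pos = labels.sum().clamp(min=1)
    n_neg = (1 - labels).sum().clamp(min=1)
    loss_pos = -(labels * torch.log(probs + eps)).sum() / n_pos
    loss_neg = -((1 - labels) * torch.log(1 - probs + eps)).sum() / n_neg
    return 0.5 * (loss_pos + loss_neg)

# Hypothetical heads over per-segment features.
classifier = nn.Linear(1024, 20)            # e.g., 20 THUMOS14 action classes
embedder = nn.Linear(1024, 128)             # metric-learning embedding space
triplet = nn.TripletMarginLoss(margin=0.5)  # pulls same-action instances together

optimizer = torch.optim.Adam(
    list(classifier.parameters()) + list(embedder.parameters()), lr=1e-4)

def training_step(feats, video_labels, anchor, positive, negative, lam=0.1):
    """One joint update via standard backpropagation.

    feats: (T, 1024) segment features of one untrimmed video.
    video_labels: (20,) multi-hot video-level action labels.
    anchor/positive/negative: (N, 1024) segment features, where anchor and
    positive come from the same action class and negative from a different one.
    """
    seg_logits = classifier(feats)            # per-segment action scores
    video_logits = seg_logits.mean(dim=0)     # simple pooling to video level
    cls_loss = balanced_bce(video_logits, video_labels)
    metric_loss = triplet(embedder(anchor), embedder(positive), embedder(negative))
    loss = cls_loss + lam * metric_loss       # joint objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key design point the abstract emphasizes is that both terms are optimized together with ordinary backpropagation: the classification branch supplies per-segment action scores for localization, while the metric branch shapes the feature space so that segments of the same action class are embedded near each other.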
