ActAR: Actor-Driven Pose Embeddings for Video Action Recognition

Paper Title

ActAR: Actor-Driven Pose Embeddings for Video Action Recognition

Authors

Lamghari, Soufiane; Bilodeau, Guillaume-Alexandre; Saunier, Nicolas

Abstract

Human action recognition (HAR) in videos is one of the core tasks of video understanding. Given a video sequence, the goal is to recognize the actions performed by humans. While HAR has received much attention in the visible spectrum, action recognition in infrared videos has been little studied. Accurate recognition of human actions in the infrared domain is a highly challenging task because of the redundant and indistinguishable texture features present in the sequence. Furthermore, in some cases, challenges arise from the irrelevant information induced by the presence of multiple active persons who do not contribute to the actual action of interest. Therefore, most existing methods consider a standard paradigm that does not take these challenges into account, which is partly due to the ambiguous definition of the recognition task in some cases. In this paper, we propose a new method that learns to efficiently recognize human actions in the infrared spectrum while automatically identifying the key-actors performing the action, without using any prior knowledge or explicit annotations. Our method is composed of three stages. In the first stage, optical flow-based key-actor identification is performed. Then, for each key-actor, we estimate key-poses that will guide the frame selection process. A scale-invariant encoding process, along with embedded pose filtering, is performed in order to enhance the quality of action representations. Experimental results on the InfAR dataset show that our proposed model achieves promising recognition performance and learns useful action representations.
