Faster and Accurate Compressed Video Action Recognition Straight from the Frequency Domain

Paper Title

Faster and Accurate Compressed Video Action Recognition Straight from the Frequency Domain

Authors

Samuel Felipe dos Santos, Jurandy Almeida

Abstract

Human action recognition has become one of the most active fields of research in computer vision due to its wide range of applications, such as surveillance, medical care, industrial environments, and smart homes. Recently, deep learning has been successfully used to learn powerful and interpretable features for recognizing human actions in videos. Most existing deep learning approaches have been designed to process video information as sequences of RGB images. For this reason, a preliminary decoding step is required, since video data are often stored in a compressed format. However, decoding a video demands a high computational load and memory usage. To overcome this problem, we propose a deep neural network capable of learning straight from compressed video. Our approach was evaluated on two public benchmarks, the UCF-101 and HMDB-51 datasets, demonstrating recognition performance comparable to state-of-the-art methods, with the advantage of running up to 2 times faster in terms of inference speed.
