Ensembles of Deep Neural Networks for Action Recognition in Still Images

Title

Ensembles of Deep Neural Networks for Action Recognition in Still Images

Authors

Mohammadi, Sina; Majelan, Sina Ghofrani; Shokouhi, Shahriar B.

Abstract

Despite notable recent improvements in feature extraction and classification, human action recognition remains challenging, especially in still images, which, unlike videos, contain no motion. Thus, methods proposed for recognizing human actions in videos cannot be applied to still images. A major challenge in still-image action recognition is the lack of sufficiently large datasets, which makes training deep Convolutional Neural Networks (CNNs) problematic because of overfitting. In this paper, we take advantage of pre-trained CNNs and employ transfer learning to tackle the lack of massive labeled action recognition datasets. Furthermore, since the last layer of a CNN carries class-specific information, we apply an attention mechanism to the CNN's output feature maps to extract more discriminative and powerful features for classifying human actions. Moreover, we use eight different pre-trained CNNs in our framework and investigate their performance on the Stanford 40 dataset. Finally, we propose using ensemble learning to enhance the overall accuracy of action classification by combining the predictions of multiple models. The best setting of our method achieves 93.17% accuracy on the Stanford 40 dataset.
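The abstract does not state how the predictions of the individual models are combined. As a minimal sketch, assuming soft voting (averaging each model's softmax probabilities, a common ensembling rule that is not confirmed as the authors' choice), the combination step could look like the Python below. The functions `softmax` and `ensemble_predict` and the example logits are hypothetical; in practice the inputs would be the class scores produced by the fine-tuned CNNs over the 40 classes of Stanford 40.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert one model's raw class scores (logits) into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(model_logits: Sequence[Sequence[float]]) -> tuple[int, list[float]]:
    """Hypothetical soft-voting ensemble (an assumption, not the paper's stated rule).

    Averages the class probabilities of several models for one image and
    returns the index of the winning class plus the averaged distribution.
    """
    if not model_logits:
        raise ValueError("need at least one model's predictions")
    probs = [softmax(logits) for logits in model_logits]
    n_classes = len(probs[0])
    if any(len(p) != n_classes for p in probs):
        raise ValueError("all models must score the same set of classes")
    # Average probability of each class across all models.
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg


if __name__ == "__main__":
    # Hypothetical logits from three fine-tuned CNNs for one image (4 classes for brevity).
    logits = [
        [2.0, 1.0, 0.1, 0.0],
        [0.5, 2.5, 0.3, 0.0],
        [1.8, 1.6, 0.2, 0.1],
    ]
    label, avg_probs = ensemble_predict(logits)
    print(label, [round(p, 3) for p in avg_probs])
```

Soft voting lets a confident model outweigh uncertain ones, whereas majority (hard) voting counts only each model's top class; weighting models by their validation accuracy is another common variant.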
