Paper Title

Vision Transformer with Attentive Pooling for Robust Facial Expression Recognition

Authors

Fanglei Xue, Qiangchang Wang, Zichang Tan, Zhongsong Ma, Guodong Guo

Abstract

Facial Expression Recognition (FER) in the wild is an extremely challenging task. Recently, some Vision Transformers (ViT) have been explored for FER, but most of them perform inferiorly compared to Convolutional Neural Networks (CNN). This is mainly because the newly proposed modules are difficult to converge from scratch due to the lack of inductive bias, and they easily focus on occluded and noisy areas. TransFER, a representative transformer-based method for FER, alleviates this with multi-branch attention dropping but brings excessive computation. In contrast, we present two attentive pooling (AP) modules to pool noisy features directly: Attentive Patch Pooling (APP) and Attentive Token Pooling (ATP). They aim to guide the model to emphasize the most discriminative features while reducing the impact of less relevant ones. APP selects the most informative patches from CNN features, and ATP discards unimportant tokens in the ViT. Being simple to implement and free of learnable parameters, APP and ATP reduce the computational cost while boosting performance by pursuing only the most discriminative features. Qualitative results demonstrate the motivation and effectiveness of our attentive poolings, and quantitative results on six in-the-wild datasets outperform other state-of-the-art methods.
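The abstract describes the attentive pooling idea at a high level: rank features by importance and keep only the top-scoring patches or tokens, with no learnable parameters. A minimal sketch of such a parameter-free token pooling step is shown below; the scoring function here (L2 norm of each token embedding) and the function name `attentive_token_pooling` are illustrative assumptions, since the abstract does not specify how importance is computed.

```python
import numpy as np

def attentive_token_pooling(tokens: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Keep only the top-scoring tokens (hypothetical sketch of ATP-style pooling).

    tokens: (N, D) array of token embeddings.
    keep_ratio: fraction of tokens to retain.
    """
    # Score each token; the L2 norm is a placeholder importance proxy --
    # the actual paper's criterion is not given in the abstract.
    scores = np.linalg.norm(tokens, axis=1)
    k = max(1, int(len(tokens) * keep_ratio))
    # Indices of the k highest-scoring tokens, restored to original order.
    keep = np.sort(np.argsort(scores)[::-1][:k])
    return tokens[keep]
```

Because the pooling is just a ranking and a slice, it adds no parameters, and discarding tokens shrinks the sequence fed to later transformer blocks, which is consistent with the abstract's claim of reduced computational cost.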
