Paper Title

SERE: Exploring Feature Self-relation for Self-supervised Transformer

Paper Authors

Zhong-Yu Li, Shanghua Gao, Ming-Ming Cheng

Paper Abstract

Learning representations with self-supervision for convolutional networks (CNN) has been validated to be effective for vision tasks. As an alternative to CNN, vision transformers (ViT) have strong representation ability with spatial self-attention and channel-level feedforward networks. Recent works reveal that self-supervised learning helps unleash the great potential of ViT. Still, most works follow self-supervised strategies designed for CNN, e.g., instance-level discrimination of samples, but they ignore the properties of ViT. We observe that relational modeling on spatial and channel dimensions distinguishes ViT from other networks. To enforce this property, we explore the feature SElf-RElation (SERE) for training self-supervised ViT. Specifically, instead of conducting self-supervised learning solely on feature embeddings from multiple views, we utilize the feature self-relations, i.e., spatial/channel self-relations, for self-supervised learning. Self-relation based learning further enhances the relation modeling ability of ViT, resulting in stronger representations that stably improve performance on multiple downstream tasks. Our source code is publicly available at: https://github.com/MCG-NKU/SERE.
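To make the core idea concrete, here is a minimal sketch of how spatial and channel self-relations over ViT patch tokens could be computed. This is an illustrative reading of the abstract, not the authors' implementation (see the linked repository for that); the function names, the temperature `tau`, and the softmax normalization are assumptions.

```python
import torch
import torch.nn.functional as F

def spatial_self_relation(feat, tau=0.1):
    """Pairwise relations among spatial tokens.

    feat: (B, N, C) patch-token features from a ViT.
    Returns (B, N, N) row-normalized relation maps.
    """
    feat = F.normalize(feat, dim=-1)             # cosine similarity in channel space
    rel = torch.bmm(feat, feat.transpose(1, 2))  # (B, N, N) token-to-token similarity
    return F.softmax(rel / tau, dim=-1)          # tau is an assumed temperature

def channel_self_relation(feat, tau=0.1):
    """Pairwise relations among feature channels.

    feat: (B, N, C) patch-token features.
    Returns (B, C, C) row-normalized relation maps.
    """
    feat = F.normalize(feat, dim=1)              # normalize each channel over tokens
    rel = torch.bmm(feat.transpose(1, 2), feat)  # (B, C, C) channel-to-channel similarity
    return F.softmax(rel / tau, dim=-1)
```

Under this reading, the self-relation maps from two augmented views, rather than the raw feature embeddings, would be matched by the self-supervised objective, encouraging the ViT's spatial and channel relation modeling directly.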
