Paper Title
How Well Do Vision Transformers (VTs) Transfer To The Non-Natural Image Domain? An Empirical Study Involving Art Classification
Paper Authors
Paper Abstract
Vision Transformers (VTs) are becoming a valuable alternative to Convolutional Neural Networks (CNNs) when it comes to problems involving high-dimensional and spatially organized inputs such as images. However, their Transfer Learning (TL) properties are not yet well studied, and it is not fully known whether these neural architectures can transfer across different domains as well as CNNs. In this paper we study whether VTs that are pre-trained on the popular ImageNet dataset learn representations that are transferable to the non-natural image domain. To do so we consider three well-studied art classification problems and use them as a surrogate for studying the TL potential of four popular VTs. Their performance is extensively compared against that of four common CNNs across several TL experiments. Our results show that VTs exhibit strong generalization properties and that these networks are more powerful feature extractors than CNNs.