Paper Title


Toward a Geometrical Understanding of Self-supervised Contrastive Learning

Authors

Romain Cosentino, Anirvan Sengupta, Salman Avestimehr, Mahdi Soltanolkotabi, Antonio Ortega, Ted Willke, Mariano Tepper

Abstract


Self-supervised learning (SSL) is currently one of the premier techniques to create data representations that are actionable for transfer learning in the absence of human annotations. Despite their success, the underlying geometry of these representations remains elusive, which obfuscates the quest for more robust, trustworthy, and interpretable models. In particular, mainstream SSL techniques rely on a specific deep neural network architecture with two cascaded neural networks: the encoder and the projector. When used for transfer learning, the projector is discarded since empirical results show that its representation generalizes more poorly than the encoder's. In this paper, we investigate this curious phenomenon and analyze how the strength of the data augmentation policies affects the data embedding. We discover a non-trivial relation between the encoder, the projector, and the data augmentation strength: with increasingly larger augmentation policies, the projector, rather than the encoder, is more strongly driven to become invariant to the augmentations. It does so by eliminating crucial information about the data by learning to project it into a low-dimensional space, a noisy estimate of the data manifold tangent plane in the encoder representation. This analysis is substantiated through a geometrical perspective with theoretical and empirical results.
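To make the encoder-projector setup concrete, here is a minimal PyTorch sketch of a SimCLR-style contrastive pipeline. The network sizes, the NT-Xent loss, and the Gaussian-noise stand-in for the augmentation policy are illustrative assumptions, not details taken from the paper; the sketch only shows where the encoder representation (kept for transfer) and the projector representation (trained to be invariant to augmentations, then discarded) live.

```python
# A minimal sketch (not the paper's exact setup) of the encoder-projector
# architecture used in mainstream contrastive SSL methods such as SimCLR.
# Dimensions, the NT-Xent loss, and the noise-based "augmentation" below
# are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EncoderProjector(nn.Module):
    def __init__(self, in_dim=784, enc_dim=256, proj_dim=64):
        super().__init__()
        # Encoder: its representation is the one kept for transfer learning.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, enc_dim), nn.ReLU(),
            nn.Linear(enc_dim, enc_dim),
        )
        # Projector: trained with the contrastive loss, discarded afterwards.
        self.projector = nn.Sequential(
            nn.Linear(enc_dim, enc_dim), nn.ReLU(),
            nn.Linear(enc_dim, proj_dim),
        )

    def forward(self, x):
        h = self.encoder(x)    # encoder representation (kept)
        z = self.projector(h)  # projector representation (discarded)
        return h, z

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss over two augmented views of a batch."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)
    sim = z @ z.t() / temperature              # pairwise cosine similarities
    n = z1.shape[0]
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float('-inf'))
    # The positive for sample i is its other augmented view.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy training step: two "augmented views" are simulated with additive
# Gaussian noise whose scale stands in for the augmentation strength.
model = EncoderProjector()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, 784)
strength = 0.1                                 # augmentation strength
_, z1 = model(x + strength * torch.randn_like(x))
_, z2 = model(x + strength * torch.randn_like(x))
loss = nt_xent(z1, z2)
loss.backward()
opt.step()
```

In this toy setup, increasing `strength` plays the role of a heavier augmentation policy; the paper's finding is that invariance to such augmentations is absorbed mainly by the projector's low-dimensional output `z`, while the encoder output `h` retains more information about the data, which is why `h` generalizes better in transfer learning.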
