Paper Title
Relating Regularization and Generalization through the Intrinsic Dimension of Activations
Paper Authors
Paper Abstract
Given a pair of models with similar training set performance, it is natural to assume that the model possessing simpler internal representations will exhibit better generalization. In this work, we provide empirical evidence for this intuition through an analysis of the intrinsic dimension (ID) of model activations, which can be thought of as the minimal number of factors of variation in the model's representation of the data. First, we show that common regularization techniques uniformly decrease the last-layer ID (LLID) of validation-set activations for image classification models, and demonstrate how strongly this affects generalization performance. We also investigate how excessive regularization decreases a model's ability to extract features from data in earlier layers, negatively affecting validation accuracy even while LLID continues to decrease and training accuracy remains near-perfect. Finally, we examine LLID over the course of training for models that exhibit grokking. We observe that well after training accuracy saturates, when models "grok" and validation accuracy suddenly improves from random to perfect, there is a concurrent sudden drop in LLID, providing deeper insight into the dynamics of sudden generalization.
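To make the central quantity concrete, the sketch below estimates the intrinsic dimension of an activation matrix with the TwoNN estimator (Facco et al., 2017), a common choice for measuring the ID of neural representations. Note that this is an illustrative assumption: the abstract does not specify which estimator the paper uses, and the synthetic data stands in for real last-layer activations.

```python
# Minimal sketch: estimating the intrinsic dimension of an (N, D) activation
# matrix via the TwoNN estimator. The estimator choice is an assumption, not
# necessarily the one used in the paper.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_id(activations: np.ndarray) -> float:
    """Estimate intrinsic dimension from an (N, D) matrix of activations."""
    # Distances to the two nearest neighbors; index 0 is the point itself.
    nn = NearestNeighbors(n_neighbors=3).fit(activations)
    dists, _ = nn.kneighbors(activations)
    r1, r2 = dists[:, 1], dists[:, 2]
    # Drop degenerate points whose first-neighbor distance is zero.
    valid = r1 > 0
    mu = r2[valid] / r1[valid]
    # Under the TwoNN model, mu is Pareto-distributed with exponent equal to
    # the intrinsic dimension d; the maximum-likelihood estimate is
    # d = N / sum(log mu).
    return mu.size / np.log(mu).sum()

# Toy example: points on a 2-D plane embedded in a 64-D ambient space.
rng = np.random.default_rng(0)
planar = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 64))
print(twonn_id(planar))  # ~2.0, far below the ambient dimension of 64
```

In the setting the abstract describes, `activations` would be the last-layer (penultimate) activations of an image classifier evaluated on the validation set, and the returned value would be the LLID tracked across regularization strengths or training steps.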