Paper Title
More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize
Paper Authors
Paper Abstract
Of theories for why large-scale machine learning models generalize despite being vastly overparameterized, which of their assumptions are needed to capture the qualitative phenomena of generalization in the real world? On one hand, we find that most theoretical analyses fall short of capturing these qualitative phenomena even for kernel regression, when applied to kernels derived from large-scale neural networks (e.g., ResNet-50) and real data (e.g., CIFAR-100). On the other hand, we find that the classical GCV estimator (Craven and Wahba, 1978) accurately predicts generalization risk even in such overparameterized settings. To bolster this empirical finding, we prove that the GCV estimator converges to the generalization risk whenever a local random matrix law holds. Finally, we apply this random matrix theory lens to explain why pretrained representations generalize better as well as what factors govern scaling laws for kernel regression. Our findings suggest that random matrix theory, rather than just being a toy model, may be central to understanding the properties of neural representations in practice.
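For context on the GCV estimator referenced above, the following is a minimal NumPy sketch of the classical Craven–Wahba generalized cross-validation formula as applied to kernel ridge regression. It is an illustrative reconstruction under standard definitions, not code from the paper; the function and variable names are hypothetical.

```python
import numpy as np

def gcv_risk_estimate(K, y, lam):
    """Classical generalized cross-validation (GCV) estimate of risk for
    kernel ridge regression (Craven & Wahba, 1978).

    K   : (n, n) kernel (Gram) matrix on the training inputs
    y   : (n,) vector of training targets
    lam : ridge regularization strength (lambda > 0)
    """
    n = K.shape[0]
    # Hat (smoother) matrix H = K (K + lam * I)^{-1}; in-sample predictions are H @ y.
    H = K @ np.linalg.solve(K + lam * np.eye(n), np.eye(n))
    residuals = y - H @ y
    # GCV(lam) = (1/n) ||(I - H) y||^2  /  ((1/n) tr(I - H))^2
    return (residuals @ residuals / n) / (1.0 - np.trace(H) / n) ** 2

# Illustrative usage with a synthetic linear kernel (not data from the paper):
# X = np.random.randn(200, 50); K = X @ X.T; y = np.random.randn(200)
# print(gcv_risk_estimate(K, y, lam=1e-2))
```

In this sketch, the GCV value can be read as an estimate of the out-of-sample risk of the fitted kernel ridge predictor at regularization strength lam; the abstract's claim is that this estimate remains accurate even when the kernel comes from an overparameterized neural representation.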