Paper Title
Gradient Starvation: A Learning Proclivity in Neural Networks
Paper Authors
Paper Abstract
We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks. Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task, despite the presence of other predictive features that fail to be discovered. This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks. Using tools from Dynamical Systems theory, we identify simple properties of learning dynamics during gradient descent that lead to this imbalance, and prove that such a situation can be expected given certain statistical structure in training data. Based on our proposed formalism, we develop guarantees for a novel regularization method aimed at decoupling feature learning dynamics, improving accuracy and robustness in cases hindered by gradient starvation. We illustrate our findings with simple and real-world out-of-distribution (OOD) generalization experiments.
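The abstract does not spell out the form of the decoupling regularizer. As a minimal sketch, assuming it amounts to adding an L2 penalty on the network's logits to the cross-entropy loss (the names sd_loss and lam below are illustrative, not taken from the paper), a PyTorch version might look like this:

    import torch
    import torch.nn as nn

    def sd_loss(logits, targets, lam=0.01):
        # Cross-entropy term, as in standard training.
        ce = nn.functional.cross_entropy(logits, targets)
        # Assumed decoupling penalty: squared L2 norm of the logits,
        # discouraging any single dominant feature from driving the
        # outputs and starving the gradients of other features.
        penalty = (logits ** 2).mean()
        return ce + 0.5 * lam * penalty

    # Hypothetical usage on a toy over-parameterized classifier.
    model = nn.Sequential(nn.Linear(2, 512), nn.ReLU(), nn.Linear(512, 2))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    x = torch.randn(64, 2)            # toy inputs
    y = (x[:, 0] > 0).long()          # labels driven by a single feature
    loss = sd_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

The intuition, under this assumption, is that shrinking the logit norm caps how much loss reduction the first-learned feature can claim, leaving nonzero gradient signal for the remaining predictive features.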