Paper Title

Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

Paper Authors

Rodrigo Veiga, Ludovic Stephan, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

Paper Abstract

Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.
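
The abstract refers to the Saad & Solla-style description of one-pass SGD for a shallow (two-layer) network trained on i.i.d. Gaussian data. Below is a minimal NumPy sketch of that teacher-student setup; the tanh activation, the frozen second layer, and the particular learning-rate and initialization scalings are illustrative assumptions, not the paper's exact conventions.

```python
import numpy as np

# Minimal sketch: one-pass (online) SGD for a two-layer "soft committee" student
# learning from a fixed teacher on Gaussian inputs, tracking the overlap matrices
# (order parameters) whose high-dimensional evolution becomes deterministic.

rng = np.random.default_rng(0)

d, p, k = 500, 4, 2        # input dimension, student and teacher hidden units (illustrative sizes)
eta = 0.5                  # learning rate
steps = 50 * d             # number of online steps; effective time is steps / d

g = np.tanh                              # activation (the classical analysis uses erf; tanh is similar)
dg = lambda z: 1.0 - np.tanh(z) ** 2     # its derivative

W_star = rng.standard_normal((k, d)) / np.sqrt(d)   # fixed teacher weights, O(1)-norm rows
W = rng.standard_normal((p, d)) / np.sqrt(d)        # student first-layer weights

for step in range(steps):
    x = rng.standard_normal(d)             # fresh Gaussian sample each step: one-pass SGD
    y = g(W_star @ x).mean()               # teacher label
    lam = W @ x                            # student local fields, O(1) in d
    y_hat = g(lam).mean()                  # student output (second layer frozen at 1/p)
    delta = (y_hat - y) * dg(lam) / p      # backprop of the squared loss through the frozen readout
    W -= (eta / d) * np.outer(delta, x)    # O(1/d) per-coordinate updates, the classical online scaling

# Order parameters: in the d -> infinity limit these overlaps follow deterministic ODEs
Q = W @ W.T          # student-student overlaps
M = W @ W_star.T     # student-teacher overlaps
print("Q =", np.round(Q, 3))
print("M =", np.round(M, 3))
```

Under this scaling, with time measured as steps divided by d, the overlaps Q and M concentrate around deterministic trajectories as d grows; this is the kind of deterministic high-dimensional description of SGD that the paper extends and equips with rigorous convergence rates.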
