Title

Sketchy Empirical Natural Gradient Methods for Deep Learning

Authors

Minghan Yang, Dong Xu, Zaiwen Wen, Mengyun Chen, Pengxiang Xu

Abstract

In this paper, we develop an efficient sketchy empirical natural gradient method (SENG) for large-scale deep learning problems. The empirical Fisher information matrix is usually low-rank since the sampling is only practical on a small amount of data at each iteration. Although the corresponding natural gradient direction lies in a small subspace, both the computational cost and memory requirement are still not tractable due to the high dimensionality. We design randomized techniques for different neural network structures to resolve these challenges. For layers with a reasonable dimension, sketching can be performed on a regularized least squares subproblem. Otherwise, since the gradient is a vectorization of the product between two matrices, we apply sketching on the low-rank approximations of these matrices to compute the most expensive parts. A distributed version of SENG is also developed for extremely large-scale applications. Global convergence to stationary points is established under some mild assumptions and a fast linear convergence is analyzed under the neural tangent kernel (NTK) case. Extensive experiments on convolutional neural networks show the competitiveness of SENG compared with the state-of-the-art methods. On the task ResNet50 with ImageNet-1k, SENG achieves 75.9% Top-1 testing accuracy within 41 epochs. Experiments on the distributed large-batch training show that the scaling efficiency is quite reasonable.
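
The abstract's central computation can be illustrated with a small sketch. The snippet below is a minimal NumPy illustration, not the authors' implementation: it assumes the empirical Fisher is the low-rank matrix F = U U^T / n built from per-sample gradients U (d x n, with n << d), computes the regularized natural-gradient direction (F + lam*I)^{-1} g via the Woodbury identity so that only an n x n system is solved, and uses a plain Gaussian random projection as an illustrative stand-in for the paper's sketching operator. The function names and parameters (natural_gradient_direction, sketch_dim, lam) are assumptions introduced here for illustration.

```python
import numpy as np


def natural_gradient_direction(U, g, lam):
    """Direction (F + lam*I)^{-1} g for a low-rank empirical Fisher F = U @ U.T / n.

    U : (d, n) per-sample gradients, with n << d.
    g : (d,) mean gradient.
    The Woodbury identity reduces the d x d solve to an n x n solve.
    """
    d, n = U.shape
    small = U.T @ U + n * lam * np.eye(n)      # n x n system
    coef = np.linalg.solve(small, U.T @ g)
    return (g - U @ coef) / lam


def sketched_natural_gradient_direction(U, g, lam, sketch_dim, seed=0):
    """Same direction, but the Gram matrix U.T @ U is approximated by a
    Gaussian sketch S @ U (sketch_dim x n), which is cheaper when d is huge.
    The Gaussian projection is only an illustrative stand-in for SENG's
    sketching construction."""
    rng = np.random.default_rng(seed)
    d, n = U.shape
    S = rng.standard_normal((sketch_dim, d)) / np.sqrt(sketch_dim)
    SU = S @ U
    small = SU.T @ SU + n * lam * np.eye(n)    # sketched n x n system
    coef = np.linalg.solve(small, U.T @ g)
    return (g - U @ coef) / lam


if __name__ == "__main__":
    d, n = 10_000, 32                          # high dimension, small batch
    rng = np.random.default_rng(1)
    U = rng.standard_normal((d, n))
    g = U.mean(axis=1)
    exact = natural_gradient_direction(U, g, lam=0.1)
    approx = sketched_natural_gradient_direction(U, g, lam=0.1, sketch_dim=512)
    print(np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```

Because n is the mini-batch size, the dominant costs in this sketch are forming the (possibly sketched) n x n Gram matrix and one n x n solve, which mirrors the abstract's observation that the natural-gradient direction lies in a small subspace even though the parameter dimension d is very large.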
