Paper Title
Gradient Amplification: An efficient way to train deep neural networks
Paper Authors
Paper Abstract
Improving the performance of deep learning models and reducing their training times are ongoing challenges in deep neural networks. Several approaches have been proposed to address these challenges, one of which is to increase the depth of the neural network. Such deeper networks not only take longer to train, but also suffer from the vanishing gradient problem during training. In this work, we propose a gradient amplification approach for training deep learning models that prevents vanishing gradients, and we develop a training strategy that enables or disables gradient amplification across epochs with different learning rates. We perform experiments on VGG-19 and ResNet (ResNet-18 and ResNet-34) models, and study the impact of the amplification parameters on these models in detail. Our proposed approach improves the performance of these deep learning models even at higher learning rates, thereby allowing these models to achieve higher performance with reduced training time.
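The abstract describes two ideas: scaling gradients during backpropagation to counter vanishing gradients, and switching this amplification on or off across training phases that use different learning rates. As a rough illustration only, the PyTorch sketch below scales gradients flowing through a custom activation layer by an assumed `amp_factor` and toggles amplification per training phase; the layer choice, factor value, epoch ranges, and learning rates are hypothetical placeholders, not the authors' actual method or settings.

```python
import torch
import torch.nn as nn

class AmplifiedReLU(nn.Module):
    """ReLU whose backward gradient is multiplied by `amp_factor` when enabled.

    This is an illustrative stand-in for gradient amplification; the paper
    does not specify this exact mechanism.
    """
    def __init__(self, amp_factor=2.0):
        super().__init__()
        self.amp_factor = amp_factor
        self.enabled = False

    def forward(self, x):
        x = torch.relu(x)
        if self.enabled and x.requires_grad:
            # Scale the gradient of this activation during the backward pass.
            x.register_hook(lambda grad: grad * self.amp_factor)
        return x

# Tiny model with an amplified activation between linear layers (illustrative).
model = nn.Sequential(
    nn.Linear(32, 64),
    AmplifiedReLU(amp_factor=2.0),
    nn.Linear(64, 10),
)
criterion = nn.CrossEntropyLoss()

# Training schedule: amplification on for some epochs, off for others,
# each phase with a different learning rate (values are assumptions).
schedule = [
    {"epochs": range(0, 5),  "lr": 0.1,  "amplify": True},
    {"epochs": range(5, 10), "lr": 0.01, "amplify": False},
]

for phase in schedule:
    optimizer = torch.optim.SGD(model.parameters(), lr=phase["lr"])
    for m in model.modules():
        if isinstance(m, AmplifiedReLU):
            m.enabled = phase["amplify"]
    for epoch in phase["epochs"]:
        inputs = torch.randn(16, 32)            # dummy batch
        targets = torch.randint(0, 10, (16,))   # dummy labels
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()                          # gradients amplified if enabled
        optimizer.step()
```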