Paper Title

Adaptive Braking for Mitigating Gradient Delay

Authors

Abhinav Venigalla, Atli Kosson, Vitaliy Chiley, Urs Köster

Abstract

Neural network training is commonly accelerated by using multiple synchronized workers to compute gradient updates in parallel. Asynchronous methods remove synchronization overheads and improve hardware utilization at the cost of introducing gradient delay, which impedes optimization and can lead to lower final model performance. We introduce Adaptive Braking (AB), a modification for momentum-based optimizers that mitigates the effects of gradient delay. AB dynamically scales the gradient based on the alignment of the gradient and the velocity. This can dampen oscillations along high curvature directions of the loss surface, stabilizing and accelerating asynchronous training. We show that applying AB on top of SGD with momentum enables training ResNets on CIFAR-10 and ImageNet-1k with delays of $D \geq 32$ update steps and minimal drop in final test accuracy.
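
The abstract describes AB only at a high level. As a rough illustration, the sketch below shows one way an alignment-based gradient scaling could sit on top of SGD with momentum. The function name ab_sgd_step and the clipped-linear scaling clip(1 + cos, 0, 1) are assumptions chosen to match the abstract's description, not the paper's exact rule.

```python
import numpy as np

def ab_sgd_step(params, grad, velocity, lr=0.1, momentum=0.9, eps=1e-8):
    """One SGD-with-momentum step where the (possibly delayed) gradient
    is rescaled by its alignment with the current velocity.

    The scaling function here is an illustrative assumption: the paper
    only states that AB "dynamically scales the gradient based on the
    alignment of the gradient and the velocity."
    """
    # Cosine similarity between the incoming gradient and the velocity;
    # eps guards against division by zero when the velocity is zero.
    cos = np.dot(grad, velocity) / (
        np.linalg.norm(grad) * np.linalg.norm(velocity) + eps
    )
    # Assumed clipped-linear scaling: a gradient that opposes the velocity
    # (an oscillation along a high-curvature direction) is damped toward
    # zero, while an aligned or orthogonal gradient passes through.
    scale = np.clip(1.0 + cos, 0.0, 1.0)
    velocity = momentum * velocity + scale * grad
    params = params - lr * velocity
    return params, velocity

# Toy usage: a delayed gradient that is anti-aligned with the velocity
# yields cos = -1, so scale = 0 and the step is effectively braked.
p = np.array([1.0, 1.0])
v = np.array([0.5, -0.5])
g = np.array([-0.5, 0.5])
p, v = ab_sgd_step(p, g, v)
```

When the delayed gradient opposes the velocity (cosine near -1), the scale approaches zero and the update is suppressed, which is the oscillation-damping behavior the abstract attributes to AB; when gradient and velocity agree, the step proceeds as in ordinary SGD with momentum.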
