Paper Title

The Limit of the Batch Size

Paper Authors

Yang You, Yuhui Wang, Huan Zhang, Zhao Zhang, James Demmel, Cho-Jui Hsieh

Paper Abstract

Large-batch training is an efficient approach for current distributed deep learning systems. It has enabled researchers to reduce ImageNet/ResNet-50 training from 29 hours to around 1 minute. In this paper, we focus on studying the limit of the batch size. We believe it can provide guidance to AI supercomputer and algorithm designers. We provide detailed numerical optimization instructions for step-by-step comparison. Moreover, it is important to understand the generalization and optimization performance of huge-batch training. Hoffer et al. introduced the "ultra-slow diffusion" theory to large-batch training. However, our experiments show results that contradict the conclusion of Hoffer et al. We provide comprehensive experimental results and detailed analysis to study the limitations of batch size scaling and the "ultra-slow diffusion" theory. For the first time, we scale the batch size on ImageNet to at least an order of magnitude larger than all previous work, and provide detailed studies on the performance of many state-of-the-art optimization schemes under this setting. We propose an optimization recipe that is able to improve the top-1 test accuracy by 18% compared to the baseline.
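As background for the abstract, the sketch below illustrates the standard large-batch baseline it builds on: the linear learning-rate scaling rule with gradual warmup (Goyal et al., 2017). This is only a hedged illustration of the common technique; it is not the optimization recipe proposed in the paper, and all parameter values are assumptions chosen for the example.

```python
# Minimal sketch of the linear learning-rate scaling rule with warmup,
# a common baseline for large-batch training (Goyal et al., 2017).
# NOTE: this is an illustrative assumption, not the recipe from this paper.

def scaled_learning_rate(step: int,
                         batch_size: int,
                         base_lr: float = 0.1,        # assumed baseline LR for batch size 256
                         base_batch_size: int = 256,  # assumed reference batch size
                         warmup_steps: int = 500) -> float:
    """Return the learning rate at a given training step.

    The peak learning rate grows linearly with the batch size
    (peak_lr = base_lr * batch_size / base_batch_size) and is ramped up
    linearly from near zero over `warmup_steps` to keep early updates stable.
    """
    peak_lr = base_lr * batch_size / base_batch_size
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr


if __name__ == "__main__":
    # Example: a batch size of 32768 (128x the 256-image baseline) reaches a
    # peak learning rate of 0.1 * 128 = 12.8 once warmup finishes.
    for s in (0, 250, 499, 1000):
        print(s, scaled_learning_rate(s, batch_size=32768))
```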
