Paper Title


FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training

Paper Authors

Fu, Yonggan, You, Haoran, Zhao, Yang, Wang, Yue, Li, Chaojian, Gopalakrishnan, Kailash, Wang, Zhangyang, Lin, Yingyan Celine

Paper Abstract


Recent breakthroughs in deep neural networks (DNNs) have fueled a tremendous demand for intelligent edge devices featuring on-site learning, while the practical realization of such systems remains a challenge due to the limited resources available at the edge and the required massive training costs for state-of-the-art (SOTA) DNNs. As reducing precision is one of the most effective knobs for boosting training time/energy efficiency, there has been a growing interest in low-precision DNN training. In this paper, we explore from an orthogonal direction: how to fractionally squeeze out more training cost savings from the most redundant bit level, progressively along the training trajectory and dynamically per input. Specifically, we propose FracTrain that integrates (i) progressive fractional quantization which gradually increases the precision of activations, weights, and gradients that will not reach the precision of SOTA static quantized DNN training until the final training stage, and (ii) dynamic fractional quantization which assigns precisions to both the activations and gradients of each layer in an input-adaptive manner, for only "fractionally" updating layer parameters. Extensive simulations and ablation studies (six models, four datasets, and three training settings including standard, adaptation, and fine-tuning) validate the effectiveness of FracTrain in reducing computational cost and hardware-quantified energy/latency of DNN training while achieving a comparable or better (-0.12%~+1.87%) accuracy. For example, when training ResNet-74 on CIFAR-10, FracTrain achieves 77.6% and 53.5% computational cost and training latency savings, respectively, compared with the best SOTA baseline, while achieving a comparable (-0.07%) accuracy. Our codes are available at: https://github.com/RICE-EIC/FracTrain.
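To make the "progressive fractional quantization" idea concrete, below is a minimal, self-contained sketch of how a bit-width schedule that grows along the training trajectory could be wired into fake-quantization of tensors. The function names (`quantize`, `progressive_precision`), the stage boundaries, and the specific bit-widths are illustrative assumptions, not the paper's actual schedule or implementation (the dynamic, input-adaptive per-layer precision component is not sketched here); see the linked repository for the authors' code.

```python
import torch

def quantize(x, num_bits):
    """Uniform symmetric fake-quantization of a tensor to `num_bits` bits.
    Treats 32 bits (or more) as full precision."""
    if num_bits >= 32:
        return x
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax  # per-tensor scale
    return torch.round(x / scale).clamp(-qmax, qmax) * scale

def progressive_precision(epoch, total_epochs,
                          schedule=((0.25, 4), (0.5, 6), (0.75, 8), (1.0, 8))):
    """Return the bit-width for the current training progress.
    `schedule` is a hypothetical list of (progress fraction, bits) stages;
    precision only reaches the static-quantization level in the final stage."""
    progress = epoch / total_epochs
    for frac, bits in schedule:
        if progress <= frac:
            return bits
    return schedule[-1][1]

# Toy usage: the precision applied to weights/activations grows over training.
for epoch in range(0, 100, 25):
    bits = progressive_precision(epoch, total_epochs=100)
    w = torch.randn(4)
    print(f"epoch {epoch}: {bits}-bit -> {quantize(w, bits)}")
```

In an actual training loop, the scheduled bit-width would be applied to activations, weights, and gradients in the forward/backward passes, so that early, redundant training iterations run at low precision and only the final stage pays full quantization cost.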
