Paper Title
Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off
Paper Authors
Paper Abstract
Over-parameterization of deep neural networks (DNNs) has yielded high prediction accuracy in many applications. Although effective, the large number of parameters hinders deployment on resource-limited devices and carries an outsized environmental impact. Sparse training (using a fixed number of nonzero weights in each iteration) can significantly reduce training costs by shrinking the model size. However, existing sparse training methods mainly use random-based or greedy-based drop-and-grow strategies, which can get trapped in local minima and yield low accuracy. In this work, we treat dynamic sparse training as a sparse connectivity search problem and design an acquisition function that balances exploitation and exploration to escape local optima and saddle points. We further provide theoretical guarantees for the proposed method and clarify its convergence properties. Experimental results show that sparse models (up to 98\% sparsity) obtained by our method outperform SOTA sparse training methods on a wide variety of deep learning tasks. On VGG-19 / CIFAR-100, ResNet-50 / CIFAR-10, and ResNet-50 / CIFAR-100, our method achieves even higher accuracy than dense models. On ResNet-50 / ImageNet, the proposed method achieves up to 8.2\% accuracy improvement over SOTA sparse training methods.
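The abstract names a drop-and-grow strategy guided by an exploration-exploitation acquisition function but does not define it. Below is a minimal sketch, not the paper's actual method: one generic drop-and-grow step in which a hypothetical growth score interpolates between gradient magnitude (exploitation, greedy growth) and random noise (exploration, random growth). The function name `drop_and_grow_step` and the parameters `drop_frac` and `explore` are illustrative assumptions.

```python
import numpy as np

def drop_and_grow_step(weights, grads, mask, drop_frac=0.3, explore=0.5, rng=None):
    """One generic drop-and-grow update for dynamic sparse training (a sketch,
    not the paper's method).

    weights, grads: flat float arrays of the same shape.
    mask: flat 0/1 array marking active connections; kept at fixed sparsity.
    explore: 0 = purely greedy (gradient-magnitude) growth, 1 = purely random.
    """
    rng = rng or np.random.default_rng(0)
    n_change = int(drop_frac * mask.sum())

    # Drop: deactivate the smallest-magnitude active weights.
    active_idx = np.flatnonzero(mask)
    drop_idx = active_idx[np.argsort(np.abs(weights[active_idx]))[:n_change]]
    mask[drop_idx] = 0
    weights[drop_idx] = 0.0

    # Grow: score inactive positions; `explore` interpolates between
    # exploitation (gradient magnitude) and exploration (random noise).
    inactive_idx = np.flatnonzero(mask == 0)
    score = ((1.0 - explore) * np.abs(grads[inactive_idx])
             + explore * rng.random(inactive_idx.size))
    grow_idx = inactive_idx[np.argsort(score)[-n_change:]]
    mask[grow_idx] = 1  # regrown weights start at zero and train from here on
    return weights, mask
```

With flat arrays `w`, `g` and a binary mask `m` of the same shape, one call per update interval (`w, m = drop_and_grow_step(w, g, m)`) keeps the number of nonzero weights constant while letting the connectivity pattern move; the `explore` term is what allows the search to leave the purely greedy trajectory.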