Paper Title
Adaptive Step-Size Methods for Compressed SGD
Paper Authors
Paper Abstract
Compressed Stochastic Gradient Descent (SGD) algorithms have recently been proposed to address the communication bottleneck in distributed and decentralized optimization problems, such as those that arise in federated machine learning. Existing compressed SGD algorithms assume the use of non-adaptive step-sizes (constant or diminishing) to provide theoretical convergence guarantees. Typically, the step-sizes are fine-tuned in practice to the dataset and the learning algorithm to provide good empirical performance. Such fine-tuning might be impractical in many learning scenarios, and it is therefore of interest to study compressed SGD using adaptive step-sizes. Motivated by prior work on adaptive step-size methods for SGD to train neural networks efficiently in the uncompressed setting, we develop an adaptive step-size method for compressed SGD. In particular, we introduce a scaling technique for the descent step in compressed SGD, which we use to establish order-optimal convergence rates for convex-smooth and strongly convex-smooth objectives under an interpolation condition and for non-convex objectives under a strong growth condition. We also show through simulation examples that without this scaling, the algorithm can fail to converge. We present experimental results on deep neural networks for real-world datasets, compare the performance of our proposed algorithm with previously proposed compressed SGD methods in the literature, and demonstrate improved performance on ResNet-18, ResNet-34, and DenseNet architectures for the CIFAR-10 and CIFAR-100 datasets at various levels of compression.
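As a rough illustration of the kind of method the abstract describes, the Python sketch below combines a gradient compressor with an adaptive step-size and a multiplicative scaling on the descent step. The top-k compressor, the Armijo-type backtracking rule, and the specific scaling gamma = k/d are assumptions chosen for illustration only; the paper's actual compression operator and scaling technique are not specified in the abstract.

```python
import numpy as np

def top_k_compress(g, k):
    # Keep the k largest-magnitude coordinates of g and zero out the rest
    # (one common gradient compressor, used here only as an example).
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out = np.zeros_like(g)
    out[idx] = g[idx]
    return out

def compressed_sgd_adaptive(grad_fn, loss_fn, w, k, n_steps=1000,
                            eta_max=1.0, c=0.5, beta=0.7):
    # Illustrative compressed SGD with an Armijo-type adaptive step-size.
    # `gamma` is a placeholder for the scaling of the descent step; its
    # exact form in the paper is not given in the abstract.
    for _ in range(n_steps):
        g = grad_fn(w)                    # stochastic gradient estimate
        d = top_k_compress(g, k)          # compressed descent direction
        gamma = k / g.size                # placeholder scaling (assumption)
        eta = eta_max
        # Bounded backtracking (Armijo) line search on the sampled loss.
        for _ in range(50):
            if loss_fn(w - eta * gamma * d) <= loss_fn(w) - c * eta * gamma * np.dot(d, d):
                break
            eta *= beta
        w = w - eta * gamma * d           # scaled adaptive descent step
    return w

if __name__ == "__main__":
    # Toy usage on a least-squares problem (illustrative only).
    rng = np.random.default_rng(0)
    A, b = rng.normal(size=(100, 20)), rng.normal(size=100)
    loss = lambda w: 0.5 * np.mean((A @ w - b) ** 2)
    grad = lambda w: A.T @ (A @ w - b) / len(b)
    w_hat = compressed_sgd_adaptive(grad, loss, np.zeros(20), k=5)
    print("final loss:", loss(w_hat))
```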