Paper Title

Momentum Tracking: Momentum Acceleration for Decentralized Deep Learning on Heterogeneous Data

Authors

Yuki Takezawa, Han Bao, Kenta Niwa, Ryoma Sato, Makoto Yamada

Abstract

SGD with momentum is one of the key components for improving the performance of neural networks. For decentralized learning, a straightforward approach using momentum is Distributed SGD (DSGD) with momentum (DSGDm). However, DSGDm performs worse than DSGD when the data distributions are statistically heterogeneous. Recently, several studies have addressed this issue and proposed methods with momentum that are more robust to data heterogeneity than DSGDm, although their convergence rates remain dependent on data heterogeneity and deteriorate when the data distributions are heterogeneous. In this study, we propose Momentum Tracking, which is a method with momentum whose convergence rate is proven to be independent of data heterogeneity. More specifically, we analyze the convergence rate of Momentum Tracking in the setting where the objective function is non-convex and the stochastic gradient is used. Then, we identify that it is independent of data heterogeneity for any momentum coefficient $\beta \in [0, 1)$. Through experiments, we demonstrate that Momentum Tracking is more robust to data heterogeneity than the existing decentralized learning methods with momentum and can consistently outperform these existing methods when the data distributions are heterogeneous.
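To give a feel for the kind of update the abstract describes, below is a minimal numpy sketch of gradient tracking combined with heavy-ball momentum on a toy 1-D problem. This is an illustration of the general gradient-tracking idea, not the paper's exact Momentum Tracking update; all names and constants (`n`, `b`, `W`, `eta`, `beta`, the ring topology) are assumptions chosen for the example. The point it demonstrates is that the tracked variable estimates the *average* gradient across agents, so every agent converges to the global optimum even though the local objectives are heterogeneous.

```python
import numpy as np

# Illustrative sketch (NOT the paper's exact algorithm): gradient tracking
# plus heavy-ball momentum on heterogeneous 1-D quadratics.
n = 4                                 # agents on a ring graph
b = np.array([0.0, 1.0, 2.0, 3.0])    # heterogeneous local optima
# local objective f_i(x) = 0.5 * (x - b_i)^2  =>  grad_i(x) = x - b_i
def grad(x):
    return x - b

# doubly stochastic mixing matrix for the ring
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

eta, beta = 0.1, 0.9
x = np.zeros(n)           # agent parameters
prev_grad = grad(x)
g = prev_grad.copy()      # tracking variable: estimates the average gradient
m = np.zeros(n)           # momentum buffer

for _ in range(500):
    m = beta * m + g                  # momentum applied to the tracked gradient
    x = W @ (x - eta * m)             # gossip averaging + descent step
    new_grad = grad(x)
    g = W @ g + new_grad - prev_grad  # gradient-tracking correction
    prev_grad = new_grad

# Despite heterogeneous b_i, all agents approach the global optimum mean(b) = 1.5.
print(x)
```

Without the tracking correction (i.e., using each agent's raw local gradient, as in DSGDm), the fixed point of each agent would be biased toward its own `b_i`; the correction term `new_grad - prev_grad` propagated through the gossip matrix is what removes that heterogeneity bias.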
