Paper Title


Variance-Reduced Stochastic Quasi-Newton Methods for Decentralized Learning: Part II

Authors

Jiaojiao Zhang, Huikang Liu, Anthony Man-Cho So, Qing Ling

Abstract


In Part I of this work, we proposed a general framework of decentralized stochastic quasi-Newton methods, which converge linearly to the optimal solution under the assumption that the local Hessian inverse approximations have bounded positive eigenvalues. In Part II, we specify two fully decentralized stochastic quasi-Newton methods, damped regularized limited-memory DFP (Davidon-Fletcher-Powell) and damped limited-memory BFGS (Broyden-Fletcher-Goldfarb-Shanno), to locally construct such Hessian inverse approximations without extra sampling or communication. Both methods use a fixed moving window of $M$ past local gradient approximations and local decision variables to adaptively construct positive definite Hessian inverse approximations with bounded eigenvalues, satisfying the assumption in Part I for linear convergence. For the proposed damped regularized limited-memory DFP, a regularization term is added to improve performance. For the proposed damped limited-memory BFGS, a two-loop recursion is applied, leading to low storage and computational complexity. Numerical experiments demonstrate that the proposed quasi-Newton methods are much faster than existing decentralized stochastic first-order algorithms.
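To illustrate the storage and computation savings mentioned for the limited-memory BFGS variant, below is a minimal sketch of the standard L-BFGS two-loop recursion that computes the product of the Hessian inverse approximation with a gradient using only the $M$ stored curvature pairs. This is the classical (undamped) recursion, not the paper's exact damped variant; the function name `two_loop_recursion`, the scaling choice for the initial matrix $H_0 = \gamma I$, and the NumPy setup are illustrative assumptions.

```python
import numpy as np

def two_loop_recursion(grad, s_list, y_list):
    """Classical L-BFGS two-loop recursion (sketch, not the paper's damped variant).

    Returns H @ grad, where H is the limited-memory Hessian inverse approximation
    built from the M stored pairs (s_i, y_i) = (x_{i+1} - x_i, g_{i+1} - g_i).
    Only O(M d) storage and O(M d) arithmetic are needed, instead of forming H."""
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: from the newest pair to the oldest.
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        alpha = rho * np.dot(s, q)
        alphas.append(alpha)
        q -= alpha * y
    # Initial Hessian inverse approximation H0 = gamma * I (standard scaling choice).
    s_last, y_last = s_list[-1], y_list[-1]
    gamma = np.dot(s_last, y_last) / np.dot(y_last, y_last)
    r = gamma * q
    # Second loop: from the oldest pair to the newest.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s
    return r
```

In a decentralized setting as described above, each node would maintain its own window of pairs built from local decision variables and local gradient approximations and apply such a recursion locally, so no extra sampling or communication is required beyond the first-order information already exchanged.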
