Paper Title

ACMo: Angle-Calibrated Moment Methods for Stochastic Optimization

Paper Authors

Xunpeng Huang, Runxin Xu, Hao Zhou, Zhe Wang, Zhengyang Liu, Lei Li

Paper Abstract

Due to its simplicity and outstanding generalization ability, stochastic gradient descent (SGD) remains the most widely used optimization method despite its slow convergence. Meanwhile, adaptive methods have attracted rising attention from the optimization and machine learning communities, both for leveraging life-long gradient information and for their profound and fundamental mathematical theory. Taking the best of both worlds is among the most exciting and challenging problems in optimization for machine learning. Along this line, we revisit existing adaptive gradient methods from a novel perspective, refreshing the understanding of second moments. Our new perspective empowers us to attach the properties of the second moment to the first-moment iteration, and to propose a novel first-moment optimizer, \emph{Angle-Calibrated Moment method} (\method). Our theoretical results show that \method achieves the same convergence rate as mainstream adaptive methods. Furthermore, extensive experiments on CV and NLP tasks demonstrate that \method converges comparably to SOTA Adam-type optimizers and achieves better generalization performance in most cases.
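The abstract only sketches the idea of carrying second-moment information into a first-moment-only update. As a rough, hedged illustration of such a design (explicitly not the update rule from the paper), the following minimal PyTorch sketch keeps a single momentum buffer and calibrates its angle against the current gradient; the class name AngleCalibratedSGD and the projection-style correction are assumptions made for illustration only.

```python
import torch


class AngleCalibratedSGD(torch.optim.Optimizer):
    """Illustrative sketch only: a first-moment optimizer whose update
    direction is calibrated against the current gradient. The class name
    and the exact correction term are assumptions, not the update rule
    published in the ACMo paper."""

    def __init__(self, params, lr=1e-3, beta=0.9, eps=1e-8):
        defaults = dict(lr=lr, beta=beta, eps=eps)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr, beta, eps = group["lr"], group["beta"], group["eps"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad
                state = self.state[p]
                if "m" not in state:
                    state["m"] = torch.zeros_like(p)
                m = state["m"]
                # First moment: exponential moving average of gradients.
                m.mul_(beta).add_(g, alpha=1 - beta)
                # Angle calibration (assumed form): if the momentum points
                # against the current gradient, project out the opposing
                # component so the update never has a negative inner
                # product with the fresh gradient.
                dot = torch.sum(m * g)
                if dot < 0:
                    update = m - (dot / (g.norm() ** 2 + eps)) * g
                else:
                    update = m
                p.add_(update, alpha=-lr)
```

As with any torch.optim.Optimizer subclass, this sketch could be used as a drop-in replacement for SGD in a standard training loop, e.g. AngleCalibratedSGD(model.parameters(), lr=0.1, beta=0.9).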
