Paper Title

A Unified Convergence Analysis for Shuffling-Type Gradient Methods

Authors

Nguyen, Lam M., Tran-Dinh, Quoc, Phan, Dzung T., Nguyen, Phuong Ha, van Dijk, Marten

Abstract

In this paper, we propose a unified convergence analysis for a class of generic shuffling-type gradient methods for solving finite-sum optimization problems. Our analysis works with any sampling without replacement strategy and covers many known variants such as randomized reshuffling, deterministic or randomized single permutation, and cyclic and incremental gradient schemes. We focus on two different settings: strongly convex and nonconvex problems, but also discuss the non-strongly convex case. Our main contribution consists of new non-asymptotic and asymptotic convergence rates for a wide class of shuffling-type gradient methods in both nonconvex and convex settings. We also study uniformly randomized shuffling variants with different learning rates and model assumptions. While our rate in the nonconvex case is new and significantly improved over existing works under standard assumptions, the rate on the strongly convex one matches the existing best-known rates prior to this paper up to a constant factor without imposing a bounded gradient condition. Finally, we empirically illustrate our theoretical results via two numerical examples: nonconvex logistic regression and neural network training examples. As byproducts, our results suggest some appropriate choices for diminishing learning rates in certain shuffling variants.
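The abstract describes a class of shuffling-type gradient methods for finite-sum problems, covering randomized reshuffling, a single (deterministic or randomized) permutation, and cyclic/incremental orderings, with possibly diminishing learning rates. As a rough illustration of how these variants differ only in the ordering used each epoch, here is a minimal Python sketch; the function name shuffling_gradient_method, the grad_i callback, and the lr_schedule argument are hypothetical and not taken from the paper.

```python
import numpy as np

def shuffling_gradient_method(grad_i, w0, n, epochs, lr_schedule,
                              scheme="randomized_reshuffling", rng=None):
    """Sketch of a shuffling-type gradient method for min_w (1/n) * sum_i f_i(w).

    grad_i(w, i): gradient of the i-th component f_i at w (user-supplied callback).
    scheme: 'randomized_reshuffling' draws a fresh permutation every epoch;
            'single_permutation' draws one random permutation and reuses it;
            anything else falls back to the cyclic/incremental ordering 0..n-1.
    lr_schedule(t): learning rate used throughout epoch t (e.g. diminishing).
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    w = np.array(w0, dtype=float)
    fixed_perm = rng.permutation(n)  # used only by the single-permutation variant
    for t in range(epochs):
        eta = lr_schedule(t)
        if scheme == "randomized_reshuffling":
            perm = rng.permutation(n)      # sampling without replacement, reshuffled each epoch
        elif scheme == "single_permutation":
            perm = fixed_perm              # one permutation reused in every epoch
        else:
            perm = np.arange(n)            # cyclic / incremental gradient ordering
        for i in perm:                     # one full pass over all n components
            w = w - eta * grad_i(w, i)
    return w
```

A small illustrative use on a synthetic least-squares finite sum, with a diminishing learning rate of the kind the abstract alludes to (the data and step-size choice here are purely for demonstration):

```python
rng = np.random.default_rng(1)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
grad_i = lambda w, i: (A[i] @ w - b[i]) * A[i]   # gradient of 0.5*(a_i^T w - b_i)^2
w_hat = shuffling_gradient_method(grad_i, np.zeros(5), n=100, epochs=50,
                                  lr_schedule=lambda t: 0.1 / (t + 1))
```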
