在受约束的马鞍点优化中的线性最后近题收敛

论文标题

在受约束的马鞍点优化中的线性最后近题收敛

Linear Last-iterate Convergence in Constrained Saddle-point Optimization

论文作者

Wei, Chen-Yu, Lee, Chung-Wei, Zhang, Mengxiao, Luo, Haipeng

论文摘要

乐观的梯度下降（OGDA）和鞍点优化的乐观乘量更新（OMWU），由于其良好的最后近期收敛，因此受到了越来越多的关注。但是，他们对简单概率单纯性游戏的行为仍未完全理解 - 先前的分析缺乏明确的收敛率，仅适用于指数级的学习率，或者需要其他假设，例如最佳解决方案的独特性。在这项工作中，我们在受约束的环境中大大扩展了对OGDA和OMWU的最后近期收敛的理解。具体而言，对于单纯胶的双线性游戏中的OMWU，我们表明，当平衡是独特的，线性的最后一介质收敛是通过学习率设置为通用常数的，在相同假设下（Daskalakis＆Panageas，2019b）的结果。然后，我们通过引入充分的条件，将结果显着扩展到了预测的OGDA算法的更一般目标和可行的集合，在该条件下，OGDA表现出具有恒定学习率的混凝土最后一介质收敛速率，其值仅取决于目标功能的平滑度。我们表明，在任何多层上，双线性游戏都满足了这种情况，并且即使没有独特的平衡假设，OGDA也会快速收敛。我们的病情也适用于强烈convex-rong-concave功能，从而恢复了（Hsieh等，2019）的结果。最后，我们提供实验结果以进一步支持我们的理论。

Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative Weights Update (OMWU) for saddle-point optimization have received growing attention due to their favorable last-iterate convergence. However, their behaviors for simple bilinear games over the probability simplex are still not fully understood - previous analysis lacks explicit convergence rates, only applies to an exponentially small learning rate, or requires additional assumptions such as the uniqueness of the optimal solution. In this work, we significantly expand the understanding of last-iterate convergence for OGDA and OMWU in the constrained setting. Specifically, for OMWU in bilinear games over the simplex, we show that when the equilibrium is unique, linear last-iterate convergence is achieved with a learning rate whose value is set to a universal constant, improving the result of (Daskalakis & Panageas, 2019b) under the same assumption. We then significantly extend the results to more general objectives and feasible sets for the projected OGDA algorithm, by introducing a sufficient condition under which OGDA exhibits concrete last-iterate convergence rates with a constant learning rate whose value only depends on the smoothness of the objective function. We show that bilinear games over any polytope satisfy this condition and OGDA converges exponentially fast even without the unique equilibrium assumption. Our condition also holds for strongly-convex-strongly-concave functions, recovering the result of (Hsieh et al., 2019). Finally, we provide experimental results to further support our theory.

下载PDF全文

下载文献需遵守相关版权规定

论文标题