Title
Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games
Authors
Abstract
We study the problem of finding the Nash equilibrium in a two-player zero-sum Markov game. Due to its formulation as a minimax optimization program, a natural approach to solving the problem is to perform gradient descent/ascent with respect to each player in an alternating fashion. However, due to the non-convexity/non-concavity of the underlying objective function, theoretical understanding of this method is limited. In this paper, we consider solving an entropy-regularized variant of the Markov game. The regularization introduces structure into the optimization landscape that makes the solutions more identifiable and allows the problem to be solved more efficiently. Our main contribution is to show that, under a proper choice of the regularization parameter, the gradient descent ascent algorithm converges to the Nash equilibrium of the original unregularized problem. We explicitly characterize the finite-time performance of the last iterate of our algorithm, which vastly improves over the existing convergence bound of gradient descent ascent without regularization. Finally, we complement the analysis with numerical simulations that illustrate the accelerated convergence of the algorithm.
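As a rough illustration (not taken from the paper), the entropy-regularized descent/ascent idea can be sketched in the single-state special case, where the Markov game reduces to a matrix game min_x max_y x^T A y over the probability simplices. The payoff matrix `A`, step size `eta`, regularization strength `tau`, and iteration count below are arbitrary illustrative choices, and the updates are entropic mirror descent/ascent steps, a simplex-friendly form of gradient descent ascent:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))  # payoff matrix: x minimizes x^T A y, y maximizes

tau = 0.1    # entropy-regularization strength (illustrative value)
eta = 0.1    # step size (illustrative value)
n, m = A.shape
x = np.ones(n) / n  # min player's mixed strategy
y = np.ones(m) / m  # max player's mixed strategy

for _ in range(2000):
    # gradients of the regularized objectives for each player
    gx = A @ y + tau * (np.log(x) + 1.0)    # d/dx [ x^T A y + tau * sum_i x_i log x_i ]
    gy = A.T @ x - tau * (np.log(y) + 1.0)  # d/dy [ x^T A y - tau * sum_j y_j log y_j ]
    # entropic mirror descent (for x) / ascent (for y); multiplicative
    # updates keep the iterates strictly inside the simplex
    x = x * np.exp(-eta * gx); x /= x.sum()
    y = y * np.exp(+eta * gy); y /= y.sum()

# duality gap of the *unregularized* game measures closeness to its Nash equilibrium;
# with small tau it settles near zero (up to an O(tau * log n) regularization bias)
gap = np.max(A.T @ x) - np.min(A @ y)
print(f"duality gap = {gap:.4f}")
```

The regularization makes each player's subproblem strongly convex (resp. concave) relative to the entropy, which is what yields the unique, more identifiable solution and the faster last-iterate convergence described in the abstract; shrinking `tau` trades this conditioning against the bias away from the unregularized equilibrium.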