Paper Title

What is a Good Metric to Study Generalization of Minimax Learners?

Paper Authors

Asuman Ozdaglar, Sarath Pattathil, Jiawei Zhang, Kaiqing Zhang

Paper Abstract

Minimax optimization has served as the backbone of many machine learning (ML) problems. Although the convergence behavior of optimization algorithms has been extensively studied in minimax settings, their generalization guarantees in stochastic minimax optimization problems, i.e., how a solution trained on empirical data performs on unseen testing data, have been relatively underexplored. A fundamental question remains elusive: what is a good metric to study generalization of minimax learners? In this paper, we aim to answer this question by first showing that primal risk, a universal metric for studying generalization in minimization problems that has also been adopted recently to study generalization in minimax ones, fails in simple examples. To circumvent these issues, we propose a new metric to study generalization of minimax learners: the primal gap, defined as the difference between the primal risk and its minimum over all models. Next, we derive generalization error bounds for the primal gap in nonconvex-concave settings. As byproducts of our analysis, we also settle two open questions: establishing generalization error bounds for primal risk and for primal-dual risk (another existing metric, which is only well-defined when a global saddle point exists) in the strong sense, i.e., without strong concavity and without assuming that the maximization and expectation can be interchanged, whereas either of these assumptions was needed in the literature. Finally, we leverage this new metric to compare the generalization behavior of two popular algorithms, gradient descent-ascent (GDA) and gradient descent-max (GDMax), in stochastic minimax optimization.
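
To make the abstract's key quantities concrete, the display below sketches the standard definitions under notation assumed here (not copied from the paper): R(x, y) denotes the population risk of a model x against an adversary y on data z drawn from a distribution D, the primal risk is the worst-case population risk over y, and the primal gap measures how far that worst-case risk is from its best achievable value over all models.

```latex
% Sketch of the metrics discussed above (notation assumed here, not verbatim from the paper).
\[
  R(x, y) = \mathbb{E}_{z \sim \mathcal{D}}\!\left[ f(x, y; z) \right],
  \qquad
  r(x) = \max_{y} R(x, y) \quad \text{(primal risk)},
\]
\[
  \Delta(x) = r(x) - \min_{x'} r(x') \quad \text{(primal gap, the proposed metric)}.
\]
```

Likewise, the two algorithms compared at the end of the abstract differ mainly in how they handle the inner maximization. The following is a minimal schematic sketch, not the paper's implementation; grad_x and grad_y are hypothetical callables returning (stochastic) gradients of the empirical objective with respect to x and y.

```python
def gda(grad_x, grad_y, x, y, lr_x=0.01, lr_y=0.01, steps=1000):
    """Gradient descent-ascent: simultaneous descent on x and ascent on y."""
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x = x - lr_x * gx  # descent step on the minimizing variable
        y = y + lr_y * gy  # ascent step on the maximizing variable
    return x, y


def gdmax(grad_x, grad_y, x, y, lr_x=0.01, lr_y=0.05,
          outer_steps=200, inner_steps=50):
    """Gradient descent-max: (approximately) solve the max over y before each x step."""
    for _ in range(outer_steps):
        for _ in range(inner_steps):   # inner loop approximates max over y
            y = y + lr_y * grad_y(x, y)
        x = x - lr_x * grad_x(x, y)    # descent step at the approximate maximizer
    return x, y
```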
