论文标题
在非常大型游戏中的小型纳什平衡证书
Small Nash Equilibrium Certificates in Very Large Games
论文作者
论文摘要
在许多游戏设置中,游戏没有明确给出,但只能通过玩游戏。尽管在这种情况下进行了令人印象深刻的示范,但先前的技术尚未提供安全保证,也就是说,可以保证计算出的策略的游戏理论可利用性。在本文中,我们介绍了一种方法,该方法表明可以在此类设置中提供可剥削性保证,而无需探索整个游戏。我们引入了广泛形式近似NASH平衡的证书的概念。为了验证证书,我们给出了一种算法,该算法按时间线性运行,以证书的大小而不是整个游戏的大小。在零和游戏中,我们进一步显示,可以使用任何标准的游戏算法(例如,使用线性程序或反事实遗憾的最小化)来计算最佳证书 - 鉴于到目前为止的探索 - 可以计算出最佳证书。但是,与正常形式或完美信息的情况不同,我们表明某些广泛形式游戏的家属即使对游戏结构做出了极好的假设,也没有较小的近似证书。尽管有这个困难,但我们在实验上发现,很小的证书,即使是确切的证书,通常都存在于大型甚至无限游戏中。总体而言,我们的方法使人们能够在提供可剥削性保证的同时尝试一个人喜欢的探索策略,从而将勘探策略与均衡过程中解耦。
In many game settings, the game is not explicitly given but is only accessible by playing it. While there have been impressive demonstrations in such settings, prior techniques have not offered safety guarantees, that is, guarantees on the game-theoretic exploitability of the computed strategies. In this paper we introduce an approach that shows that it is possible to provide exploitability guarantees in such settings without ever exploring the entire game. We introduce a notion of a certificate of an extensive-form approximate Nash equilibrium. For verifying a certificate, we give an algorithm that runs in time linear in the size of the certificate rather than the size of the whole game. In zero-sum games, we further show that an optimal certificate -- given the exploration so far -- can be computed with any standard game-solving algorithm (e.g., using a linear program or counterfactual regret minimization). However, unlike in the cases of normal form or perfect information, we show that certain families of extensive-form games do not have small approximate certificates, even after making extremely nice assumptions on the structure of the game. Despite this difficulty, we find experimentally that very small certificates, even exact ones, often exist in large and even in infinite games. Overall, our approach enables one to try one's favorite exploration strategies while offering exploitability guarantees, thereby decoupling the exploration strategy from the equilibrium-finding process.