Fragility and Robustness in Mean-Payoff Adversarial Stackelberg Games

Paper Title

Fragility and Robustness in Mean-Payoff Adversarial Stackelberg Games

Authors

Mrudula Balachander, Shibashis Guha, Jean-François Raskin

Abstract

Two-player mean-payoff Stackelberg games are nonzero-sum infinite-duration games played on a bi-weighted graph by Leader (Player 0) and Follower (Player 1). Such games are played sequentially: first, Leader announces her strategy; second, Follower chooses his best response. If we cannot impose which best response Follower chooses, we say that Follower, though strategic, is adversarial towards Leader. The maximal value that Leader can get in this nonzero-sum game is called the adversarial Stackelberg value (ASV) of the game. We study the robustness of strategies for Leader in these games against two types of deviations: (i) Modeling imprecision: the weights on the edges of the game arena may not be exactly correct; they may be delta-away from the right ones. (ii) Sub-optimal response: Follower may play epsilon-optimal best responses instead of perfect best responses. First, we show that if the game is zero-sum then robustness is guaranteed, while in the nonzero-sum case optimal strategies for ASV are fragile. Second, we provide a solution concept to obtain strategies for Leader that are robust both to modeling imprecision and to the epsilon-optimal responses of Follower, and we study several properties and algorithmic problems related to this solution concept.
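
To make the fragility claim concrete, here is a minimal Python sketch, not taken from the paper. It assumes a toy setting: Leader's strategy is already fixed, Follower has only a finite set of candidate responses, and each response yields a play that eventually repeats one cycle forever, so each player's mean-payoff is the average of that player's weights on the cycle. The function names, the toy weights, and the "within epsilon of the best" reading of an epsilon-optimal response are all illustrative assumptions.

```python
from fractions import Fraction

def mean_payoffs(cycle_edges):
    """Mean-payoff pair (Leader, Follower) of a play that ends up repeating
    `cycle_edges` forever. Each edge is a pair (leader_weight, follower_weight)."""
    n = len(cycle_edges)
    leader_total = sum(w0 for w0, _ in cycle_edges)
    follower_total = sum(w1 for _, w1 in cycle_edges)
    return Fraction(leader_total, n), Fraction(follower_total, n)

def epsilon_best_responses(outcomes, epsilon):
    """Responses whose Follower payoff is within `epsilon` of the best one.
    (Assumed reading of 'epsilon-optimal'; epsilon = 0 gives exact best responses.)"""
    best = max(follower for _, follower in outcomes.values())
    return {name for name, (_, follower) in outcomes.items()
            if follower >= best - epsilon}

def adversarial_value(outcomes, epsilon):
    """Leader's payoff when Follower picks the allowed response that is worst for Leader."""
    allowed = epsilon_best_responses(outcomes, epsilon)
    return min(outcomes[name][0] for name in allowed)

# Hypothetical responses to one fixed Leader strategy (toy weights).
outcomes = {
    "A": mean_payoffs([(4, 2), (4, 2)]),  # Leader 4, Follower 2
    "B": mean_payoffs([(0, 2), (2, 1)]),  # Leader 1, Follower 3/2
}

exact_value = adversarial_value(outcomes, Fraction(0))
relaxed_value = adversarial_value(outcomes, Fraction(1, 2))
```

In this toy instance Leader is guaranteed 4 against exact best responses but only 1 once Follower may give up half a unit of his own payoff. This is the kind of drop the abstract calls fragility. The sketch does not compute the ASV itself, which ranges over all Leader strategies.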
