Paper title
Bounding Fixed Points of Set-Based Bellman Operator and Nash Equilibria of Stochastic Games
Authors
Abstract
Motivated by the uncertain parameters encountered in Markov decision processes (MDPs) and stochastic games, we study the effect of parameter uncertainty on Bellman operator-based algorithms under a set-based framework. Specifically, we first consider a family of MDPs whose cost parameters lie in a given compact set; we then define a Bellman operator that acts on a set of value functions and outputs a new set of value functions under all possible variations in the cost parameters. We prove the existence of a fixed point of this set-based Bellman operator by showing that it is contractive on a complete metric space, and we explore its relationship with the corresponding family of MDPs and stochastic games. Additionally, we show that when the cost parameters are bounded by interval sets, we can form exact bounds on the set of optimal value functions. Finally, we use our results to bound the value function trajectory of a player in a stochastic game.
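The interval-bound claim can be illustrated with a minimal sketch. It is not the paper's set-based operator; it only relies on the standard fact that the cost-minimizing Bellman operator is monotone in the cost. The sketch assumes a finite, discounted MDP with a fixed transition kernel, and every name and number in it (`bellman`, `value_iteration`, `P`, `C_lo`, `C_hi`, `gamma`) is hypothetical and chosen for illustration only.

```python
# Hypothetical sketch: a finite discounted MDP with interval-bounded costs.
# All names, sizes, and numbers are illustrative assumptions, not from the paper.


def bellman(V, C, P, gamma):
    """Apply the cost-minimizing Bellman operator once.

    V[s] is the current value of state s, C[s][a] the cost of action a in
    state s, and P[s][a][t] the probability of moving from s to t under a.
    """
    n_states = len(V)
    return [
        min(
            C[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))
            for a in range(len(C[s]))
        )
        for s in range(n_states)
    ]


def value_iteration(C, P, gamma, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman operator from zero until successive iterates agree."""
    V = [0.0] * len(C)
    for _ in range(max_iter):
        V_next = bellman(V, C, P, gamma)
        if max(abs(x - y) for x, y in zip(V, V_next)) < tol:
            return V_next
        V = V_next
    return V


# Two states, two actions; each row of P[s][a] sums to 1.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
]
C_lo = [[1.0, 2.0], [0.5, 1.5]]  # elementwise lower bounds on the costs
C_hi = [[1.5, 2.5], [1.0, 2.0]]  # elementwise upper bounds on the costs
gamma = 0.9

V_lo = value_iteration(C_lo, P, gamma)
V_hi = value_iteration(C_hi, P, gamma)

# Pick one cost inside the interval (the midpoint) and check the sandwich.
C_mid = [
    [(lo + hi) / 2 for lo, hi in zip(row_lo, row_hi)]
    for row_lo, row_hi in zip(C_lo, C_hi)
]
V_mid = value_iteration(C_mid, P, gamma)

inside = all(lo - 1e-8 <= v <= hi + 1e-8 for lo, v, hi in zip(V_lo, V_mid, V_hi))
print(inside)
```

Under these assumptions, the optimal value function for any cost inside the interval lies elementwise between `V_lo` and `V_hi`, so two ordinary value-iteration runs at the interval endpoints are enough to bracket the whole family.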