马尔可夫决策过程动态的拥堵感知路径协调游戏

论文标题

马尔可夫决策过程动态的拥堵感知路径协调游戏

Congestion-aware path coordination game with Markov decision process dynamics

论文作者

Li, Sarah H. Q., Calderone, Dan, Acikmese, Behcet

论文摘要

受到机器人税，仓库管理和混合车辆路线问题引起的路径协调问题的启发，我们对马尔可夫决策过程动态下的一组响应随机需求作为拥堵游戏的一组异质参与者建模。玩家共享一个共同的州行动空间，但具有独特的过渡动态，每个玩家的独特成本是联合国家行动概率分布的{函数}。对于一类玩家成本功能，我们制定了特定于玩家的优化问题，证明了NASH平衡与潜在最小化问题的解决方案之间的等效性，并得出了解决NASH平衡的动态编程方法。我们将此游戏应用于对多代理路径协调的建模，并引入基于拥塞的成本功能，使玩家能够完成单个任务，同时避免与对手交通拥堵。最后，我们提出了一种学习算法，用于查找在玩家数量中具有线性复杂性的NASH平衡。我们在多机器人仓库\ Change {路径协调问题}上演示了我们的游戏模型，其中机器人自主检索并交付包装，同时避免了拥挤的路径。

Inspired by the path coordination problem arising from robo-taxis, warehouse management, and mixed-vehicle routing problems, we model a group of heterogeneous players responding to stochastic demands as a congestion game under Markov decision process dynamics. Players share a common state-action space but have unique transition dynamics, and each player's unique cost is a {function} of the joint state-action probability distribution. For a class of player cost functions, we formulate the player-specific optimization problem, prove the equivalence between the Nash equilibrium and the solution of a potential minimization problem, and derive dynamic programming approaches to solve the Nash equilibrium. We apply this game to model multi-agent path coordination and introduce congestion-based cost functions that enable players to complete individual tasks while avoiding congestion with their opponents. Finally, we present a learning algorithm for finding the Nash equilibrium that has linear complexity in the number of players. We demonstrate our game model on a multi-robot warehouse \change{path coordination problem}, in which robots autonomously retrieve and deliver packages while avoiding congested paths.

下载PDF全文

下载文献需遵守相关版权规定

论文标题