马尔可夫决策过程中的多代理多目标路径计划

论文标题

马尔可夫决策过程中的多代理多目标路径计划

Multi-agent Multi-target Path Planning in Markov Decision Processes

论文作者

Nawaz, Farhad, Ornik, Melkior

论文摘要

自主系统的任务通常要求代理在复杂的操作条件下访问多个目标。这项工作考虑了马尔可夫决策过程中的非交流代理团队（MDP）在最短时间内访问一组目标的问题。单一代理问题至少是通过将其减少到哈密顿路径问题来完成的。我们首先讨论基于目标状态数量指数的Bellman最优方程的最佳算法。然后，我们通过呈现在每个时间步骤中是多项式的次优算法来权衡时间复杂性。我们证明所提出的算法为某些类别的MDP生成了最佳策略。将我们的程序扩展到多代理案例，我们提出了一种目标分区算法，该算法大致将访问目标的预期时间最小化。我们证明我们的算法为聚类目标方案生成了最佳分区。我们介绍了受海洋动力学启发的随机MDP和GRIDWORLD环境的性能。我们表明，我们的算法比最佳过程快得多，并且比当前可用的启发式算法要快。

Missions for autonomous systems often require agents to visit multiple targets in complex operating conditions. This work considers the problem of visiting a set of targets in minimum time by a team of non-communicating agents in a Markov decision process (MDP). The single-agent problem is at least NP-complete by reducing it to a Hamiltonian path problem. We first discuss an optimal algorithm based on Bellman's optimality equation that is exponential in the number of target states. Then, we trade-off optimality for time complexity by presenting a suboptimal algorithm that is polynomial at each time step. We prove that the proposed algorithm generates optimal policies for certain classes of MDPs. Extending our procedure to the multi-agent case, we propose a target partitioning algorithm that approximately minimizes the expected time to visit the targets. We prove that our algorithm generates optimal partitions for clustered target scenarios. We present the performance of our algorithms on random MDPs and gridworld environments inspired by ocean dynamics. We show that our algorithms are much faster than the optimal procedure and more optimal than the currently available heuristic.

下载PDF全文

下载文献需遵守相关版权规定

论文标题