Scalable Multi-Agent Model-Based Reinforcement Learning

Paper Title

Scalable Multi-Agent Model-Based Reinforcement Learning

Authors

Vladimir Egorov, Aleksei Shpilman

Abstract

Recent Multi-Agent Reinforcement Learning (MARL) literature has largely focused on the Centralized Training with Decentralized Execution (CTDE) paradigm. CTDE has been the dominant approach for both cooperative and mixed environments because it can efficiently train decentralized policies. While full autonomy of the agents can be a desirable outcome in mixed environments, cooperative environments allow agents to share information to facilitate coordination. Approaches that leverage this technique are usually referred to as communication methods, as full autonomy of the agents is compromised for better performance. Although communication approaches have shown impressive results, they do not fully leverage this additional information during the training phase. In this paper, we propose a new method called MAMBA, which utilizes Model-Based Reinforcement Learning (MBRL) to further leverage centralized training in cooperative environments. We argue that communication between agents is enough to sustain a world model for each agent during the execution phase, while imaginary rollouts can be used for training, removing the necessity to interact with the environment. These properties yield a sample-efficient algorithm that can scale gracefully with the number of agents. We empirically confirm that MAMBA achieves good performance while reducing the number of interactions with the environment by up to an order of magnitude compared to model-free state-of-the-art approaches in the challenging domains of SMAC and Flatland.
