Paper Title
Counterfactual Multi-Agent Reinforcement Learning with Graph Convolution Communication
Paper Authors
Paper Abstract
We consider a fully cooperative multi-agent system in which agents cooperate to maximize the system's utility in a partially observable environment. We propose that multi-agent systems must have the ability to (1) communicate and understand the interplay between agents and (2) correctly distribute rewards based on each individual agent's contribution. In contrast, most work in this setting considers only one of these abilities. In this study, we develop an architecture that allows communication among agents and tailors the system's reward to each individual agent. Our architecture represents agent communication through graph convolution and applies an existing credit assignment structure, counterfactual multi-agent policy gradient (COMA), to help agents learn communication by back-propagation. The flexibility of the graph structure makes our method applicable to a variety of multi-agent systems, e.g., dynamic systems consisting of varying numbers of agents and static systems with a fixed number of agents. We evaluate our method on a range of tasks, demonstrating the advantage of marrying communication with credit assignment. In the experiments, our proposed method yields better performance than state-of-the-art methods, including COMA. Moreover, we show that the communication strategies offer insight into, and interpretability of, the system's cooperative policies.
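The two ingredients the abstract names can be illustrated concretely. Below is a minimal sketch (not the paper's implementation): a single row-normalized graph-convolution round in which each agent aggregates its neighbours' features, and a COMA-style counterfactual advantage that subtracts a baseline marginalizing out one agent's own action. All function names, shapes, and the tanh nonlinearity are illustrative assumptions.

```python
import numpy as np

def gcn_communicate(features, adj, weight):
    """One graph-convolution communication round (illustrative sketch).

    features: (n_agents, d_in) per-agent observations/hidden states
    adj:      (n_agents, n_agents) 0/1 communication graph
    weight:   (d_in, d_out) learnable projection
    Returns (n_agents, d_out) messages aggregated from self + neighbours.
    """
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)  # row-normalize
    return np.tanh(a_norm @ features @ weight)

def counterfactual_advantage(q_values, policy, taken_action):
    """COMA-style advantage for one agent.

    q_values:     (n_actions,) centralized Q(s, (u^{-a}, u')) as the agent's
                  own action u' varies, other agents' actions held fixed
    policy:       (n_actions,) the agent's current action probabilities
    taken_action: index of the action the agent actually executed
    The baseline marginalizes the agent's own action under its policy,
    so the advantage isolates that agent's individual contribution.
    """
    baseline = policy @ q_values
    return q_values[taken_action] - baseline
```

For example, with `q_values = [1, 2, 3]` and `policy = [0.2, 0.3, 0.5]`, the counterfactual baseline is 2.3, so taking the last action yields an advantage of 0.7; actions that outperform the agent's average receive positive credit.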