Paper Title
The Emergence of Adversarial Communication in Multi-Agent Reinforcement Learning
Paper Authors
Paper Abstract
Many real-world problems require the coordination of multiple autonomous agents. Recent work has shown the promise of Graph Neural Networks (GNNs) to learn explicit communication strategies that enable complex multi-agent coordination. These works use models of cooperative multi-agent systems whereby agents strive to achieve a shared global goal. When considering agents with self-interested local objectives, the standard design choice is to model these as separate learning systems (albeit sharing the same environment). Such a design choice, however, precludes the existence of a single, differentiable communication channel, and consequently prohibits the learning of inter-agent communication strategies. In this work, we address this gap by presenting a learning model that accommodates individual non-shared rewards and a differentiable communication channel that is common among all agents. We focus on the case where agents have self-interested objectives, and develop a learning algorithm that elicits the emergence of adversarial communication. We perform experiments on multi-agent coverage and path planning problems, and employ a post-hoc interpretability technique to visualize the messages that agents communicate to each other. We show how a single self-interested agent is capable of learning highly manipulative communication strategies that allow it to significantly outperform a cooperative team of agents.
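The abstract's central architectural idea is a single differentiable communication channel shared among agents that each optimize an individual, non-shared reward. The sketch below is a minimal illustration of that idea and is not the authors' implementation: it assumes PyTorch, uses a mean aggregation of message vectors as a simple stand-in for a GNN layer, and uses a REINFORCE-style surrogate loss with stub observations and rewards. All names (`Agent`, `MSG_DIM`, etc.) are illustrative.

```python
# Minimal sketch: agents with individual rewards connected by a shared,
# differentiable message channel (mean aggregation stands in for a GNN).
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, MSG_DIM, N_ACTIONS = 3, 8, 4, 5

class Agent(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(OBS_DIM, MSG_DIM)               # outgoing message
        self.policy = nn.Linear(OBS_DIM + MSG_DIM, N_ACTIONS)    # reads incoming messages

    def message(self, obs):
        return torch.tanh(self.encoder(obs))

    def act_logits(self, obs, incoming):
        return self.policy(torch.cat([obs, incoming], dim=-1))

agents = [Agent() for _ in range(N_AGENTS)]
opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in agents]

obs = torch.randn(N_AGENTS, OBS_DIM)                 # stub observations
msgs = torch.stack([a.message(obs[i]) for i, a in enumerate(agents)])

losses = []
for i, agent in enumerate(agents):
    # Agent i receives the mean of the *other* agents' messages. Because the
    # aggregation is differentiable, agent i's loss produces gradients with
    # respect to every other agent's encoder through the shared channel.
    incoming = torch.cat([msgs[:i], msgs[i + 1:]]).mean(dim=0)
    dist = torch.distributions.Categorical(logits=agent.act_logits(obs[i], incoming))
    action = dist.sample()
    reward = torch.randn(())                         # individual, non-shared reward (stub)
    losses.append(-dist.log_prob(action) * reward)   # REINFORCE-style surrogate

for opt in opts:
    opt.zero_grad()
torch.stack(losses).sum().backward()                 # cross-agent gradients flow via the channel
for opt in opts:
    opt.step()                                       # each agent updates only its own parameters
```

In a fully self-interested setting such as the one the paper studies, one would additionally control which of these cross-agent gradients each learner actually applies; the sketch only demonstrates that a shared differentiable channel makes such gradients available at all, which separate learning systems would preclude.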