Paper Title

Scalable and Sample Efficient Distributed Policy Gradient Algorithms in Multi-Agent Networked Systems

Authors

Xin Liu, Honghao Wei, Lei Ying

Abstract

This paper studies a class of multi-agent reinforcement learning (MARL) problems where the reward that an agent receives depends on the states of other agents, but the next state only depends on the agent's own current state and action. We name it REC-MARL standing for REward-Coupled Multi-Agent Reinforcement Learning. REC-MARL has a range of important applications such as real-time access control and distributed power control in wireless networks. This paper presents a distributed policy gradient algorithm for REC-MARL. The proposed algorithm is distributed in two aspects: (i) the learned policy is a distributed policy that maps a local state of an agent to its local action and (ii) the learning/training is distributed, during which each agent updates its policy based on its own and neighbors' information. The learned algorithm achieves a stationary policy and its iterative complexity bounds depend on the dimension of local states and actions. The experimental results of our algorithm for the real-time access control and power control in wireless networks show that our policy significantly outperforms the state-of-the-art algorithms and well-known benchmarks.
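To make the two "distributed" aspects in the abstract concrete, below is a minimal illustrative sketch, not the paper's actual algorithm: each agent keeps a local softmax policy over its own state and action, its reward signal is coupled to its neighbors' states, its state transition depends only on its own state and action, and it performs a REINFORCE-style update using only local and neighbor information. All names (`n_agents`, `neighbors`, `local_reward`, the ring topology, etc.) are hypothetical choices made for this sketch.

```python
import numpy as np

# Illustrative REC-MARL-style setup (assumed, not from the paper):
# agents on a ring, reward coupled to neighbors' states, local transitions.
n_agents = 4
n_states, n_actions = 3, 2
neighbors = {i: [(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)}
rng = np.random.default_rng(0)

# Each agent holds its own local policy parameters (tabular softmax).
theta = [np.zeros((n_states, n_actions)) for _ in range(n_agents)]

def softmax_policy(params, s):
    z = params[s] - params[s].max()
    p = np.exp(z)
    return p / p.sum()

def local_reward(i, states):
    # Hypothetical coupled reward: agent i is rewarded for matching the
    # average state of its neighbors (stands in for interference/coupling).
    return -abs(states[i] - np.mean([states[j] for j in neighbors[i]]))

states = rng.integers(n_states, size=n_agents)
alpha = 0.1
for step in range(200):
    actions, probs = [], []
    for i in range(n_agents):
        p = softmax_policy(theta[i], states[i])
        a = rng.choice(n_actions, p=p)
        actions.append(a)
        probs.append(p)
    # Distributed training: each agent updates its own parameters using only
    # its own and its neighbors' information (a REINFORCE-style step).
    for i in range(n_agents):
        r = local_reward(i, states)
        grad = -probs[i].copy()
        grad[actions[i]] += 1.0          # gradient of log pi_i(a_i | s_i)
        theta[i][states[i]] += alpha * r * grad
    # Local dynamics: the next state of agent i depends only on its own
    # current state and action, as in the REC-MARL assumption.
    states = np.array([(states[i] + actions[i]) % n_states for i in range(n_agents)])
```

The learned policies remain fully distributed at execution time: each agent maps only its local state to its local action, matching aspect (i) in the abstract, while the neighbor-dependent reward signal used in the update loop corresponds to aspect (ii).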
