Paper title
Developing cooperative policies for multi-stage tasks
Authors
Abstract
This paper proposes the Cooperative Soft Actor Critic (CSAC) method, which enables consecutive reinforcement learning agents to cooperatively solve a long-horizon, multi-stage task. The method modifies the policy of each agent to maximise both its own critic and the next agent's critic. Cooperatively maximising both critics allows each agent to take actions that benefit its own task as well as subsequent tasks. In a multi-room maze domain, the cooperative policies outperformed both uncooperative policies and a single agent trained across the entire domain. CSAC achieved a success rate at least 20% higher than the uncooperative policies and converged on a solution at least 4 times faster than the single agent.
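The core idea in the abstract, choosing actions that score well under both the current agent's critic and the next agent's critic, can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not say how the two critics are combined, so the weighted sum and the `coop_weight` parameter below are assumptions, as are the toy critics and the greedy choice over a discrete action set (Soft Actor Critic itself trains a stochastic policy by gradient descent).

```python
from typing import Callable, Sequence

# A critic maps (state, action) to an estimated value.
Critic = Callable[[float, float], float]


def cooperative_action(
    state: float,
    candidate_actions: Sequence[float],
    current_critic: Critic,
    next_critic: Critic,
    coop_weight: float = 0.5,  # hypothetical mixing weight, not from the abstract
) -> float:
    """Return the candidate action with the highest blended critic score."""

    def blended(action: float) -> float:
        # Assumed combination rule: a convex mix of the two critics.
        return ((1.0 - coop_weight) * current_critic(state, action)
                + coop_weight * next_critic(state, action))

    return max(candidate_actions, key=blended)


# Toy critics, made up for illustration only:
# the current agent's critic prefers actions near 0.0,
# the next agent's critic prefers actions near 1.0.
def q_current(state: float, action: float) -> float:
    return -(action - 0.0) ** 2


def q_next(state: float, action: float) -> float:
    return -(action - 1.0) ** 2


actions = [0.0, 0.25, 0.5, 0.75, 1.0]

# With coop_weight=0.0 the agent ignores the next agent's critic.
uncooperative = cooperative_action(0.0, actions, q_current, q_next, coop_weight=0.0)

# With coop_weight=0.5 both critics count equally.
cooperative = cooperative_action(0.0, actions, q_current, q_next, coop_weight=0.5)

print(uncooperative, cooperative)
```

In this toy setting the uncooperative choice is whatever the current critic alone rates best, while the cooperative choice moves toward an action that the next agent's critic also rates well, which is the behaviour the abstract attributes to cooperative policies.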