Paper Title
E-MAPP: Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance
Paper Authors
Paper Abstract
A critical challenge in multi-agent reinforcement learning (MARL) is for multiple agents to efficiently accomplish complex, long-horizon tasks. The agents often have difficulties in cooperating on common goals, dividing complex tasks, and planning through several stages to make progress. We propose to address these challenges by guiding agents with programs designed for parallelization, since programs as a representation contain rich structural and semantic information, and are widely used as abstractions for long-horizon tasks. Specifically, we introduce Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance (E-MAPP), a novel framework that leverages parallel programs to guide multiple agents to efficiently accomplish goals that require planning over $10+$ stages. E-MAPP integrates the structural information from a parallel program, promotes the cooperative behaviors grounded in program semantics, and improves the time efficiency via a task allocator. We conduct extensive experiments on a series of challenging, long-horizon cooperative tasks in the Overcooked environment. Results show that E-MAPP outperforms strong baselines in terms of the completion rate, time efficiency, and zero-shot generalization ability by a large margin.
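To illustrate the idea of a parallel program guiding multiple agents, the following is a minimal, hypothetical sketch (not the authors' implementation): a "program" is represented as a dependency graph of named subtasks, and a greedy task allocator assigns dependency-free subtasks to idle agents each round, so independent stages (e.g. chopping two ingredients) run in parallel. All names and the Overcooked-style stages are illustrative assumptions.

```python
# Hypothetical sketch of parallel-program-guided task allocation.
# PROGRAM maps each subtask to the subtasks it depends on
# (toy Overcooked-style stages, chosen for illustration only).
PROGRAM = {
    "chop_onion":  [],
    "chop_tomato": [],
    "cook_soup":   ["chop_onion", "chop_tomato"],
    "plate_soup":  ["cook_soup"],
    "serve":       ["plate_soup"],
}

def allocate(program, n_agents):
    """Greedily schedule subtasks in rounds: every round, each idle
    agent takes one subtask whose dependencies are all finished.
    Returns a list of rounds, each a list of (agent, subtask) pairs."""
    done, schedule = set(), []
    while len(done) < len(program):
        # Subtasks whose dependencies are satisfied and not yet done.
        ready = [t for t, deps in program.items()
                 if t not in done and all(d in done for d in deps)]
        round_assign = list(enumerate(ready[:n_agents]))
        for _, task in round_assign:
            done.add(task)
        schedule.append(round_assign)
    return schedule

schedule = allocate(PROGRAM, n_agents=2)
# With two agents, the two chopping subtasks run in parallel in the
# first round, followed by the three sequential cooking stages.
```

The point of the sketch is only the structural one made in the abstract: a program exposes which stages are independent, so an allocator can exploit that structure to improve time efficiency over executing all stages sequentially.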