Paper Title
A Multi-Agent Approach for Adaptive Finger Cooperation in Learning-based In-Hand Manipulation
Paper Authors
Paper Abstract
In-hand manipulation is challenging for a multi-finger robotic hand due to its high degrees of freedom and the complex interaction with the object. To enable in-hand manipulation, existing deep reinforcement learning-based approaches mainly focus on training a single robot-structure-specific policy through a centralized learning mechanism, which lacks adaptability to changes such as robot malfunction. To address this limitation, this work treats each finger as an individual agent and trains multiple agents to control their assigned fingers to complete the in-hand manipulation task cooperatively. We propose the Multi-Agent Global-Observation Critic and Local-Observation Actor (MAGCLA) method, in which the critic can observe all agents' actions globally, while each actor only locally observes its neighbors' actions. In addition, conventional individual experience replay may cause unstable cooperation, which is critical for in-hand manipulation tasks, due to the asynchronous performance increment of each agent. To solve this issue, we propose the Synchronized Hindsight Experience Replay (SHER) method to synchronize and efficiently reuse the replayed experience across all agents. The methods are evaluated in two in-hand manipulation tasks on the Shadow dexterous hand. The results show that SHER helps MAGCLA achieve learning efficiency comparable to a single policy, and that the MAGCLA approach generalizes better across different tasks. The trained policies also show higher adaptability in the robot malfunction test compared to the baseline multi-agent and single-agent approaches.
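As a rough illustration of the critic/actor observation split described in the abstract, the following minimal PyTorch sketch shows one possible way to wire up a per-finger actor that sees only its own observation plus its neighbors' actions, and a critic that sees all agents' observations and actions. The dimensions, names (N_AGENTS, OBS_DIM, ACT_DIM, n_neighbors), and network sizes are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for a five-finger hand; these values are
# assumptions for illustration, not taken from the paper.
N_AGENTS, OBS_DIM, ACT_DIM = 5, 24, 4


class LocalActor(nn.Module):
    """Actor for one finger: conditions on its own observation and neighbor actions."""

    def __init__(self, n_neighbors: int = 2):
        super().__init__()
        in_dim = OBS_DIM + n_neighbors * ACT_DIM
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh(),
        )

    def forward(self, own_obs: torch.Tensor, neighbor_actions: torch.Tensor):
        # own_obs: (batch, OBS_DIM); neighbor_actions: (batch, n_neighbors * ACT_DIM)
        return self.net(torch.cat([own_obs, neighbor_actions], dim=-1))


class GlobalCritic(nn.Module):
    """Critic for one finger: scores its action given all agents' observations and actions."""

    def __init__(self):
        super().__init__()
        in_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, all_obs: torch.Tensor, all_actions: torch.Tensor):
        # all_obs: (batch, N_AGENTS * OBS_DIM); all_actions: (batch, N_AGENTS * ACT_DIM)
        return self.net(torch.cat([all_obs, all_actions], dim=-1))
```

The key design point conveyed by the sketch is the asymmetry of information: during training the critic may use global information about every finger, while each actor's policy remains executable with only local observations, which is what allows the trained fingers to keep cooperating when one agent degrades or fails.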