Paper Title

Robust Reinforcement Learning using Adversarial Populations

Authors

Eugene Vinitsky, Yuqing Du, Kanaad Parvate, Kathy Jang, Pieter Abbeel, Alexandre Bayen

Abstract

Reinforcement Learning (RL) is an effective tool for controller design but can struggle with issues of robustness, failing catastrophically when the underlying system dynamics are perturbed. The Robust RL formulation tackles this by adding worst-case adversarial noise to the dynamics and constructing the noise distribution as the solution to a zero-sum minimax game. However, existing work on learning solutions to the Robust RL formulation has primarily focused on training a single RL agent against a single adversary. In this work, we demonstrate that using a single adversary does not consistently yield robustness to dynamics variations under standard parametrizations of the adversary; the resulting policy is highly exploitable by new adversaries. We propose a population-based augmentation to the Robust RL formulation in which we randomly initialize a population of adversaries and sample from the population uniformly during training. We empirically validate across robotics benchmarks that the use of an adversarial population results in a more robust policy that also improves out-of-distribution generalization. Finally, we demonstrate that this approach provides comparable robustness and generalization as domain randomization on these benchmarks while avoiding a ubiquitous domain randomization failure mode.
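
The training scheme described in the abstract, a single protagonist policy trained against adversaries drawn uniformly at random from a fixed-size, randomly initialized population, can be outlined as a simple loop. The sketch below is not the authors' implementation: the `Policy`, `DummyEnv`, `rollout`, and `train` names, the population size, and the placeholder update rule are illustrative assumptions standing in for the actual RL algorithm and robotics environments.

```python
# Minimal sketch of population-based adversarial training (illustrative only).
# All classes and hyperparameters here are hypothetical placeholders.
import random

NUM_ADVERSARIES = 5    # assumed population size
NUM_ITERATIONS = 200   # assumed number of training iterations


class Policy:
    """Placeholder policy; a real implementation would wrap an RL learner."""

    def act(self, obs):
        return 0.0  # dummy action / perturbation

    def update(self, trajectory, sign=1.0):
        pass  # gradient step on sign * reward (agent: +1, adversary: -1)


class DummyEnv:
    """Toy stand-in for a simulator whose dynamics the adversary perturbs."""

    def __init__(self, horizon=10):
        self.horizon, self.t = horizon, 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action, perturbation):
        self.t += 1
        obs = action + perturbation      # adversarial noise enters the dynamics
        reward = -abs(obs)               # agent prefers obs near zero
        return obs, reward, self.t >= self.horizon, {}


def rollout(env, agent, adversary):
    """Collect one episode in which the sampled adversary perturbs the dynamics."""
    trajectory = []
    obs, done = env.reset(), False
    while not done:
        action = agent.act(obs)
        perturbation = adversary.act(obs)
        obs, reward, done, _ = env.step(action, perturbation)
        trajectory.append((obs, action, perturbation, reward))
    return trajectory


def train(env):
    agent = Policy()
    # Randomly initialized population of adversaries.
    population = [Policy() for _ in range(NUM_ADVERSARIES)]
    for _ in range(NUM_ITERATIONS):
        adversary = random.choice(population)   # uniform sampling each episode
        traj = rollout(env, agent, adversary)
        agent.update(traj, sign=+1.0)           # agent maximizes reward
        adversary.update(traj, sign=-1.0)       # sampled adversary minimizes it
    return agent


if __name__ == "__main__":
    train(DummyEnv())
```

Sampling a fresh adversary from the population at each episode is the key departure from single-adversary Robust RL: the protagonist cannot overfit to one adversary's exploit strategy, which the abstract identifies as the reason single-adversary training yields policies that remain highly exploitable by new adversaries.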
