Reducing Exploitability with Population Based Training

Paper Title

Reducing Exploitability with Population Based Training

Authors

Pavel Czempin, Adam Gleave

Abstract

Self-play reinforcement learning has achieved state-of-the-art, and often superhuman, performance in a variety of zero-sum games. Yet prior work has found that policies that are highly capable against regular opponents can fail catastrophically against adversarial policies: an opponent trained explicitly against the victim. Prior defenses using adversarial training were able to make the victim robust to a specific adversary, but the victim remained vulnerable to new ones. We conjecture this limitation was due to insufficient diversity of adversaries seen during training. We analyze a defense using population based training to pit the victim against a diverse set of opponents. We evaluate this defense's robustness against new adversaries in two low-dimensional environments. This defense increases robustness against adversaries, as measured by the number of attacker training timesteps to exploit the victim. Furthermore, we show that robustness is correlated with the size of the opponent population.
