Adversarially Guided Self-Play for Adopting Social Conventions

Paper Title


Adversarially Guided Self-Play for Adopting Social Conventions

Authors

Tucker, Mycal; Zhou, Yilun; Shah, Julie

Abstract


Robotic agents must adopt existing social conventions in order to be effective teammates. These social conventions, such as driving on the right or left side of the road, are arbitrary choices among optimal policies, but all agents on a successful team must use the same convention. Prior work has identified a method of combining self-play with paired input-output data gathered from existing agents in order to learn their social convention without interacting with them. We build upon this work by introducing a technique called Adversarial Self-Play (ASP) that uses adversarial training to shape the space of possible learned policies and substantially improves learning efficiency. ASP only requires the addition of unpaired data: a dataset of outputs produced by the social convention without associated inputs. Theoretical analysis reveals how ASP shapes the policy space and the circumstances (when behaviors are clustered or exhibit some other structure) under which it offers the greatest benefits. Empirical results across three domains confirm ASP's advantages: it produces models that more closely match the desired social convention when given as few as two paired datapoints.
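To make the idea of "shaping the policy space" concrete, here is a brute-force toy, not the ASP training procedure. Everything in it is a hypothetical assumption: a speaker maps 4 observations to 6 tokens, any injective mapping is self-play optimal (so many arbitrary conventions exist), and an exact output-distribution check stands in for an idealized discriminator trained on unpaired outputs. It counts how many optimal conventions survive after adding two paired datapoints, and then after also adding unpaired outputs.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical toy setup (not from the paper): a speaker maps each of 4
# observations to one of 6 tokens, and a listener must recover the observation.
NUM_INPUTS = 4
VOCAB = range(6)


def self_play_optimal(policy):
    """A listener can decode perfectly iff distinct inputs get distinct tokens."""
    return len(set(policy)) == len(policy)


def matches_paired(policy, paired):
    """Paired data: (input, output) examples produced by the existing convention."""
    return all(policy[x] == y for x, y in paired)


def distribution(outputs):
    """Empirical output distribution, as exact fractions to avoid float error."""
    counts = Counter(outputs)
    return {tok: Fraction(c, len(outputs)) for tok, c in counts.items()}


def matches_unpaired(policy, unpaired_outputs):
    """Idealized stand-in for a discriminator: assuming uniformly distributed inputs,
    accept a policy only if its output distribution equals that of the unpaired data."""
    return distribution(policy) == distribution(unpaired_outputs)


def count_candidates(paired, unpaired_outputs=None):
    """Count self-play-optimal policies consistent with the available data."""
    total = 0
    for policy in product(VOCAB, repeat=NUM_INPUTS):
        if not self_play_optimal(policy) or not matches_paired(policy, paired):
            continue
        if unpaired_outputs is not None and not matches_unpaired(policy, unpaired_outputs):
            continue
        total += 1
    return total


if __name__ == "__main__":
    # Hypothetical "true" convention: observation i -> token i.
    paired = [(0, 0), (1, 1)]                  # only two paired datapoints
    unpaired = [0, 1, 2, 3, 3, 2, 1, 0]        # outputs observed without their inputs
    print(count_candidates([]))                # 360 self-play-optimal conventions
    print(count_candidates(paired))            # 12 remain after the paired data
    print(count_candidates(paired, unpaired))  # 2 remain after adding unpaired data
```

The drop from 12 to 2 candidates illustrates why unpaired outputs help most when the convention's behavior has structure (here, it uses only a subset of the tokens). In ASP the distribution-matching role is played by a learned adversary during training rather than an exact check over enumerated policies, so this toy conveys only the intuition.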


