Paper Title
Open-Ended Reinforcement Learning with Neural Reward Functions
Paper Authors
Paper Abstract
Inspired by the great success of unsupervised learning in Computer Vision and Natural Language Processing, the Reinforcement Learning community has recently started to focus more on the unsupervised discovery of skills. Most current approaches, like DIAYN or DADS, optimize some form of mutual information objective. We propose a different approach that uses reward functions encoded by neural networks. These are trained iteratively to reward more complex behavior. In high-dimensional robotic environments, our approach learns a wide range of interesting skills, including front flips for Half-Cheetah and one-legged running for Humanoid. In the pixel-based Montezuma's Revenge environment, our method also works with minimal changes and learns complex skills that involve interacting with items and visiting diverse locations. The implementation of our approach can be found at this link: https://github.com/amujika/Open-Ended-Reinforcement-Learning-with-Neural-Reward-Functions.
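To make the high-level loop in the abstract concrete, below is a minimal, hypothetical sketch of "reward functions encoded by neural networks, trained iteratively to reward more complex behavior." The update rule (a simple classifier that down-weights already-visited states), the network sizes, the random-state placeholders, and names such as `RewardNet`, `rollout_states`, and `update_reward_net` are illustrative assumptions, not the paper's actual algorithm; see the linked repository for the authors' implementation.

```python
# Hypothetical sketch of an iterative neural-reward-function loop.
# Assumptions: PyTorch, a fixed-size state vector, and placeholder RL / rollout steps.
import torch
import torch.nn as nn

STATE_DIM = 4  # illustrative state dimensionality


class RewardNet(nn.Module):
    """Small MLP encoding the current reward function r_k(s)."""

    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).squeeze(-1)


def train_policy_on_reward(reward_net: RewardNet):
    # Placeholder for an off-the-shelf RL algorithm (e.g. PPO/SAC) that trains
    # a skill policy to maximize the reward given by `reward_net`.
    return object()


def rollout_states(policy, num_states: int = 256) -> torch.Tensor:
    # Placeholder for collecting states visited by the current skill policy.
    # Random states are used here only so the sketch runs end to end.
    return torch.randn(num_states, STATE_DIM)


def update_reward_net(reward_net: RewardNet, visited: torch.Tensor, steps: int = 100) -> None:
    """Refit the reward so already-visited states score low and other states
    score high, nudging the next policy toward behavior the current one
    does not yet produce (illustrative novelty-style update)."""
    optimizer = torch.optim.Adam(reward_net.parameters(), lr=1e-3)
    novel = visited + 0.5 * torch.randn_like(visited)  # stand-in for unvisited states
    for _ in range(steps):
        logits = torch.cat([reward_net(visited), reward_net(novel)])
        labels = torch.cat([torch.zeros(len(visited)), torch.ones(len(novel))])
        loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


if __name__ == "__main__":
    reward_net = RewardNet(STATE_DIM)
    for iteration in range(5):  # each iteration yields a new, more complex skill
        policy = train_policy_on_reward(reward_net)
        visited = rollout_states(policy)
        update_reward_net(reward_net, visited)
        print(f"iteration {iteration}: reward function updated")
```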