Paper Title
Hypernetworks for Zero-shot Transfer in Reinforcement Learning
Paper Authors
Paper Abstract
In this paper, hypernetworks are trained to generate behaviors across a range of unseen task conditions, via a novel TD-based training objective and data from a set of near-optimal RL solutions for training tasks. This work relates to meta RL, contextual RL, and transfer learning, with a particular focus on zero-shot performance at test time, enabled by knowledge of the task parameters (also known as context). Our technical approach is based upon viewing each RL algorithm as a mapping from MDP specifics to a near-optimal value function and policy, and seeking to approximate this mapping with a hypernetwork that can generate near-optimal value functions and policies given the parameters of the MDP. We show that, under certain conditions, this mapping can be treated as a supervised learning problem. We empirically evaluate the effectiveness of our method for zero-shot transfer to new reward and transition dynamics on a series of continuous control tasks from the DeepMind Control Suite. Our method demonstrates significant improvements over baselines from multitask and meta RL approaches.
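The core mechanism of the abstract, a hypernetwork that maps task parameters (context) to the parameters of a policy, can be sketched as below. This is an illustrative NumPy toy only: the linear target policy, the MLP hypernetwork shape, and all dimensions are made-up assumptions, and the paper's TD-based training objective is not shown.

```python
import numpy as np

def init_hypernet(ctx_dim, hidden, target_n_params, rng):
    # Hypernetwork: a small MLP mapping a task-context vector
    # to a flat vector of policy parameters (toy sketch).
    return {
        "W1": rng.standard_normal((ctx_dim, hidden)) * 0.1,
        "b1": np.zeros(hidden),
        "W2": rng.standard_normal((hidden, target_n_params)) * 0.1,
        "b2": np.zeros(target_n_params),
    }

def generate_policy_params(hnet, context):
    # Forward pass of the hypernetwork: context in, policy weights out.
    h = np.tanh(context @ hnet["W1"] + hnet["b1"])
    return h @ hnet["W2"] + hnet["b2"]

def policy_forward(flat_params, obs, obs_dim, act_dim):
    # Unpack the generated flat vector into a linear policy a = W @ obs + b.
    W = flat_params[: obs_dim * act_dim].reshape(act_dim, obs_dim)
    b = flat_params[obs_dim * act_dim:]
    return W @ obs + b

rng = np.random.default_rng(0)
obs_dim, act_dim, ctx_dim = 4, 2, 3          # hypothetical dimensions
n_params = obs_dim * act_dim + act_dim        # size of the linear policy
hnet = init_hypernet(ctx_dim, hidden=16, target_n_params=n_params, rng=rng)

context = np.array([0.5, -1.0, 0.2])          # hypothetical MDP parameters
obs = np.ones(obs_dim)
params = generate_policy_params(hnet, context)
action = policy_forward(params, obs, obs_dim, act_dim)
```

At zero-shot test time, a new context vector is fed through the same trained hypernetwork to produce a policy for the unseen task without further environment interaction.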