Paper Title

ASPiRe:Adaptive Skill Priors for Reinforcement Learning

Paper Authors

Mengda Xu, Manuela Veloso, Shuran Song

Paper Abstract

We introduce ASPiRe (Adaptive Skill Prior for RL), a new approach that leverages prior experience to accelerate reinforcement learning. Unlike existing methods that learn a single skill prior from a large and diverse dataset, our framework learns a library of distinct skill priors (i.e., behavior priors) from a collection of specialized datasets, and learns how to combine them to solve a new task. This formulation allows the algorithm to acquire a set of specialized skill priors that are more reusable for downstream tasks; however, it also raises the additional challenge of how to effectively combine this unstructured set of skill priors into a new prior for a new task. Specifically, it requires the agent not only to identify which skill prior(s) to use but also to decide how to combine them (either sequentially or concurrently) to form a new prior. To achieve this goal, ASPiRe includes an Adaptive Weight Module (AWM) that learns to infer an adaptive weight assignment over the different skill priors and uses it to guide policy learning for downstream tasks via weighted Kullback-Leibler divergences. Our experiments demonstrate that ASPiRe significantly accelerates the learning of new downstream tasks in the presence of multiple priors and improves over competitive baselines.
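For intuition, below is a minimal sketch of the weighted-KL idea the abstract describes, assuming Gaussian policies and priors. This is not the authors' implementation; the function name `weighted_kl_regularizer` and the stand-in `awm_weights` are hypothetical. Each skill prior contributes a KL term against the current policy, weighted per state by the AWM's output:

```python
import torch
from torch.distributions import Normal, kl_divergence

def weighted_kl_regularizer(policy_dist, skill_priors, weights):
    """Compute sum_k w_k * KL(pi || p_k), averaged over the batch.

    policy_dist : Normal over actions with shape [B, A]
    skill_priors: list of K Normal distributions with matching shape
    weights     : tensor [B, K]; rows sum to 1 (hypothetical AWM output)
    """
    # KL(pi || p_k) per prior, summed over action dimensions -> [B, K]
    kls = torch.stack(
        [kl_divergence(policy_dist, p).sum(-1) for p in skill_priors],
        dim=-1,
    )
    # Convex combination with the per-state weights, then batch mean
    return (weights * kls).sum(-1).mean()

if __name__ == "__main__":
    B, A, K = 32, 4, 3  # batch size, action dim, number of skill priors
    policy = Normal(torch.zeros(B, A), torch.ones(B, A))
    priors = [Normal(torch.randn(B, A), torch.ones(B, A)) for _ in range(K)]
    awm_weights = torch.softmax(torch.randn(B, K), dim=-1)  # stand-in for AWM
    print(weighted_kl_regularizer(policy, priors, awm_weights))
```

In a skill-prior-regularized RL setup of this kind, such a weighted KL term would plausibly take the place of the entropy bonus in a maximum-entropy objective like SAC's; the paper's exact objective may differ.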
