Paper Title

CEIP: Combining Explicit and Implicit Priors for Reinforcement Learning with Demonstrations

Authors

Kai Yan, Alexander G. Schwing, Yu-Xiong Wang

Abstract

Although reinforcement learning has found widespread use in dense reward settings, training autonomous agents with sparse rewards remains challenging. To address this difficulty, prior work has shown promising results when using not only task-specific demonstrations but also task-agnostic albeit somewhat related demonstrations. In most cases, the available demonstrations are distilled into an implicit prior, commonly represented via a single deep net. Explicit priors in the form of a database that can be queried have also been shown to lead to encouraging results. To better benefit from available demonstrations, we develop a method to Combine Explicit and Implicit Priors (CEIP). CEIP exploits multiple implicit priors in the form of normalizing flows in parallel to form a single complex prior. Moreover, CEIP uses an effective explicit retrieval and push-forward mechanism to condition the implicit priors. In three challenging environments, we find the proposed CEIP method to improve upon sophisticated state-of-the-art techniques.
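The abstract names two mechanisms: several implicit priors realized as normalizing flows combined in parallel, and an explicit retrieval with a push-forward step that conditions those flows. Below is a minimal PyTorch sketch of how such a combination could look. It is an illustration under assumptions, not the paper's implementation: the affine flow, the softmax combination rule, the nearest-neighbor retrieval, and all names (ConditionalAffineFlow, ParallelFlowPrior, retrieve_push_forward) are hypothetical simplifications.

```python
# Illustrative sketch only: flow architecture, combination rule, and
# retrieval details are assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class ConditionalAffineFlow(nn.Module):
    """One implicit prior: maps latent z to an action, conditioned on a context."""
    def __init__(self, action_dim: int, context_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * action_dim),
        )

    def forward(self, z: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        shift, log_scale = self.net(context).chunk(2, dim=-1)
        return z * log_scale.exp() + shift  # invertible in z for a fixed context

class ParallelFlowPrior(nn.Module):
    """Several flows used in parallel, merged into one prior.
    A learned convex combination is used here for concreteness; the paper's
    actual combination rule may differ."""
    def __init__(self, n_flows: int, action_dim: int, context_dim: int):
        super().__init__()
        self.flows = nn.ModuleList(
            ConditionalAffineFlow(action_dim, context_dim) for _ in range(n_flows)
        )
        self.logits = nn.Parameter(torch.zeros(n_flows))

    def forward(self, z: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        weights = self.logits.softmax(dim=0)            # (n_flows,)
        outs = torch.stack([f(z, context) for f in self.flows])  # (n_flows, B, A)
        return (weights.view(-1, 1, 1) * outs).sum(dim=0)

def retrieve_push_forward(state, demo_states, demo_next_states):
    """Explicit prior: query a database of demonstrations for the nearest
    state, and return the state that follows it ("push-forward"), which is
    then appended to the conditioning context of the implicit priors."""
    dists = torch.cdist(state.unsqueeze(0), demo_states).squeeze(0)
    return demo_next_states[dists.argmin()]

# Usage: condition the parallel flows on [current state, retrieved next state].
state_dim, action_dim, n_demos = 4, 2, 100
demo_states = torch.randn(n_demos, state_dim)
demo_next = torch.randn(n_demos, state_dim)
prior = ParallelFlowPrior(n_flows=3, action_dim=action_dim,
                          context_dim=2 * state_dim)

state = torch.randn(state_dim)
hint = retrieve_push_forward(state, demo_states, demo_next)
context = torch.cat([state, hint]).unsqueeze(0)
action = prior(torch.randn(1, action_dim), context)
```

In this sketch, the retrieval result enters only through the conditioning context, so the explicit database steers the implicit flow priors without modifying their weights; how CEIP trains and weights the individual flows is detailed in the paper itself.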
