Paper Title

Imitation with Neural Density Models

Authors

Kuno Kim, Akshat Jindal, Yang Song, Jiaming Song, Yanan Sui, Stefano Ermon

Abstract

We propose a new framework for Imitation Learning (IL) via density estimation of the expert's occupancy measure followed by Maximum Occupancy Entropy Reinforcement Learning (RL) using the density as a reward. Our approach maximizes a non-adversarial model-free RL objective that provably lower bounds reverse Kullback-Leibler divergence between occupancy measures of the expert and imitator. We present a practical IL algorithm, Neural Density Imitation (NDI), which obtains state-of-the-art demonstration efficiency on benchmark control tasks.
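
The first stage described in the abstract, estimating the expert's occupancy density and using it as a reward, can be illustrated with a minimal sketch. It assumes a Gaussian kernel density estimate as a stand-in for the neural density model, synthetic data in place of real demonstrations, and the log-density as the reward; the names `fit_gaussian_kde`, `expert_sa`, and `reward_fn` are hypothetical, and the occupancy-entropy term of the RL objective is not shown.

```python
import numpy as np


def fit_gaussian_kde(samples, bandwidth=0.5):
    """Fit an isotropic Gaussian KDE and return its log-density function.

    A simple stand-in for a learned density model (an assumption of this sketch).
    """
    samples = np.asarray(samples, dtype=float)  # shape (n, d)
    n, d = samples.shape
    # Log of the normalising constant: 1 / (n * (2*pi*h^2)^(d/2)).
    log_norm = -0.5 * d * np.log(2.0 * np.pi * bandwidth**2) - np.log(n)

    def log_density(x):
        x = np.atleast_2d(np.asarray(x, dtype=float))  # shape (m, d)
        # Squared distances between each query and each sample: shape (m, n).
        sq = ((x[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)
        z = -0.5 * sq / bandwidth**2
        # Numerically stable log-sum-exp over the samples.
        zmax = z.max(axis=1, keepdims=True)
        return log_norm + zmax[:, 0] + np.log(np.exp(z - zmax).sum(axis=1))

    return log_density


# Synthetic (state, action) pairs standing in for expert demonstrations.
rng = np.random.default_rng(0)
expert_sa = rng.normal(loc=1.0, scale=0.3, size=(200, 3))

# The estimated log-density serves as the reward for a downstream RL algorithm.
reward_fn = fit_gaussian_kde(expert_sa)

near = reward_fn(np.array([1.0, 1.0, 1.0]))[0]  # close to the expert data
far = reward_fn(np.array([5.0, 5.0, 5.0]))[0]   # far from the expert data
```

In this sketch, state-action pairs close to the demonstrations receive a higher reward than distant ones, which is what lets a standard RL algorithm pull the imitator's occupancy measure toward the expert's.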
