Imitation with Neural Density Models

Paper title

Imitation with Neural Density Models

Authors

Kuno Kim, Akshat Jindal, Yang Song, Jiaming Song, Yanan Sui, Stefano Ermon

Abstract

We propose a new framework for Imitation Learning (IL) via density estimation of the expert's occupancy measure followed by Maximum Occupancy Entropy Reinforcement Learning (RL) using the density as a reward. Our approach maximizes a non-adversarial model-free RL objective that provably lower bounds reverse Kullback-Leibler divergence between occupancy measures of the expert and imitator. We present a practical IL algorithm, Neural Density Imitation (NDI), which obtains state-of-the-art demonstration efficiency on benchmark control tasks.
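The first stage described in the abstract, estimating the expert's occupancy density and using it as a reward, can be illustrated with a small sketch. This is not the NDI implementation: it swaps in a plain Gaussian kernel density estimate for the neural density model, the expert data is synthetic, and the names `fit_kde_log_density`, `bandwidth`, and `reward_fn` are hypothetical. The second stage (maximum occupancy entropy RL on this reward) is not shown.

```python
import numpy as np


def fit_kde_log_density(expert_sa, bandwidth=0.5):
    """Return a function giving the log of a Gaussian KDE fit to expert samples.

    expert_sa: array of shape (n, d), each row a concatenated (state, action).
    Assumption: a KDE stands in here for the neural density model in the paper.
    """
    expert_sa = np.asarray(expert_sa, dtype=float)
    n, d = expert_sa.shape
    # Log of the Gaussian kernel normalizer, averaged over n expert samples.
    log_norm = -0.5 * d * np.log(2.0 * np.pi * bandwidth**2) - np.log(n)

    def log_density(query):
        query = np.atleast_2d(np.asarray(query, dtype=float))
        # Squared distance from every query row to every expert row: (m, n).
        diff = query[:, None, :] - expert_sa[None, :, :]
        exponent = -0.5 * np.sum(diff**2, axis=-1) / bandwidth**2
        # Numerically stable log-sum-exp over the expert samples.
        peak = exponent.max(axis=1, keepdims=True)
        return log_norm + peak[:, 0] + np.log(np.exp(exponent - peak).sum(axis=1))

    return log_density


# Synthetic stand-in for expert (state, action) pairs; not real demonstrations.
rng = np.random.default_rng(0)
expert_sa = rng.normal(loc=1.0, scale=0.3, size=(200, 3))

# The estimated log-density serves as the per-step reward for the imitator.
reward_fn = fit_kde_log_density(expert_sa)
near = reward_fn(np.array([1.0, 1.0, 1.0]))[0]  # close to expert data
far = reward_fn(np.array([5.0, 5.0, 5.0]))[0]   # far from expert data
```

A state-action pair near the expert data receives a higher reward than one far from it, which is what steers the imitator's occupancy measure toward the expert's.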
