Paper Title
Imitation with Neural Density Models
Paper Authors
Paper Abstract
We propose a new framework for Imitation Learning (IL) via density estimation of the expert's occupancy measure followed by Maximum Occupancy Entropy Reinforcement Learning (RL) using the density as a reward. Our approach maximizes a non-adversarial model-free RL objective that provably lower bounds reverse Kullback-Leibler divergence between occupancy measures of the expert and imitator. We present a practical IL algorithm, Neural Density Imitation (NDI), which obtains state-of-the-art demonstration efficiency on benchmark control tasks.
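The two-phase idea in the abstract (estimate the density of the expert's occupancy measure, then use that density as an RL reward) can be illustrated with a minimal sketch. This is not the NDI algorithm itself: a Gaussian kernel density estimate stands in for the neural density model, the occupancy-entropy term is omitted, and the expert data are synthetic. The names `fit_log_density`, `reward_fn`, and `bandwidth` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_log_density(expert_sa, bandwidth=0.5):
    """Fit a Gaussian KDE to expert (state, action) vectors.

    Stand-in for a neural density model (an assumption of this sketch).
    Returns a function mapping (state, action) vectors to log-density.
    """
    expert_sa = np.asarray(expert_sa, dtype=float)
    n, d = expert_sa.shape
    # Log of the Gaussian kernel normalizer, averaged over n samples.
    log_norm = -0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2) - np.log(n)

    def log_density(sa):
        sa = np.atleast_2d(np.asarray(sa, dtype=float))
        # Squared distance from each query point to each expert sample.
        sq_dist = ((sa[:, None, :] - expert_sa[None, :, :]) ** 2).sum(axis=-1)
        exponents = -0.5 * sq_dist / bandwidth ** 2
        # Numerically stable log-sum-exp over the expert samples.
        m = exponents.max(axis=1, keepdims=True)
        lse = m[:, 0] + np.log(np.exp(exponents - m).sum(axis=1))
        return log_norm + lse

    return log_density


# Synthetic "expert" data: 200 samples of a 2-D state plus a 1-D action.
rng = np.random.default_rng(0)
expert_sa = rng.normal(loc=1.0, scale=0.3, size=(200, 3))

# Phase 1: density estimation on the expert's visited (state, action) pairs.
reward_fn = fit_log_density(expert_sa)

# Phase 2 (not shown): an RL algorithm would use reward_fn as its reward.
near = reward_fn(np.array([1.0, 1.0, 1.0]))[0]  # close to expert data
far = reward_fn(np.array([4.0, 4.0, 4.0]))[0]   # far from expert data
print(near > far)
```

In this sketch, state-action pairs that resemble the expert's receive a higher reward than pairs far from the expert data, which is the signal an imitator would be trained to maximize.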