Paper title
Learning Options from Demonstration using Skill Segmentation
Authors
Abstract
We present a method for learning options from segmented demonstration trajectories. The trajectories are first segmented into skills using nonparametric Bayesian clustering and a reward function for each segment is then learned using inverse reinforcement learning. From this, a set of inferred trajectories for the demonstration are generated. Option initiation sets and termination conditions are learned from these trajectories using the one-class support vector machine clustering algorithm. We demonstrate our method in the four rooms domain, where an agent is able to autonomously discover usable options from human demonstration. Our results show that these inferred options can then be used to improve learning and planning.
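As a rough illustration of what an option built from one segment's trajectories might look like, the sketch below represents an option as the usual triple of initiation set, policy, and termination condition. It is not the method from the paper: the names `Option` and `build_option`, the `(row, col)` state encoding, and the string actions are all hypothetical, and plain set membership over visited states stands in for the one-class support vector machine that the abstract uses to learn initiation sets and termination conditions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

State = Tuple[int, int]  # hypothetical (row, col) grid cell
Action = str             # hypothetical action label, e.g. "up" or "right"
# One trajectory: the (state, action) steps taken, plus the state reached at the end.
Trajectory = Tuple[List[Tuple[State, Action]], State]


@dataclass
class Option:
    """An option as a triple: initiation set, policy, termination states."""
    initiation_set: FrozenSet[State]
    policy: Dict[State, Action]
    termination_states: FrozenSet[State]

    def can_start(self, state: State) -> bool:
        # Assumption: set membership replaces the one-class SVM decision.
        return state in self.initiation_set

    def should_terminate(self, state: State) -> bool:
        # Stop at a recorded end state, or once the policy has no action here.
        return state in self.termination_states or state not in self.policy


def build_option(trajectories: List[Trajectory]) -> Option:
    """Collect visited states, their actions, and end states for one segment."""
    policy: Dict[State, Action] = {}
    terminals = set()
    for steps, final_state in trajectories:
        for state, action in steps:
            policy.setdefault(state, action)  # keep the first action seen per state
        terminals.add(final_state)
    initiation = frozenset(policy) - terminals
    return Option(frozenset(initiation), policy, frozenset(terminals))


# Two made-up trajectories that both end at cell (0, 2), e.g. a doorway.
demo = [
    ([((0, 0), "right"), ((0, 1), "right")], (0, 2)),
    ([((1, 1), "up"), ((0, 1), "right")], (0, 2)),
]
option = build_option(demo)
```

With this toy data, `option.can_start((1, 1))` holds because the cell was visited in a trajectory, while an unvisited cell such as `(3, 3)` is rejected. A learned classifier would generalise to nearby unvisited states, which exact set membership cannot do.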