Paper Title
Multi-trainer Interactive Reinforcement Learning System
Paper Authors
Abstract
Interactive reinforcement learning can effectively facilitate agent training via human feedback. However, such methods often require the human teacher to know what the correct action is for the agent to take; in other words, if the human teacher is not always reliable, they cannot consistently guide the agent through its training. In this paper, we propose a more effective interactive reinforcement learning system that introduces multiple trainers, namely Multi-Trainer Interactive Reinforcement Learning (MTIRL), which aggregates binary feedback from multiple imperfect trainers into a more reliable reward for training an agent in a reward-sparse environment. In particular, our trainer feedback aggregation experiments show that our aggregation method achieves the highest accuracy compared with majority voting, weighted voting, and the Bayesian method. Finally, we conduct a grid-world experiment to show that the policy trained by MTIRL with the review model is closer to the optimal policy than the one trained without a review model.
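To make the aggregation setting concrete, the sketch below implements the majority-voting baseline mentioned in the abstract, not the paper's MTIRL aggregator: several trainers each give binary feedback on an agent's action (+1 for approval, -1 for disapproval), and the votes are collapsed into a single reward signal. The function name and the tie-handling choice (returning 0, i.e. no reward, on a tie) are illustrative assumptions.

```python
from collections import Counter

def majority_vote(feedbacks: list[int]) -> int:
    """Aggregate binary trainer feedback {+1, -1} into one reward.

    Illustrative baseline only (not the paper's method): returns the
    majority label, or 0 (no reward) when the vote is tied.
    """
    counts = Counter(feedbacks)
    if counts[1] > counts[-1]:
        return 1
    if counts[-1] > counts[1]:
        return -1
    return 0  # tie: emit no reward rather than guess

# Two of three imperfect trainers approve the action:
print(majority_vote([1, 1, -1]))  # → 1
```

A scheme like this is what MTIRL is compared against: plain majority voting treats every trainer as equally reliable, which is exactly the assumption that breaks down when trainers are imperfect.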