Paper Title

Information Theoretic Model Predictive Q-Learning

Paper Authors

Mohak Bhardwaj, Ankur Handa, Dieter Fox, Byron Boots

Abstract

Model-free Reinforcement Learning (RL) works well when experience can be collected cheaply and model-based RL is effective when system dynamics can be modeled accurately. However, both assumptions can be violated in real-world problems such as robotics, where querying the system can be expensive and real-world dynamics can be difficult to model. In contrast to RL, Model Predictive Control (MPC) algorithms use a simulator to optimize a simple policy class online, constructing a closed-loop controller that can effectively contend with real-world dynamics. MPC performance is usually limited by factors such as model bias and the limited horizon of optimization. In this work, we present a novel theoretical connection between information theoretic MPC and entropy regularized RL and develop a Q-learning algorithm that can leverage biased models. We validate the proposed algorithm on sim-to-sim control tasks to demonstrate the improvements over optimal control and reinforcement learning from scratch. Our approach paves the way for deploying reinforcement learning algorithms on real systems in a systematic manner.
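To make the stated connection concrete: in entropy-regularized RL, the optimal policy and value take a free-energy form (a standard result in maximum-entropy RL; the notation below is illustrative, not copied from the paper):

$$\pi^*(a \mid s) \propto \bar\pi(a \mid s)\, \exp\!\big(Q^{\text{soft}}(s,a)/\lambda\big), \qquad V^{\text{soft}}(s) = \lambda \log \mathbb{E}_{a \sim \bar\pi}\!\big[\exp\!\big(Q^{\text{soft}}(s,a)/\lambda\big)\big],$$

which mirrors the free energy that information theoretic MPC methods such as MPPI minimize over open-loop control sequences, with the temperature $\lambda$ trading off expected cost against KL divergence from a prior policy $\bar\pi$.

Below is a minimal, hypothetical sketch of the kind of control loop this connection suggests: an MPPI-style sampler rolls out a (possibly biased) model over a short horizon and bootstraps the tail of each rollout with a learned soft value derived from a Q-function. All names here (`dynamics`, `cost`, `terminal_value`) are placeholders for illustration, not the authors' implementation.

```python
import numpy as np

def mppi_with_learned_value(state, u_nom, dynamics, cost, terminal_value,
                            horizon=20, n_samples=64, lam=1.0, sigma=0.5):
    """One MPPI-style update (a sketch, not the paper's code).

    Samples perturbed control sequences, scores each by rolling out a
    (possibly biased) model and bootstrapping the tail with a learned
    soft value, then re-weights with information-theoretic
    (exponentiated-cost) weights w_i ~ exp(-cost_i / lam).
    """
    du = sigma * np.random.randn(n_samples, horizon, u_nom.shape[-1])
    total_cost = np.zeros(n_samples)
    for i in range(n_samples):
        s = state
        for t in range(horizon):
            u = u_nom[t] + du[i, t]
            total_cost[i] += cost(s, u)
            s = dynamics(s, u)              # short-horizon model rollout
        total_cost[i] -= terminal_value(s)  # e.g. soft value from a learned Q
    beta = total_cost.min()                  # subtract min for numerical stability
    w = np.exp(-(total_cost - beta) / lam)
    w /= w.sum()
    # Exponentially weighted update of the nominal control sequence
    return u_nom + np.einsum('i,ith->th', w, du)
```

At each control step one would execute only the first action of the returned sequence and warm-start the next step with the remainder, while fitting the Q-function behind `terminal_value` from executed transitions; that fitting step is where the entropy-regularized Q-learning side of the connection enters.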
