Paper title
Bootstrapped model learning and error correction for planning with uncertainty in model-based RL
Paper authors
Paper abstract
Having access to a forward model enables the use of planning algorithms such as Monte Carlo Tree Search and Rolling Horizon Evolution. Where a model is unavailable, a natural aim is to learn a model that accurately reflects the dynamics of the environment. In many situations this might not be possible, and even minimal glitches in the model may lead to poor performance and failure. This paper explores the problem of model misspecification through uncertainty-aware reinforcement learning agents. We propose a bootstrapped multi-headed neural network that learns the distribution of future states and rewards. We experiment with a number of schemes to extract the most likely predictions. Moreover, we introduce a global error correction filter that applies high-level constraints guided by the context provided through the predictive distribution. We illustrate our approach on Minipacman. The evaluation demonstrates that when dealing with imperfect models, our methods exhibit increased performance and stability, both in terms of the model's accuracy and in its use within a planning algorithm.
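To make the abstract's central component concrete, below is a minimal sketch of a bootstrapped multi-headed dynamics model in PyTorch. This is not the paper's implementation: the class and parameter names (BootstrappedDynamicsModel, num_heads, the flat state/action encoding) are illustrative assumptions. The idea it shows is the one the abstract names: a shared trunk with multiple heads, each predicting the next state and reward, where the heads are trained on different bootstrap resamples of the transition data so their disagreement approximates the predictive distribution.

```python
# Minimal sketch of a bootstrapped multi-headed dynamics model (assumed
# architecture, not the paper's code). Each head predicts the next state
# and the reward from a shared encoding of (state, action).
import torch
import torch.nn as nn


class BootstrappedDynamicsModel(nn.Module):
    def __init__(self, state_dim, action_dim, hidden_dim=128, num_heads=10):
        super().__init__()
        # Shared trunk encodes the (state, action) pair once for all heads.
        self.trunk = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.ReLU(),
        )
        # One next-state head and one reward head per bootstrap sample;
        # in training, each head would see its own resample of the data.
        self.state_heads = nn.ModuleList(
            [nn.Linear(hidden_dim, state_dim) for _ in range(num_heads)]
        )
        self.reward_heads = nn.ModuleList(
            [nn.Linear(hidden_dim, 1) for _ in range(num_heads)]
        )

    def forward(self, state, action):
        h = self.trunk(torch.cat([state, action], dim=-1))
        next_states = torch.stack([head(h) for head in self.state_heads])
        rewards = torch.stack([head(h) for head in self.reward_heads])
        return next_states, rewards  # shapes: (num_heads, batch, ...)


# Example usage: the mean over heads is one simple scheme for extracting a
# point prediction, and the spread across heads is an uncertainty signal
# that a planner or an error correction filter could consume.
model = BootstrappedDynamicsModel(state_dim=16, action_dim=4)
s, a = torch.randn(32, 16), torch.randn(32, 4)
next_states, rewards = model(s, a)      # (10, 32, 16) and (10, 32, 1)
point_estimate = next_states.mean(dim=0)
uncertainty = next_states.std(dim=0)    # head disagreement per dimension
```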