Paper Title
Discriminative Particle Filter Reinforcement Learning for Complex Partial Observations
Paper Authors
Paper Abstract
Deep reinforcement learning has been successful in decision making for sophisticated games such as Atari and Go. However, real-world decision making often requires reasoning with partial information extracted from complex visual observations. This paper presents Discriminative Particle Filter Reinforcement Learning (DPFRL), a new reinforcement learning framework for complex partial observations. DPFRL encodes a differentiable particle filter in the neural network policy for explicit reasoning with partial observations over time. The particle filter maintains a belief using a learned discriminative update that is trained end-to-end for decision making. We show that using the discriminative update instead of standard generative models results in significantly improved performance, especially for tasks with complex visual observations, because it circumvents the difficulty of modeling complex observations that are irrelevant to decision making. In addition, to extract features from the particle belief, we propose a new type of belief feature based on the moment generating function. DPFRL outperforms state-of-the-art POMDP RL models in Flickering Atari Games, an existing POMDP RL benchmark, and in Natural Flickering Atari Games, a new, more challenging POMDP RL benchmark introduced in this paper. Further, DPFRL performs well for visual navigation with real-world data in the Habitat environment.
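To make the two ideas named in the abstract concrete, below is a minimal PyTorch sketch of a particle belief that is (1) reweighted by a learned discriminative score of each particle against the observation, with no generative reconstruction, and (2) summarized by moment-generating-function (MGF) features of the particle set evaluated at learned points. The module name, layer sizes, GRU transition, and the omission of resampling are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of a discriminative particle belief update with MGF belief features.
# All names and dimensions are hypothetical; this is not the authors' code.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiscriminativeParticleBelief(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden_dim=64, n_particles=16, n_mgf=8):
        super().__init__()
        self.n_particles = n_particles
        # Learned transition of latent particle states (assumed GRU form).
        self.transition = nn.GRUCell(act_dim, hidden_dim)
        # Discriminative observation model: scores a particle against the
        # observation feature directly, without reconstructing the observation.
        self.obs_score = nn.Sequential(
            nn.Linear(obs_dim + hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )
        # Learned MGF evaluation points nu_j for belief feature extraction.
        self.nu = nn.Parameter(0.1 * torch.randn(n_mgf, hidden_dim))

    def forward(self, particles, log_weights, obs, act):
        # particles: (B, K, H); log_weights: (B, K); obs: (B, obs_dim); act: (B, act_dim)
        B, K, H = particles.shape
        # 1. Propagate every particle through the learned transition model.
        new_particles = self.transition(
            act.repeat_interleave(K, dim=0), particles.reshape(B * K, H)
        ).reshape(B, K, H)
        # 2. Discriminative update: reweight particles by a learned
        #    compatibility score with the current observation.
        obs_tiled = obs.unsqueeze(1).expand(B, K, obs.shape[-1])
        scores = self.obs_score(
            torch.cat([obs_tiled, new_particles], dim=-1)
        ).squeeze(-1)
        log_weights = F.log_softmax(log_weights + scores, dim=-1)
        # 3. Belief features: weighted mean plus MGF features
        #    m_j = sum_i w_i * exp(nu_j . h_i) at the learned nu_j.
        weights = log_weights.exp()                                   # (B, K)
        mean_feat = (weights.unsqueeze(-1) * new_particles).sum(dim=1)
        mgf_feat = torch.einsum(
            "bk,bkj->bj", weights, torch.exp(new_particles @ self.nu.t())
        )
        belief_feat = torch.cat([mean_feat, mgf_feat], dim=-1)
        return new_particles, log_weights, belief_feat


# Usage: the belief feature would feed an actor-critic policy head.
if __name__ == "__main__":
    model = DiscriminativeParticleBelief(obs_dim=32, act_dim=4)
    B, K, H = 2, model.n_particles, 64
    particles = torch.zeros(B, K, H)
    log_w = torch.full((B, K), -math.log(K))   # uniform initial belief
    obs, act = torch.randn(B, 32), torch.randn(B, 4)
    particles, log_w, feat = model(particles, log_w, obs, act)
    print(feat.shape)  # torch.Size([2, 72])
```

The key design point from the abstract is that the observation enters only through the learned score, so the belief update is trained end-to-end for the decision-making objective rather than for reconstructing complex, task-irrelevant observation details.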