Paper Title
Improving Robustness via Risk Averse Distributional Reinforcement Learning
Paper Authors
Paper Abstract
One major obstacle to the success of reinforcement learning in real-world applications is the lack of robustness of trained policies to model uncertainties or external disturbances. Robustness is critical when policies are trained in simulation rather than in the real-world environment. In this work, we propose a risk-aware algorithm that learns robust policies in order to bridge the gap between simulation training and real-world deployment. Our algorithm builds on the recently introduced distributional RL framework. We incorporate the CVaR risk measure into sample-based distributional policy gradients (SDPG) to learn risk-averse policies that are robust to a range of system disturbances. We validate the robustness of risk-aware SDPG on multiple environments.
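SDPG represents the return distribution with generated samples, and the abstract describes replacing the usual expected-return objective with the CVaR risk measure over that distribution. The sketch below is a minimal, illustrative estimate of CVaR from return samples (the mean of the worst alpha-fraction of outcomes); the function name `cvar_from_samples` and the parameter `alpha` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def cvar_from_samples(return_samples, alpha=0.1):
    """Estimate CVaR_alpha of a return distribution given by samples.

    CVaR_alpha is the expected return over the worst alpha-fraction of
    outcomes; maximizing it instead of the mean yields risk-averse behavior.
    (Illustrative sketch only, not the paper's SDPG implementation.)
    """
    samples = np.sort(np.asarray(return_samples))   # ascending: worst returns first
    k = max(1, int(np.ceil(alpha * samples.size)))  # size of the lower alpha-tail
    return samples[:k].mean()                       # mean of the tail samples

# Usage example: returns with a lower tail; CVaR is well below the mean
returns = np.random.normal(loc=10.0, scale=3.0, size=1000)
print("Mean return:", returns.mean())
print("CVaR (10%):", cvar_from_samples(returns, alpha=0.1))
```

In a risk-averse actor-critic setup of this kind, such a tail estimate would stand in for the expected return when updating the policy, penalizing actions whose return distributions have heavy lower tails.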