Title
Local Policy Optimization for Trajectory-Centric Reinforcement Learning
Authors
Abstract
The goal of this paper is to present a method for simultaneous trajectory and local stabilizing policy optimization, which generates local policies for trajectory-centric model-based reinforcement learning (MBRL). This is motivated by the fact that global policy optimization for non-linear systems can be a very challenging problem, both algorithmically and numerically. However, many robotic manipulation tasks are trajectory-centric and thus do not require a global model or policy. Due to inaccuracies in the learned model estimates, open-loop trajectory optimization mostly results in very poor performance when the resulting trajectory is used on the real system. Motivated by these problems, we formulate trajectory optimization and local policy synthesis as a single optimization problem, which is then solved as an instance of nonlinear programming. We provide analysis of the proposed technique as well as results on its performance under some simplifying assumptions.
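To make the core idea concrete, the following is a minimal sketch, under strong simplifying assumptions, of posing trajectory optimization and local feedback synthesis as one nonlinear program: the decision variables are an open-loop control sequence and a linear feedback gain, and the objective combines nominal tracking cost with the closed-loop deviation of a perturbed rollout. The dynamics (a discrete-time double integrator), the toy cost, and all variable names are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np
from scipy.optimize import minimize

# Toy setup: discrete-time double integrator with scalar control,
# x_{t+1} = A x_t + B u_t. (Illustrative system, not from the paper.)
dt, N = 0.1, 20
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([0.0, dt])
x0 = np.array([1.0, 0.0])      # start: position 1, velocity 0
goal = np.array([0.0, 0.0])    # drive the state to the origin

def cost(z):
    u, K = z[:N], z[N:]        # open-loop controls and 1x2 feedback gain
    # Nominal open-loop rollout.
    xn = [x0]
    for t in range(N):
        xn.append(A @ xn[-1] + B * u[t])
    # Perturbed rollout stabilized by the local policy u_t + K (x_nom_t - x_t).
    xp = [x0 + np.array([0.2, 0.0])]
    for t in range(N):
        v = u[t] + K @ (xn[t] - xp[-1])
        xp.append(A @ xp[-1] + B * v)
    track = sum((x - goal) @ (x - goal) for x in xn) + 1e-3 * u @ u
    robust = sum((a - b) @ (a - b) for a, b in zip(xn, xp))
    return track + robust      # single objective couples trajectory and gain

# One NLP over both the trajectory controls and the feedback gain.
res = minimize(cost, np.zeros(N + 2), method="L-BFGS-B")
u_opt, K_opt = res.x[:N], res.x[N:]
```

Because the feedback gain enters the same objective as the nominal controls, the solver trades off nominal performance against robustness to the perturbed start, which is the joint-optimization structure the abstract describes (the paper's actual constraints and cost differ).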