Search Methods for Policy Decompositions

Paper Title

Search Methods for Policy Decompositions

Authors

Ashwin Khadke, Hartmut Geyer

Abstract

Computing optimal control policies for complex dynamical systems requires approximation methods to remain computationally tractable. Several approximation methods have been developed to tackle this problem. However, these methods do not reason about the suboptimality that the approximations induce in the resulting control policies. In our earlier work we introduced Policy Decomposition, an approximation method that provides a suboptimality estimate. Policy Decomposition proposes strategies to break an optimal control problem into lower-dimensional subproblems, whose optimal solutions are combined to build a control policy for the original system. However, the number of possible strategies to decompose a system scales quickly with the complexity of the system, posing a combinatorial challenge. In this work we investigate the use of a Genetic Algorithm and Monte-Carlo Tree Search to alleviate this challenge. We identify decompositions for swing-up control of a 4-degree-of-freedom manipulator, balance control of a simplified biped, and hover control of a quadcopter.

Paper Title