GoSafeOpt: Scalable Safe Exploration for Global Optimization of Dynamical Systems

Paper Title

GoSafeOpt: Scalable Safe Exploration for Global Optimization of Dynamical Systems

Authors

Bhavya Sukhija, Matteo Turchetta, David Lindner, Andreas Krause, Sebastian Trimpe, Dominik Baumann

Abstract

Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage. Most existing model-free learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. A notable exception is the GoSafe algorithm, which, unfortunately, cannot handle high-dimensional systems and hence cannot be applied to most real-world dynamical systems. This work proposes GoSafeOpt as the first algorithm that can safely discover globally optimal policies for high-dimensional systems while giving safety and optimality guarantees. We demonstrate the superiority of GoSafeOpt over competing model-free safe learning methods on a robot arm that would be prohibitive for GoSafe.
