Paper title
GoSafeOpt: Scalable Safe Exploration for Global Optimization of Dynamical Systems
Paper authors
Paper abstract
Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage. Most existing model-free learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. A notable exception is the GoSafe algorithm, which, unfortunately, cannot handle high-dimensional systems and hence cannot be applied to most real-world dynamical systems. This work proposes GoSafeOpt as the first algorithm that can safely discover globally optimal policies for high-dimensional systems while giving safety and optimality guarantees. We demonstrate the superiority of GoSafeOpt over competing model-free safe learning methods on a robot arm that would be prohibitive for GoSafe.