Globally Convergent Policy Search over Dynamic Filters for Output Estimation

Paper Title

Globally Convergent Policy Search over Dynamic Filters for Output Estimation

Authors

Jack Umenberger, Max Simchowitz, Juan C. Perdomo, Kaiqing Zhang, Russ Tedrake

Abstract

We introduce the first direct policy search algorithm that provably converges to the globally optimal $\textit{dynamic}$ filter for the classical problem of predicting the outputs of a linear dynamical system, given noisy, partial observations. Despite the ubiquity of partial observability in practice, theoretical guarantees for direct policy search algorithms, one of the backbones of modern reinforcement learning, have proven difficult to achieve. This is primarily due to the degeneracies that arise when optimizing over filters that maintain internal state. In this paper, we provide a new perspective on this challenging problem based on the notion of $\textit{informativity}$, which intuitively requires that all components of a filter's internal state are representative of the true state of the underlying dynamical system. We show that informativity overcomes the aforementioned degeneracy. Specifically, we propose a $\textit{regularizer}$ which explicitly enforces informativity, and establish that gradient descent on this regularized objective, combined with a ``reconditioning step'', converges to the globally optimal cost at an $\mathcal{O}(1/T)$ rate. Our analysis relies on several new results which may be of independent interest, including a new framework for analyzing non-convex gradient descent via convex reformulation, and novel bounds on the solution to linear Lyapunov equations in terms of (our quantitative measure of) informativity.
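
To make the objects named in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a discrete-time system x_{t+1} = A x_t + w_t, y_t = C x_t + e_t with independent noise covariances W1 and W2, and a dynamic filter xi_{t+1} = A_K xi_t + B_K y_t with prediction C_K xi_t of the output G x_t. All names (`filter_cost`, `A_K`, `B_K`, `C_K`, `G`) are hypothetical, and the discrete-time setting is an assumption made for simplicity. The steady-state prediction cost comes from a linear Lyapunov equation over the joint state (x, xi), and the cross-covariance between x and xi is returned as one plausible proxy for how much the filter state reflects the true state.

```python
import numpy as np

def solve_discrete_lyapunov(a, q):
    """Solve X = a X a^T + q by vectorization (assumes a is stable)."""
    n = a.shape[0]
    lhs = np.eye(n * n) - np.kron(a, a)
    return np.linalg.solve(lhs, q.reshape(-1)).reshape(n, n)

def filter_cost(A, C, G, W1, W2, A_K, B_K, C_K):
    """Steady-state cost E||G x - C_K xi||^2 and cross-covariance E[x xi^T].

    Illustrative only: the joint state (x, xi) evolves linearly, so its
    stationary covariance solves a discrete Lyapunov equation.
    """
    n, m = A.shape[0], A_K.shape[0]
    # Closed-loop dynamics of the stacked state [x; xi].
    A_cl = np.block([[A, np.zeros((n, m))], [B_K @ C, A_K]])
    # Process noise enters x; measurement noise enters xi through B_K.
    Q = np.block([[W1, np.zeros((n, m))],
                  [np.zeros((m, n)), B_K @ W2 @ B_K.T]])
    Sigma = solve_discrete_lyapunov(A_cl, Q)
    E = np.hstack([G, -C_K])          # prediction error = E @ [x; xi]
    cost = float(np.trace(E @ Sigma @ E.T))
    cross = Sigma[:n, n:]             # E[x xi^T], a proxy for informativity
    return cost, cross

# Scalar example: a filter that ignores y versus one that uses it.
A = np.array([[0.5]]); C = np.array([[1.0]]); G = np.array([[1.0]])
W1 = np.array([[1.0]]); W2 = np.array([[1.0]])
zero = np.array([[0.0]])
cost_zero, cross_zero = filter_cost(A, C, G, W1, W2, zero, zero, zero)
cost_used, cross_used = filter_cost(A, C, G, W1, W2, zero,
                                    np.array([[0.25]]), np.array([[1.0]]))
```

In this toy case the filter that ignores the observations has zero cross-covariance with the true state, which is the kind of degenerate parameterization the abstract says the regularizer is designed to avoid.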
