Paper Title
On the Optimization Landscape of Dynamic Output Feedback Linear Quadratic Control
Paper Authors
Abstract
The convergence of policy gradient algorithms hinges on the optimization landscape of the underlying optimal control problem. Theoretical insights into these algorithms can often be acquired by analyzing the landscape of linear quadratic control. However, most of the existing literature only considers the optimization landscape for static full-state or static output-feedback policies (controllers). We investigate the more challenging case of dynamic output-feedback policies for linear quadratic regulation (abbreviated as dLQR), which is prevalent in practice but has a rather complicated optimization landscape. We first show how the dLQR cost varies with the coordinate transformation of the dynamic controller and then derive the optimal transformation for a given observable stabilizing controller. One of our core results is the uniqueness of the stationary point of dLQR when it is observable, which provides an optimality certificate for solving dynamic controllers using policy gradient methods. Moreover, we establish conditions under which dLQR and linear quadratic Gaussian control are equivalent, thus providing a unified viewpoint of optimal control of both deterministic and stochastic linear systems. These results further shed light on designing policy gradient algorithms for more general decision-making problems with partially observed information.
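To make the policy-gradient viewpoint concrete, the following is a minimal toy sketch: gradient descent on the linear quadratic cost of a static full-state feedback gain (the simpler setting the abstract contrasts with dLQR). The system matrices, initial gain, horizon, and step size are all illustrative assumptions, not values from the paper, and the gradient is estimated by finite differences rather than the analytic policy gradient.

```python
import numpy as np

# Hypothetical discrete-time double-integrator-like system (assumed for illustration).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)          # state cost weight
R = np.array([[1.0]])  # input cost weight

def lqr_cost(K, horizon=200):
    """Finite-horizon LQR cost under u = -K x, averaged over basis initial states."""
    total = 0.0
    for x0 in np.eye(2):
        x = x0.copy()
        for _ in range(horizon):
            u = -K @ x
            total += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u
    return total / 2.0

def policy_gradient_step(K, lr=1e-3, eps=1e-5):
    """One gradient-descent step on J(K), with a finite-difference gradient estimate."""
    grad = np.zeros_like(K)
    base = lqr_cost(K)
    for i in range(K.shape[0]):
        for j in range(K.shape[1]):
            K_pert = K.copy()
            K_pert[i, j] += eps
            grad[i, j] = (lqr_cost(K_pert) - base) / eps
    return K - lr * grad

# Assumed initial stabilizing gain (closed-loop spectral radius ~0.95).
K = np.array([[0.5, 1.0]])
costs = [lqr_cost(K)]
for _ in range(50):
    K = policy_gradient_step(K)
    costs.append(lqr_cost(K))
```

In this static full-state case the cost decreases monotonically along the descent path; the paper's point is that for dynamic output-feedback controllers the landscape is more intricate (e.g., the cost is invariant under coordinate transformations of the controller state), so such certificates require the observability conditions developed in the abstract.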