Paper Title
Learning Stochastic Parametric Differentiable Predictive Control Policies
Paper Authors
Paper Abstract
The problem of synthesizing stochastic explicit model predictive control policies is known to become intractable quickly, even for systems of modest complexity, when classical control-theoretic methods are used. To address this challenge, we present a scalable alternative called stochastic parametric differentiable predictive control (SP-DPC) for unsupervised learning of neural control policies governing stochastic linear systems subject to nonlinear chance constraints. SP-DPC is formulated as a deterministic approximation to the stochastic parametric constrained optimal control problem. This formulation allows us to compute the policy gradients directly via automatic differentiation of the problem's value function, evaluated over sampled parameters and uncertainties. In particular, the expectation of the SP-DPC problem's value function is backpropagated through closed-loop system rollouts parametrized by a known nominal system dynamics model and a neural control policy, which allows for direct model-based policy optimization. We provide theoretical probabilistic guarantees on closed-loop stability and chance constraint satisfaction for policies learned via the SP-DPC method. Furthermore, we demonstrate the computational efficiency and scalability of the proposed policy optimization algorithm on three numerical examples, including systems with a large number of states or subject to nonlinear constraints.
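The core mechanism the abstract describes — backpropagating a sampled expectation of a value function through closed-loop rollouts of known nominal linear dynamics under a neural policy — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dynamics matrices, network architecture, cost weights, and horizon are invented for the example, and the chance constraint is replaced by a simple soft penalty rather than the paper's sampled chance-constraint formulation.

```python
# Hedged sketch of SP-DPC-style model-based policy optimization.
# All concrete values (A, B, weights, horizon) are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Known nominal linear dynamics: x_{k+1} = A x_k + B u_k + w_k
A = torch.tensor([[1.0, 0.1],
                  [0.0, 1.0]])
B = torch.tensor([[0.0],
                  [0.1]])
nx, nu, horizon = 2, 1, 20

# Neural control policy u_k = pi(x_k)
policy = nn.Sequential(nn.Linear(nx, 32), nn.ReLU(), nn.Linear(32, nu))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def rollout_loss(batch=64, sigma=0.01):
    """Sampled-expectation loss: quadratic regulation cost over a
    closed-loop rollout, plus a soft penalty standing in for the
    chance constraint |x| <= 1 (an assumption for this sketch)."""
    x = 2.0 * torch.rand(batch, nx) - 1.0        # sampled initial states
    loss = torch.zeros(())
    for _ in range(horizon):
        u = policy(x)
        w = sigma * torch.randn(batch, nx)       # sampled additive uncertainty
        x = x @ A.T + u @ B.T + w                # nominal closed-loop step
        loss = loss + (x ** 2).sum(-1).mean() + 0.1 * (u ** 2).sum(-1).mean()
        loss = loss + 10.0 * torch.relu(x.abs() - 1.0).sum(-1).mean()
    return loss

losses = []
for step in range(200):
    opt.zero_grad()
    loss = rollout_loss()
    loss.backward()      # gradients flow through the entire rollout
    opt.step()
    losses.append(loss.item())
```

Because the rollout is differentiable end to end, no value-function approximation or likelihood-ratio estimator is needed; the policy gradient comes directly from automatic differentiation of the sampled objective, which is the "direct model-based policy optimization" the abstract refers to.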