Paper Title
Robust Policy Synthesis for Uncertain POMDPs via Convex Optimization
Paper Authors
Abstract
We study the problem of policy synthesis for uncertain partially observable Markov decision processes (uPOMDPs). The transition probability function of a uPOMDP is only known to belong to a so-called uncertainty set, for instance in the form of probability intervals. Such a model arises when, for example, an agent operates under information limitations due to imperfect knowledge about the accuracy of its sensors. The goal is to compute a policy for the agent that is robust against all possible probability distributions within the uncertainty set. In particular, we are interested in a policy that robustly ensures the satisfaction of temporal logic and expected reward specifications. We state the underlying optimization problem as a semi-infinite quadratically-constrained quadratic program (QCQP), which has finitely many variables and infinitely many constraints. Since QCQPs are non-convex in general and practically infeasible to solve, we resort to the so-called convex-concave procedure to convexify the QCQP. Although convex, the resulting optimization problem still has infinitely many constraints and is NP-hard. For uncertainty sets that form convex polytopes, we provide a transformation of the problem to a convex QCQP with finitely many constraints. We demonstrate the feasibility of our approach by means of several case studies that highlight typical bottlenecks for our problem. In particular, we show that we are able to solve benchmarks with hundreds of thousands of states and hundreds of different observations, and we investigate the effect of different levels of uncertainty in the models.
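The convex-concave procedure mentioned in the abstract can be illustrated on a toy problem. The sketch below (the function name `ccp_toy` and the specific problem are illustrative assumptions, not from the paper; the paper's uPOMDP QCQPs are far larger and require a convex solver at each iteration) applies CCP to a one-dimensional non-convex QCQP: minimize x² subject to x² ≥ 1. CCP linearizes the concave side of the constraint at the current iterate x_k, yielding a convex subproblem whose solution here has a closed form.

```python
def ccp_toy(x0: float, iters: int = 20) -> float:
    """Convex-concave procedure on: minimize x^2 s.t. x^2 >= 1.

    The constraint 1 - x^2 <= 0 is reverse-convex. At iterate x_k,
    CCP replaces x^2 by its tangent x_k^2 + 2*x_k*(x - x_k), giving
    the linear constraint 2*x_k*x >= 1 + x_k^2. Minimizing x^2 under
    that constraint puts the optimum on the constraint boundary:
        x_{k+1} = (1 + x_k^2) / (2 * x_k).
    """
    x = x0
    for _ in range(iters):
        x = (1.0 + x * x) / (2.0 * x)
    return x

# Starting from a feasible point, the iterates stay feasible and
# converge to the global optimum x = 1 of this toy problem.
print(ccp_toy(2.0))
```

Each CCP iteration solves a convex surrogate that upper-bounds the original problem at the current point, so the objective is monotonically non-increasing; in general CCP only guarantees convergence to a stationary point, not a global optimum.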