论文标题
适用于不确定POMDPS的强大有限状态控制器
Robust Finite-State Controllers for Uncertain POMDPs
论文作者
论文摘要
不确定的部分可观察到的马尔可夫决策过程(UPOMDP)允许标准POMDP的概率过渡和观察功能属于所谓的不确定性集。这种不确定性(称为认知不确定性)捕获了由缺乏可用数据引起的一组概率分布集。我们开发了一种算法来计算upomdps的有限内存策略,该策略可牢固地满足任何可接受分布的规格。通常,计算这种策略在理论上和实际上是棘手的。我们通过四个步骤为此问题提供有效的解决方案。 (1)我们将基本问题陈述为无限许多约束的非凸优化问题。 (2)专用二元化方案产生的双重问题仍然是非convex,但有限许多约束。 (3)我们将这个偶性问题线性化,(4)求解所得的有限线性程序,以获取针对原始问题的局部最佳解决方案。由此产生的问题公式比现有方法产生的问题要小。我们使用飞机避免碰撞的场景和新型的航天器运动计划案例研究的大量实例证明了算法的适用性。
Uncertain partially observable Markov decision processes (uPOMDPs) allow the probabilistic transition and observation functions of standard POMDPs to belong to a so-called uncertainty set. Such uncertainty, referred to as epistemic uncertainty, captures uncountable sets of probability distributions caused by, for instance, a lack of data available. We develop an algorithm to compute finite-memory policies for uPOMDPs that robustly satisfy specifications against any admissible distribution. In general, computing such policies is theoretically and practically intractable. We provide an efficient solution to this problem in four steps. (1) We state the underlying problem as a nonconvex optimization problem with infinitely many constraints. (2) A dedicated dualization scheme yields a dual problem that is still nonconvex but has finitely many constraints. (3) We linearize this dual problem and (4) solve the resulting finite linear program to obtain locally optimal solutions to the original problem. The resulting problem formulation is exponentially smaller than those resulting from existing methods. We demonstrate the applicability of our algorithm using large instances of an aircraft collision-avoidance scenario and a novel spacecraft motion planning case study.