Paper title
Shielding in Resource-Constrained Goal POMDPs
Paper authors
Paper abstract
We consider partially observable Markov decision processes (POMDPs) modeling an agent that needs a supply of a certain resource (e.g., electricity stored in batteries) to operate correctly. The resource is consumed by the agent's actions and can be replenished only in certain states. The agent aims to minimize the expected cost of reaching some goal while preventing resource exhaustion, a problem we call \emph{resource-constrained goal optimization} (RSGO). We take a two-step approach to the RSGO problem. First, using formal methods techniques, we design an algorithm computing a \emph{shield} for a given scenario: a procedure that observes the agent and prevents it from using actions that might eventually lead to resource exhaustion. Second, we augment the POMCP heuristic search algorithm for POMDP planning with our shields to obtain an algorithm solving the RSGO problem. We implement our algorithm and present experiments showing its applicability to benchmarks from the literature.
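To make the idea of a shield concrete, the following Python sketch illustrates it on a deliberately simplified setting. It is not the paper's algorithm: it assumes a fully observable model (no beliefs or observations), assumes every action consumes at least one unit of the resource, and assumes that entering a reload state refills the resource to capacity. The model, the state and action names, and the functions min_levels, compute_shield, and allowed are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy model (not from the paper): fully observable, every action
# consumes at least one unit, and entering a reload state refills the resource.
CAPACITY = 6
RELOAD = {"base"}
# ACTIONS[state][action] = (consumption, set of possible successor states)
ACTIONS = {
    "base": {"go": (1, {"hall"})},
    "hall": {"back": (1, {"base"}), "explore": (2, {"far", "hall"})},
    "far": {"return": (3, {"base"})},
}


def min_levels(reloads):
    """Least level from which a state in `reloads` is surely reached in >= 1 step."""
    v = {s: math.inf for s in ACTIONS}
    changed = True
    while changed:  # value iteration from above; worst case over successors
        changed = False
        for s, acts in ACTIONS.items():
            best = min(
                cost + max(0 if t in reloads else v[t] for t in succ)
                for cost, succ in acts.values()
            )
            if best < v[s]:
                v[s], changed = best, True
    return v


def compute_shield(capacity):
    """Discard reload states that cannot surely reach a usable reload state again."""
    reloads = set(RELOAD)
    while True:
        v = min_levels(reloads)
        bad = {r for r in reloads if v[r] > capacity}
        if not bad:
            return reloads, v
        reloads -= bad


def allowed(state, level, action, reloads, v, capacity):
    """Shield check: permit the action only if no outcome can strand the agent."""
    cost, succ = ACTIONS[state][action]
    need = cost + max(0 if t in reloads else v[t] for t in succ)
    return need <= level <= capacity


reloads, v = compute_shield(CAPACITY)
print(allowed("hall", 3, "explore", reloads, v, CAPACITY))  # False: needs level 5
print(allowed("hall", 3, "back", reloads, v, CAPACITY))     # True: needs level 1
```

In this sketch the shield is just a threshold test per state and action. A planner such as POMCP could consult a check of this kind to skip blocked actions during search, although in the partially observable setting of the paper the shield has to reason about what the agent can observe rather than about the exact state.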