Exploiting Submodular Value Functions For Scaling Up Active Perception

Paper Title

Exploiting Submodular Value Functions For Scaling Up Active Perception

Authors

Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek, Matthijs T. J. Spaan

Abstract

In active perception tasks, an agent aims to select sensory actions that reduce its uncertainty about one or more hidden variables. While partially observable Markov decision processes (POMDPs) provide a natural model for such problems, reward functions that directly penalize uncertainty in the agent's belief can remove the piecewise-linear and convex (PWLC) property of the value function required by most POMDP planners. Furthermore, as the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially with it, making POMDP planning infeasible with traditional methods. In this article, we address the twofold challenge of modeling and planning for active perception tasks. We show the mathematical equivalence of $\rho$POMDP and POMDP-IR, two frameworks for modeling active perception tasks that restore the PWLC property of the value function. To plan efficiently for active perception tasks, we identify and exploit the independence properties of POMDP-IR to reduce the computational cost of solving POMDP-IR (and $\rho$POMDP). We propose greedy point-based value iteration (PBVI), a new POMDP planning method that uses greedy maximization to greatly improve scalability in the action space of an active perception POMDP. Furthermore, we show that, under certain conditions, including submodularity, the value function computed using greedy PBVI is guaranteed to have bounded error with respect to the optimal value function. We establish the conditions under which the value function of an active perception POMDP is guaranteed to be submodular. Finally, we present a detailed empirical analysis on a dataset collected from a multi-camera tracking system employed in a shopping mall. Our method achieves performance similar to that of existing methods at a fraction of the computational cost, leading to better scalability for solving active perception tasks.
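
The greedy maximization mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the paper's greedy PBVI; it shows only the generic idea of building a subset of sensors one element at a time by picking the largest marginal gain of a set function. The camera names, the `coverage` table, and the `coverage_value` function are hypothetical and chosen purely for illustration. Coverage is used because it is a standard example of a monotone submodular function, for which greedy selection under a cardinality constraint is known to achieve at least a (1 - 1/e) fraction of the optimal value.

```python
from typing import Callable, FrozenSet, Iterable, Set

# Hypothetical data: which floor cells each camera can observe.
coverage = {
    "cam_a": {1, 2, 3},
    "cam_b": {3, 4},
    "cam_c": {4, 5, 6, 7},
    "cam_d": {1, 7},
}


def coverage_value(selected: FrozenSet[str]) -> int:
    """Number of distinct cells seen by the selected cameras (monotone submodular)."""
    seen: Set[int] = set()
    for cam in selected:
        seen |= coverage[cam]
    return len(seen)


def greedy_maximize(
    ground_set: Iterable[str],
    value: Callable[[FrozenSet[str]], float],
    k: int,
) -> Set[str]:
    """Pick up to k elements, each time adding the one with the largest marginal gain."""
    selected: Set[str] = set()
    remaining = set(ground_set)
    for _ in range(min(k, len(remaining))):
        base = value(frozenset(selected))
        best, best_gain = None, float("-inf")
        # Sorted iteration makes tie-breaking deterministic.
        for item in sorted(remaining):
            gain = value(frozenset(selected | {item})) - base
            if gain > best_gain:
                best, best_gain = item, gain
        selected.add(best)
        remaining.remove(best)
    return selected


# Select 2 of the 4 hypothetical cameras.
chosen = greedy_maximize(coverage.keys(), coverage_value, k=2)
print(sorted(chosen), coverage_value(frozenset(chosen)))
```

With n candidate sensors and a budget of k, this loop evaluates the set function on the order of n·k times, rather than once for every size-k subset. That is the kind of saving in the action space that the abstract attributes to greedy maximization.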
