Paper Title

Deceptive Kernel Function on Observations of Discrete POMDP

Authors

Zhili Zhang, Quanyan Zhu

Abstract

This paper studies deception applied to an agent in a partially observable Markov decision process (POMDP). We introduce a deceptive kernel function (the kernel) applied to the agent's observations in a discrete POMDP. Considering three characteristic algorithms the agent may use, namely value iteration, value function approximation, and POMCP, we analyze how falsified observations output by the kernel mislead the agent's belief, and we anticipate the likely threat to the agent's reward and potentially to other aspects of its performance. We validate these expectations and explore further detrimental effects of the deception through experiments on two POMDP problems. The results show that a kernel applied to the agent's observations can affect its belief and substantially lower its resulting rewards; meanwhile, certain implementations of the kernel can induce other abnormal behaviors in the agent.
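As a rough illustration of the mechanism the abstract describes, the sketch below shows how a falsified observation can redirect a discrete Bayesian belief update. It is not the authors' implementation: modeling the kernel as a row-stochastic matrix `K`, the two-state problem, the 0.85 sensor accuracy, and the names `belief_update` and `apply_kernel` are all assumptions made for this example.

```python
import random

def belief_update(belief, T, O, obs):
    """Discrete Bayes filter for one fixed action.

    T[s][s2] = P(s2 | s), O[s2][o] = P(o | s2). Both are hypothetical inputs.
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [sum(belief[s] * T[s][s2] for s in range(n)) for s2 in range(n)]
    # Correction step: weight by the likelihood of the observation received.
    unnormalized = [predicted[s2] * O[s2][obs] for s2 in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def apply_kernel(K, obs, rng):
    """Sample the observation the agent sees, given the true one.

    Assumption: K[o][o2] = P(agent receives o2 | true observation o).
    """
    return rng.choices(range(len(K[obs])), weights=K[obs])[0]

# Hypothetical two-state problem with a static state and a noisy sensor.
T = [[1.0, 0.0], [0.0, 1.0]]
O = [[0.85, 0.15], [0.15, 0.85]]
# Assumed kernel that always swaps the two observations.
K_flip = [[0.0, 1.0], [1.0, 0.0]]

rng = random.Random(0)
prior = [0.5, 0.5]
true_obs = 0
fake_obs = apply_kernel(K_flip, true_obs, rng)

honest = belief_update(prior, T, O, true_obs)    # belief from the true observation
deceived = belief_update(prior, T, O, fake_obs)  # belief from the falsified one
```

In this toy setting the agent runs an unmodified belief update in both cases; only its input changes. With the true observation, most of the probability mass lands on state 0, while the swapped observation moves it to state 1.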
