Paper title
Causal Bandits without prior knowledge using separating sets
Paper authors
Paper abstract
The Causal Bandit is a variant of the classic Bandit problem in which an agent must identify the best action in a sequential decision-making process, and the reward distribution of the actions displays a non-trivial dependence structure that is governed by a causal model. Methods proposed for this problem thus far in the literature rely on exact prior knowledge of the full causal graph. We formulate new causal bandit algorithms that no longer necessarily rely on prior causal knowledge. Instead, they utilize an estimator based on separating sets, which we can find using simple conditional independence tests or causal discovery methods. We show that, given a true separating set, for discrete i.i.d. data, this estimator is unbiased and has variance that is upper bounded by that of the sample mean. We develop algorithms based on Thompson Sampling and UCB for discrete and Gaussian models, respectively, and show increased performance on simulation data as well as on a bandit drawing from real-world protein signaling data.
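The idea of an estimator based on a separating set can be illustrated with a minimal sketch for the discrete case. It assumes a set Z that makes the reward Y conditionally independent of the chosen arm A given Z, so that E[Y | A=a] = sum over z of P(z | a) * E[Y | z], where E[Y | z] is estimated from samples pooled across all arms. The function name `separating_set_estimate` and the `(arm, z, reward)` data layout are hypothetical choices made for illustration; this is one plausible reading of the abstract, not the authors' implementation.

```python
from collections import defaultdict


def separating_set_estimate(samples, arms):
    """Estimate E[Y | A=a] for each arm via a (hypothetical) separating set Z.

    samples: iterable of (arm, z, reward) tuples, with z hashable and discrete.
    arms:    iterable of arm identifiers to estimate.
    Returns a dict {arm: estimate}, with None for arms that were never played.
    """
    n_az = defaultdict(int)       # number of times (arm, z) was observed
    n_a = defaultdict(int)        # number of times each arm was played
    n_z = defaultdict(int)        # number of times each z was observed (all arms)
    sum_y_z = defaultdict(float)  # pooled reward sum per z (all arms)

    for arm, z, reward in samples:
        n_az[(arm, z)] += 1
        n_a[arm] += 1
        n_z[z] += 1
        sum_y_z[z] += reward

    estimates = {}
    for arm in arms:
        plays = n_a.get(arm, 0)
        if plays == 0:
            estimates[arm] = None  # no data to estimate P(z | arm)
            continue
        total = 0.0
        for z, z_count in n_z.items():
            p_z_given_a = n_az.get((arm, z), 0) / plays  # empirical P(z | a)
            mean_y_given_z = sum_y_z[z] / z_count        # pooled E[Y | z]
            total += p_z_given_a * mean_y_given_z
        estimates[arm] = total
    return estimates


if __name__ == "__main__":
    data = [(0, 0, 0.0), (0, 1, 1.0), (1, 1, 1.0), (1, 1, 0.0)]
    print(separating_set_estimate(data, arms=[0, 1, 2]))
```

Because the conditional reward means are shared across arms, every pull contributes information about every arm that visits the same values of Z; this sharing is the intuition behind the variance bound stated in the abstract, under the assumption that Z is a true separating set.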