Title
Reinforcement Learning of Causal Variables Using Mediation Analysis
Authors
Abstract
Many open problems in machine learning are intrinsically related to causality; however, the use of causal analysis in machine learning is still at an early stage. Within a general reinforcement learning setting, we consider the problem of building a general reinforcement learning agent which uses experience to construct a causal graph of the environment, and uses this graph to inform its policy. Our approach has three characteristics. First, we learn a simple, coarse-grained causal graph in which the variables reflect states over many time instances, and interventions happen at the level of policies rather than individual actions. Second, we use mediation analysis to obtain an optimization target; by minimizing this target, we define the causal variables. Third, our approach relies on estimating conditional expectations rather than the familiar expected return from reinforcement learning, and we therefore apply a generalization of Bellman's equations. We show that the method can learn a plausible causal graph in a grid-world environment, and that the agent improves its performance when using the causally informed policy. To our knowledge, this is the first attempt to apply causal analysis in a reinforcement learning setting without strict restrictions on the number of states. We have observed that mediation analysis provides a promising avenue for transforming the problem of causal acquisition into one of cost-function minimization, but importantly one which involves estimating conditional expectations. This is a new challenge, and we think that causal reinforcement learning will involve developing methods suited for online estimation of such conditional expectations. Finally, a benefit of our approach is the use of very simple causal models, which are arguably a more natural model of human causal understanding.
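To make the notion of "estimating conditional expectations rather than the expected return" concrete, the following is a minimal sketch (not the paper's algorithm): a TD-style update that estimates E[f(S_T) | S_t = s], the expectation of a feature f of the terminal state given the current state, using the reward-free Bellman-like recursion v(s) = E[v(S_{t+1}) | S_t = s] with boundary condition v(s_T) = f(s_T). The toy grid-world encoding, function names, and parameters are illustrative assumptions, not from the paper.

```python
import numpy as np

def td_conditional_expectation(episodes, n_states, f, alpha=0.1, sweeps=50):
    """Estimate v(s) ~= E[f(S_T) | S_t = s] from sampled trajectories.

    episodes: list of state-index trajectories, each ending in a terminal state.
    f: maps a terminal state index to the feature value whose expectation we want.
    (Hypothetical illustration of a Bellman-style recursion without reward
    accumulation or discounting; not the authors' method.)
    """
    v = np.zeros(n_states)
    for _ in range(sweeps):
        for traj in episodes:
            # back up each non-terminal state towards its successor's estimate
            for s, s_next in zip(traj[:-2], traj[1:-1]):
                v[s] += alpha * (v[s_next] - v[s])
            # the last transition bootstraps on the observed terminal feature
            s_last, s_term = traj[-2], traj[-1]
            v[s_last] += alpha * (f(s_term) - v[s_last])
    return v

# Toy usage: 4 states, episodes end in state 2 or 3, f marks terminal state 3.
episodes = [[0, 1, 3], [0, 1, 2], [0, 1, 3]]
v = td_conditional_expectation(episodes, n_states=4, f=lambda s: float(s == 3))
print(v[0])  # approximates P(terminal state is 3 | start in state 0), roughly 2/3
```

The point of the sketch is only that a conditional expectation of a future event can be propagated backwards through states exactly like a value function, which is the sense in which a generalization of Bellman's equations applies.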