Paper Title


A Causal Lens for Peeking into Black Box Predictive Models: Predictive Model Interpretation via Causal Attribution

Paper Authors

Aria Khademi, Vasant Honavar

Paper Abstract


With the increasing adoption of predictive models trained using machine learning across a wide range of high-stakes applications, e.g., health care, security, criminal justice, finance, and education, there is a growing need for effective techniques for explaining such models and their predictions. We aim to address this problem in settings where the predictive model is a black box; that is, we can only observe the response of the model to various inputs, but have no knowledge about the internal structure of the predictive model, its parameters, its objective function, or the algorithm used to optimize the model. We reduce the problem of interpreting a black box predictive model to that of estimating the causal effect of each model input on the model output, from observations of the model inputs and the corresponding outputs. We estimate the causal effects of model inputs on the model output using variants of the Rubin-Neyman potential outcomes framework for estimating causal effects from observational data. We show how the resulting causal attribution of responsibility for the model output to the different model inputs can be used to interpret the predictive model and to explain its predictions. We present results of experiments that demonstrate the effectiveness of our approach to the interpretation of black box predictive models via causal attribution in the case of deep neural network models trained on one synthetic data set (where the input variables that impact the output variable are known by design) and two real-world data sets: handwritten digit classification and Parkinson's disease severity prediction. Because our approach does not require knowledge about the predictive model algorithm and is free of assumptions regarding the black box predictive model except that its input-output responses be observable, it can be applied, in principle, to any black box predictive model.
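To make the core idea concrete, the sketch below estimates the average causal effect of each input of a black-box model on its output by intervening on one input at a time while keeping the observed values of the remaining inputs fixed. This is a simplified do-style contrast for illustration only, not the paper's exact potential-outcomes estimators; the `black_box` function and its coefficients are hypothetical stand-ins for a trained model whose internals are hidden.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box model: we may only query its input-output behavior.
# By design, the output depends on inputs 0 and 1 but not on input 2.
def black_box(X):
    return 3.0 * X[:, 0] + 2.0 * X[:, 1] ** 2

def average_causal_effect(model, X, col, a, b):
    """Estimate the average causal effect on the model output of setting
    input `col` to b versus a, holding the observed distribution of the
    other inputs fixed (an interventional contrast; a simplification of
    the potential-outcomes machinery used in the paper)."""
    Xa, Xb = X.copy(), X.copy()
    Xa[:, col] = a  # intervention do(x_col = a)
    Xb[:, col] = b  # intervention do(x_col = b)
    return float(np.mean(model(Xb) - model(Xa)))

# Observed model inputs (three features), sampled for probing.
X = rng.normal(size=(10_000, 3))
effects = [average_causal_effect(black_box, X, j, a=0.0, b=1.0)
           for j in range(3)]
# effects attributes responsibility: inputs 0 and 1 have nonzero causal
# effect on the output, while the irrelevant input 2 has effect 0.
```

In this toy setup the estimated effects are 3.0, 2.0, and 0.0 for inputs 0, 1, and 2 respectively, matching the known design of the synthetic model; the same probing strategy applies to any black box whose input-output responses are observable.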
