Paper title
A Formal Approach to Explainability
Paper authors
Paper abstract
We regard explanations as a blend of the input sample and the model's output, and offer a few definitions that capture various desired properties of the function that generates these explanations. We study the links among these properties, and between explanation-generating functions and the intermediate representations of learned models. We show, for example, that if the activations of a given layer are consistent with an explanation, then so are the activations of all subsequent layers. In addition, we study the intersection and union of explanations as a way to construct new explanations.