A Formal Approach to Explainability

Paper Title

A Formal Approach to Explainability

Authors

Lior Wolf, Tomer Galanti, Tamir Hazan

Abstract

We regard explanations as a blending of the input sample and the model's output, and offer a few definitions that capture various desired properties of the function that generates these explanations. We study the links among these properties, as well as the links between explanation-generating functions and the intermediate representations of learned models. We are able to show, for example, that if the activations of a given layer are consistent with an explanation, then so are the activations of all subsequent layers. In addition, we study the intersection and union of explanations as a way to construct new explanations.
