Paper Title

Robust and Stable Black Box Explanations

Authors

Himabindu Lakkaraju, Nino Arsov, Osbert Bastani

Abstract

As machine learning black boxes are increasingly being deployed in real-world applications, there has been a growing interest in developing post hoc explanations that summarize the behaviors of these black boxes. However, existing algorithms for generating such explanations have been shown to lack stability and robustness to distribution shifts. We propose a novel framework for generating robust and stable explanations of black box models based on adversarial training. Our framework optimizes a minimax objective that aims to construct the highest fidelity explanation with respect to the worst-case over a set of adversarial perturbations. We instantiate this algorithm for explanations in the form of linear models and decision sets by devising the required optimization procedures. To the best of our knowledge, this work makes the first attempt at generating post hoc explanations that are robust to a general class of adversarial perturbations that are of practical interest. Experimental evaluation with real-world and synthetic datasets demonstrates that our approach substantially improves robustness of explanations without sacrificing their fidelity on the original data distribution.
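
To make the minimax formulation concrete, one plausible way to write the objective (the notation below is ours, introduced for illustration, and may differ from the paper's exact formulation) is:

\[
\min_{E \in \mathcal{E}} \; \max_{\delta \in \Delta} \; \mathbb{E}_{x \sim \mathcal{D}} \Big[ \ell\big(E(x + \delta),\; f(x + \delta)\big) \Big]
\]

where f is the black box, E is a candidate explanation drawn from an interpretable class \mathcal{E} (e.g., linear models or decision sets), \Delta is the set of admissible adversarial perturbations modeling the distribution shift, and \ell is a fidelity loss measuring disagreement between the explanation and the black box.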
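A minimal sketch of how such an objective could be optimized for a linear explanation, assuming L-infinity-bounded perturbations and approximating the inner maximization with a fast-gradient-sign step (the function and parameter names here are hypothetical, not taken from the paper):

    import numpy as np

    def fit_robust_linear_explanation(f, X, epsilon=0.1, lr=1e-2, n_iters=500):
        """Fit a linear explanation g(x) = w @ x + b whose predictions stay
        close to the black box f under worst-case L-inf perturbations of
        size epsilon. A sketch only: the inner max is approximated with one
        fast-gradient-sign step on a squared-error fidelity loss."""
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        for _ in range(n_iters):
            # Inner max (approximate): for squared error and a linear g, the
            # loss gradient w.r.t. the input x_i is (g(x_i) - f(x_i)) * w, so
            # an FGSM-style worst-case shift is epsilon * sign of that gradient.
            y = np.array([f(x) for x in X])            # query black box on clean points
            resid = X @ w + b - y
            X_adv = X + epsilon * np.sign(np.outer(resid, w))
            y_adv = np.array([f(x) for x in X_adv])    # re-query on perturbed points
            # Outer min: one gradient step on fidelity at the adversarial points.
            resid_adv = X_adv @ w + b - y_adv
            w -= lr * (X_adv.T @ resid_adv) / n
            b -= lr * resid_adv.mean()
        return w, b

Alternating an approximate inner maximization with an outer fidelity-minimization step is the standard adversarial-training recipe; for decision-set explanations the outer step would presumably be a discrete search rather than a gradient update.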
