Paper Title

Metrics for saliency map evaluation of deep learning explanation methods

Authors

Gomez, Tristan, Fréour, Thomas, Mouchère, Harold

Abstract

Due to the black-box nature of deep learning models, solutions for visual explanation of CNNs have recently been developed. Given the high cost of user studies, metrics are necessary to compare and evaluate these different methods. In this paper, we critically analyze the Deletion Area Under Curve (DAUC) and Insertion Area Under Curve (IAUC) metrics proposed by Petsiuk et al. (2018). These metrics were designed to evaluate the faithfulness of saliency maps generated by generic methods such as Grad-CAM or RISE. First, we show that the actual saliency score values given by the saliency map are ignored, as only the ranking of the scores is taken into account. This shows that these metrics are insufficient by themselves, since the visual appearance of a saliency map can change significantly without the ranking of the scores being modified. Second, we argue that during the computation of DAUC and IAUC, the model is presented with images that are out of the training distribution, which may lead to unreliable behavior of the model being explained. To complement DAUC/IAUC, we propose new metrics that quantify the sparsity and the calibration of explanation methods, two previously unstudied properties. Finally, we give general remarks about the metrics studied in this paper and discuss how to evaluate them in a user study.
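The abstract's first point, that DAUC/IAUC depend only on the ranking of saliency scores, can be illustrated with a minimal sketch. The `deletion_auc` function below is a simplified, hypothetical rendition of the deletion metric (pixels zeroed in decreasing saliency order, model score recorded at each step), not the paper's reference implementation; the toy `model` is an assumption for demonstration only. Applying any monotone transform to the saliency map (here, cubing the values) leaves the metric unchanged:

```python
import numpy as np

def deletion_auc(model, image, saliency, step=1):
    # Sketch of the Deletion AUC (DAUC) metric: remove pixels in
    # decreasing order of saliency and record the model's score.
    # Note: only the RANKING of the saliency scores is used here,
    # never their actual values.
    order = np.argsort(saliency.ravel())[::-1]
    img = image.copy().ravel()
    scores = [model(img.reshape(image.shape))]
    for i in range(0, order.size, step):
        img[order[i:i + step]] = 0.0  # "delete" pixels (zero baseline)
        scores.append(model(img.reshape(image.shape)))
    # Area under the score-vs-fraction-deleted curve
    return np.trapz(scores, dx=1.0 / (len(scores) - 1))

# Toy stand-in model: score = mean of the remaining pixel values.
model = lambda x: float(x.mean())
rng = np.random.default_rng(0)
img = rng.random((4, 4))
sal = rng.random((4, 4))

a = deletion_auc(model, img, sal)
b = deletion_auc(model, img, sal ** 3)  # monotone transform: same ranking
# a == b: the metric ignores the actual saliency values, only their order.
```

Two saliency maps that look very different (e.g., one nearly uniform, one sharply peaked) can therefore receive identical DAUC scores, which motivates the sparsity and calibration metrics the paper proposes as complements.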
