IMACS: Image Model Attribution Comparison Summaries

Paper Title

IMACS: Image Model Attribution Comparison Summaries

Authors

Eldon Schoop, Ben Wedin, Andrei Kapishnikov, Tolga Bolukbasi, Michael Terry

Abstract

Developing a suitable Deep Neural Network (DNN) often requires significant iteration, where different model versions are evaluated and compared. While metrics such as accuracy are a powerful means to succinctly describe a model's performance across a dataset or to directly compare model versions, practitioners often wish to gain a deeper understanding of the factors that influence a model's predictions. Interpretability techniques such as gradient-based methods and local approximations can be used to examine small sets of inputs in fine detail, but it can be hard to determine if results from small sets generalize across a dataset. We introduce IMACS, a method that combines gradient-based model attributions with aggregation and visualization techniques to summarize differences in attributions between two DNN image models. More specifically, IMACS extracts salient input features from an evaluation dataset, clusters them based on similarity, then visualizes differences in model attributions for similar input features. In this work, we introduce a framework for aggregating, summarizing, and comparing the attribution information for two models across a dataset; present visualizations that highlight differences between two image classification models; and show how our technique can uncover behavioral differences caused by domain shift between two models trained on satellite images.
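The aggregation step described in the abstract (group similar input features, then compare how much attribution each model assigns to each group) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that per-feature embeddings and per-feature attribution scores for both models have already been computed, and it swaps in a plain NumPy k-means for whatever clustering IMACS actually uses. The names `kmeans`, `compare_attributions`, `attr_a`, and `attr_b` are hypothetical.

```python
import numpy as np


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in); returns one label per point."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points; fancy indexing returns a copy.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Squared distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def compare_attributions(embeddings, attr_a, attr_b, k=2):
    """Cluster feature embeddings, then summarize per-cluster attribution gaps.

    embeddings: (n, d) array, one row per extracted input feature (assumed given).
    attr_a, attr_b: (n,) arrays of attribution scores from model A and model B.
    """
    labels = kmeans(embeddings, k)
    summary = []
    for j in range(k):
        mask = labels == j
        if not mask.any():
            continue
        mean_a = float(attr_a[mask].mean())
        mean_b = float(attr_b[mask].mean())
        summary.append({
            "cluster": j,
            "size": int(mask.sum()),
            "mean_attr_a": mean_a,
            "mean_attr_b": mean_b,
            "diff": mean_b - mean_a,
        })
    # Clusters where the two models disagree most come first.
    summary.sort(key=lambda row: abs(row["diff"]), reverse=True)
    return summary


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data: two well-separated groups of feature embeddings.
    emb = np.vstack([rng.normal(0, 0.1, (20, 4)), rng.normal(10, 0.1, (20, 4))])
    attr_a = np.concatenate([np.full(20, 0.9), np.full(20, 0.1)])
    attr_b = np.concatenate([np.full(20, 0.2), np.full(20, 0.8)])
    for row in compare_attributions(emb, attr_a, attr_b, k=2):
        print(row)
```

Sorting clusters by the absolute attribution gap is one possible design choice: it puts the groups of features on which the two models rely most differently at the top, which is the kind of difference a comparison summary would presumably want to surface first.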

