Invariant Rationalization

Paper Title

Invariant Rationalization

Authors

Shiyu Chang, Yang Zhang, Mo Yu, Tommi S. Jaakkola

Abstract

Selective rationalization improves neural network interpretability by identifying a small subset of input features -- the rationale -- that best explains or supports the prediction. A typical rationalization criterion, i.e. maximum mutual information (MMI), finds the rationale that maximizes the prediction performance based only on the rationale. However, MMI can be problematic because it picks up spurious correlations between the input features and the output. Instead, we introduce a game-theoretic invariant rationalization criterion where the rationales are constrained to enable the same predictor to be optimal across different environments. We show both theoretically and empirically that the proposed rationales can rule out spurious correlations, generalize better to different test scenarios, and align better with human judgments. Our data and code are available.
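
As a rough illustration, the two criteria contrasted in the abstract can be written as constrained optimization problems. The notation is an assumption introduced here and does not appear on this page: X is the input, Y the label, E an environment variable, m a binary mask drawn from a sparsity-constrained set S, and Z the selected rationale.

```latex
% MMI criterion (assumed notation): choose the mask whose rationale is most informative about Y
\max_{m \in \mathcal{S}} \; I(Y; Z) \quad \text{s.t.} \quad Z = m \odot X

% Invariant criterion (assumed notation): same objective, plus an invariance constraint
\max_{m \in \mathcal{S}} \; I(Y; Z) \quad \text{s.t.} \quad Z = m \odot X, \quad Y \perp E \mid Z
```

Under this reading, the conditional-independence constraint says that once the rationale is known, the environment carries no further information about the label. That is one way to formalize the requirement that the same predictor be optimal across environments.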

Paper Title