Paper Title
Learning explanations that are hard to vary
Paper Authors
Paper Abstract
In this paper, we investigate the principle that "good explanations are hard to vary" in the context of deep learning. We show that averaging gradients across examples -- akin to a logical OR of patterns -- can favor memorization and "patchwork" solutions that sew together different strategies, instead of identifying invariances. To inspect this, we first formalize a notion of consistency for minima of the loss surface, which measures to what extent a minimum appears only when examples are pooled. We then propose and experimentally validate a simple alternative algorithm based on a logical AND, which focuses on invariances and prevents memorization on a set of real-world tasks. Finally, using a synthetic dataset with a clear distinction between invariant and spurious mechanisms, we dissect learning signals and compare this approach to well-established regularizers.
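The logical-OR versus logical-AND contrast lends itself to a short illustration. The sketch below compares plain gradient averaging (the OR-like baseline) with a sign-agreement mask that zeroes out gradient components on which groups of examples disagree. It is a minimal sketch of the principle, not the paper's reference implementation: the function name `and_mask_gradients`, the grouping of examples into per-environment gradients, and the agreement threshold `tau` are assumptions made for this example.

```python
# Hedged sketch of a sign-agreement ("logical AND") gradient mask.
# Names here (and_mask_gradients, tau) are illustrative assumptions,
# not the paper's reference implementation.
import torch

def and_mask_gradients(env_grads, tau=1.0):
    """Combine per-environment gradients, keeping only components
    whose signs agree across at least a fraction `tau` of environments.

    env_grads: list of flat gradient tensors, one per environment.
    tau: required sign agreement in [0, 1]; tau=1.0 demands unanimity.
    """
    grads = torch.stack(env_grads)               # (n_envs, n_params)
    avg_grad = grads.mean(dim=0)                 # plain averaging: the OR-like baseline
    agreement = torch.sign(grads).mean(dim=0)    # per-component value in [-1, 1]
    mask = (agreement.abs() >= tau).to(avg_grad) # 1 where gradient directions agree
    return mask * avg_grad                       # AND-masked update direction

# Toy check: two environments agree in sign only on the first component,
# so the masked gradient keeps that component and zeroes the rest.
g1 = torch.tensor([0.5, -1.0, 2.0])
g2 = torch.tensor([0.3, 1.0, -0.1])
print(and_mask_gradients([g1, g2], tau=1.0))     # tensor([0.4000, 0.0000, 0.0000])
```

Under such a masking rule, a component that helps one group of examples but hurts another contributes nothing to the update, which is the sense in which an AND-like aggregation favors patterns that are "hard to vary" across examples rather than patchwork solutions.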