Paper Title

Model Patching: Closing the Subgroup Performance Gap with Data Augmentation

Paper Authors

Karan Goel, Albert Gu, Yixuan Li, Christopher Ré

Paper Abstract

Classifiers in machine learning are often brittle when deployed. Particularly concerning are models with inconsistent performance on specific subgroups of a class, e.g., exhibiting disparities in skin cancer classification in the presence or absence of a spurious bandage. To mitigate these performance differences, we introduce model patching, a two-stage framework for improving robustness that encourages the model to be invariant to subgroup differences, and focus on class information shared by subgroups. Model patching first models subgroup features within a class and learns semantic transformations between them, and then trains a classifier with data augmentations that deliberately manipulate subgroup features. We instantiate model patching with CAMEL, which (1) uses a CycleGAN to learn the intra-class, inter-subgroup augmentations, and (2) balances subgroup performance using a theoretically-motivated subgroup consistency regularizer, accompanied by a new robust objective. We demonstrate CAMEL's effectiveness on 3 benchmark datasets, with reductions in robust error of up to 33% relative to the best baseline. Lastly, CAMEL successfully patches a model that fails due to spurious features on a real-world skin cancer dataset.
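The abstract outlines the two stages of model patching: learning intra-class, inter-subgroup transformations, then training with consistency-regularized augmentation. As an illustration only, the sketch below shows one way the second stage could look in PyTorch. It assumes a pretrained CycleGAN-style generator `generator` that translates an image to the other subgroup within its class and a classifier `model`; the symmetric-KL consistency term is a simplified stand-in for the paper's subgroup consistency regularizer and robust objective, not the authors' CAMEL implementation.

```python
# Minimal sketch of a model-patching-style training step (assumptions: a
# pretrained CycleGAN generator `generator` mapping each image to its
# counterpart subgroup, a classifier `model`, and batches of (images, labels)).
# Illustrative approximation only, not the CAMEL reference implementation.
import torch
import torch.nn.functional as F

def consistency_loss(p_orig, p_aug, eps=1e-8):
    # Symmetric KL between predictions on an image and its subgroup-translated
    # counterpart; encourages invariance to the spurious subgroup feature.
    m = 0.5 * (p_orig + p_aug)
    log_m = (m + eps).log()
    return 0.5 * (F.kl_div(log_m, p_orig, reduction="batchmean")
                  + F.kl_div(log_m, p_aug, reduction="batchmean"))

def training_step(model, generator, images, labels, lam=1.0):
    with torch.no_grad():
        # Stage 1 output: translate each image to the other subgroup of its class.
        images_aug = generator(images)
    logits = model(images)
    logits_aug = model(images_aug)
    # Classification loss on both the original and the augmented view.
    ce = F.cross_entropy(logits, labels) + F.cross_entropy(logits_aug, labels)
    # Stage 2: subgroup consistency regularization between the two views.
    cons = consistency_loss(logits.softmax(-1), logits_aug.softmax(-1))
    return ce + lam * cons
```

In practice the weight `lam` and the classification terms would be set by the paper's robust objective; here they are ordinary cross-entropy terms for brevity.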
