Paper Title

Pretrained Transformers Improve Out-of-Distribution Robustness

Paper Authors

Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan, Dawn Song

Paper Abstract

Although pretrained Transformers such as BERT achieve high accuracy on in-distribution examples, do they generalize to new distributions? We systematically measure out-of-distribution (OOD) generalization for seven NLP datasets by constructing a new robustness benchmark with realistic distribution shifts. We measure the generalization of previous models including bag-of-words models, ConvNets, and LSTMs, and we show that pretrained Transformers' performance declines are substantially smaller. Pretrained transformers are also more effective at detecting anomalous or OOD examples, while many previous models are frequently worse than chance. We examine which factors affect robustness, finding that larger models are not necessarily more robust, distillation can be harmful, and more diverse pretraining data can enhance robustness. Finally, we show where future work can improve OOD robustness.
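As a rough illustration of the evaluation the abstract describes, the sketch below (not the authors' released code) measures (1) accuracy on an in-distribution versus a shifted test set and (2) OOD detection using the maximum softmax probability as a confidence score. The checkpoint name, example texts, and labels are placeholders; in practice the classifier would first be fine-tuned on the in-distribution training set.

```python
# Minimal sketch of OOD-robustness evaluation with a pretrained Transformer.
# All data below is placeholder; this is not the paper's benchmark code.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumption: a sequence-classification checkpoint already fine-tuned on the
# in-distribution task; "bert-base-uncased" with a fresh head is only a stand-in.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.eval()

@torch.no_grad()
def predict(texts):
    """Return predicted labels and max softmax probabilities (confidence)."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    probs = torch.softmax(model(**batch).logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    return pred, conf

# Placeholder data: same task (sentiment), different text source.
id_texts,  id_labels  = ["the movie was great"], torch.tensor([1])
ood_texts, ood_labels = ["this blender works well"], torch.tensor([1])

id_pred,  id_conf  = predict(id_texts)
ood_pred, ood_conf = predict(ood_texts)

# 1) Performance decline under distribution shift.
print("ID accuracy: ", (id_pred == id_labels).float().mean().item())
print("OOD accuracy:", (ood_pred == ood_labels).float().mean().item())

# 2) OOD detection: lower confidence should flag the shifted examples.
print("mean ID confidence: ", id_conf.mean().item())
print("mean OOD confidence:", ood_conf.mean().item())
```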
