Beyond Leaderboards: A Survey of Methods for Revealing Weaknesses in Natural Language Inference

Paper title

Beyond Leaderboards: A survey of methods for revealing weaknesses in Natural Language Inference data and models

Authors

Viktor Schlegel, Goran Nenadic, Riza Batista-Navarro

Abstract

Recent years have seen a growing number of publications that analyse Natural Language Inference (NLI) datasets for superficial cues, whether they undermine the complexity of the tasks underlying those datasets and how they impact those models that are optimised and evaluated on this data. This structured survey provides an overview of the evolving research area by categorising reported weaknesses in models and datasets and the methods proposed to reveal and alleviate those weaknesses for the English language. We summarise and discuss the findings and conclude with a set of recommendations for possible future research directions. We hope it will be a useful resource for researchers who propose new datasets, to have a set of tools to assess the suitability and quality of their data to evaluate various phenomena of interest, as well as those who develop novel architectures, to further understand the implications of their improvements with respect to their model's acquired capabilities.

Paper title