Paper Title

New Protocols and Negative Results for Textual Entailment Data Collection

Paper Authors

Bowman, Samuel R., Palomaki, Jennimaria, Soares, Livio Baldini, Pitler, Emily

Paper Abstract

Natural language inference (NLI) data has proven useful in benchmarking and, especially, as pretraining data for tasks requiring language understanding. However, the crowdsourcing protocol that was used to collect this data has known issues and was not explicitly optimized for either of these purposes, so it is likely far from ideal. We propose four alternative protocols, each aimed at improving either the ease with which annotators can produce sound training examples or the quality and diversity of those examples. Using these alternatives and a fifth baseline protocol, we collect and compare five new 8.5k-example training sets. In evaluations focused on transfer learning applications, our results are solidly negative, with models trained on our baseline dataset yielding good transfer performance to downstream tasks, but none of our four new methods (nor the recent ANLI) showing any improvements over that baseline. In a small silver lining, we observe that all four new protocols, especially those where annotators edit pre-filled text boxes, reduce previously observed issues with annotation artifacts.
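
The evaluation the abstract describes (train a model on an NLI set, then measure how well it transfers to downstream tasks) follows the common intermediate fine-tuning recipe. Below is a minimal sketch of that setup, under stated assumptions: it uses the Hugging Face transformers and datasets libraries, roberta-base as the encoder, and SNLI as a stand-in for the paper's own 8.5k-example training sets; none of these specific choices come from the abstract.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumption: an off-the-shelf encoder; the paper does not name one here.
MODEL = "roberta-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL)

def tokenize(batch):
    # Each NLI example pairs a premise with a hypothesis.
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=128)

# Stage 1: intermediate fine-tuning on an NLI training set.
# SNLI is used here as a stand-in for the paper's collected datasets;
# examples with label -1 carry no gold label and are dropped.
nli = load_dataset("snli")["train"].filter(lambda ex: ex["label"] != -1)
nli = nli.map(tokenize, batched=True)

# Three-way classification: entailment / neutral / contradiction.
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)
args = TrainingArguments(output_dir="nli_intermediate",
                         per_device_train_batch_size=32,
                         num_train_epochs=3)
Trainer(model=model, args=args, train_dataset=nli, tokenizer=tokenizer).train()

# Stage 2 (not shown): fine-tune this NLI-tuned checkpoint on each downstream
# task and compare it against a checkpoint that skipped Stage 1; the gap is
# the transfer effect the abstract evaluates.
```

In this framing, the paper's negative result is that swapping the baseline-protocol data for data from any of the four new protocols (or ANLI) in Stage 1 does not improve the Stage 2 comparison.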
