WiC-TSV: An Evaluation Benchmark for Target Sense Verification of Words in Context

Paper title

WiC-TSV: An Evaluation Benchmark for Target Sense Verification of Words in Context

Authors

Anna Breit, Artem Revenko, Kiamehr Rezaee, Mohammad Taher Pilehvar, Jose Camacho-Collados

Abstract

We present WiC-TSV, a new multi-domain evaluation benchmark for Word Sense Disambiguation. More specifically, we introduce a framework for Target Sense Verification of Words in Context, which grounds its uniqueness in two properties: its formulation as a binary classification task, which makes it independent of external sense inventories, and its coverage of various domains. This makes the dataset highly flexible for evaluating a diverse set of models and systems within and across domains. WiC-TSV provides three different evaluation settings, depending on the input signals provided to the model. We set baseline performance on the dataset using state-of-the-art language models. Experimental results show that even though these models can perform decently on the task, there remains a gap between machine and human performance, especially in out-of-domain settings. WiC-TSV data is available at https://competitions.codalab.org/competitions/23683
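
The binary formulation described in the abstract can be pictured with a minimal sketch. Everything named here is an assumption for illustration: the `TSVInstance` class, its field names, and the `accuracy` helper are hypothetical and do not come from the dataset or the paper, and the abstract does not say which input signals describe the target sense, so `sense_signal` is a generic placeholder. Accuracy is used only as one plain choice of metric.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class TSVInstance:
    """One hypothetical target-sense-verification example (field names are assumptions)."""
    context: str       # sentence containing the target word
    target_word: str   # the word whose sense is being verified
    sense_signal: str  # placeholder for whatever signal describes the target sense
    label: bool        # True if the word is used in the target sense


def accuracy(model: Callable[[TSVInstance], bool],
             instances: Iterable[TSVInstance]) -> float:
    """Fraction of instances on which the model's yes/no answer matches the label."""
    items = list(instances)
    if not items:
        raise ValueError("need at least one instance")
    correct = sum(model(item) == item.label for item in items)
    return correct / len(items)
```

Because each instance carries its own description of the target sense and asks only for a yes/no answer, a model under this sketch never has to pick from a fixed sense inventory. A trivial baseline such as `lambda inst: True` can be passed to `accuracy` to sanity-check an evaluation loop before plugging in a real model.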
