LEAN-LIFE: A Label-Efficient Annotation Framework Towards Learning from Explanation

Paper Title

LEAN-LIFE: A Label-Efficient Annotation Framework Towards Learning from Explanation

Authors

Lee, Dong-Ho, Khanna, Rahul, Lin, Bill Yuchen, Chen, Jamin, Lee, Seyeon, Ye, Qinyuan, Boschee, Elizabeth, Neves, Leonardo, Ren, Xiang

Abstract

Successfully training a deep neural network demands a huge corpus of labeled data. However, each label only provides limited information to learn from and collecting the requisite number of labels involves massive human effort. In this work, we introduce LEAN-LIFE, a web-based, Label-Efficient AnnotatioN framework for sequence labeling and classification tasks, with an easy-to-use UI that not only allows an annotator to provide the needed labels for a task, but also enables LearnIng From Explanations for each labeling decision. Such explanations enable us to generate useful additional labeled data from unlabeled instances, bolstering the pool of available training data. On three popular NLP tasks (named entity recognition, relation extraction, sentiment analysis), we find that using this enhanced supervision allows our models to surpass competitive baseline F1 scores by more than 5-10 percentage points, while using half as many labeled instances. Our framework is the first to utilize this enhanced supervision technique and does so for three important tasks, thus providing improved annotation recommendations to users and an ability to build datasets of (data, label, explanation) triples instead of the regular (data, label) pair.
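The abstract does not say how an explanation is turned into extra labeled data. The following is only a minimal sketch of the general idea, under loudly stated assumptions: each explanation has already been reduced to a single keyword trigger, and matching is exact, case-insensitive substring search. The framework itself may parse explanations and match them against unlabeled text very differently. The names `Rule`, `Triple`, and `label_with_rules` are hypothetical illustrations, not part of LEAN-LIFE. The sketch shows how one (data, label, explanation) annotation can propagate to unlabeled sentences, producing new triples.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    """Hypothetical labeling rule distilled from one annotator explanation."""
    keyword: str      # assumed trigger phrase extracted from the explanation
    label: str        # label the annotator assigned to the original instance
    explanation: str  # the raw natural-language explanation


@dataclass(frozen=True)
class Triple:
    """A (data, label, explanation) training example."""
    data: str
    label: str
    explanation: str


def label_with_rules(rules: List[Rule], unlabeled: List[str]) -> List[Triple]:
    """Pseudo-label unlabeled sentences using exact keyword matching (an assumption)."""
    triples = []
    for text in unlabeled:
        lowered = text.lower()
        hits = [r for r in rules if r.keyword.lower() in lowered]
        labels = {r.label for r in hits}
        if len(labels) != 1:
            continue  # no rule fired, or rules disagree: leave the sentence unlabeled
        rule = hits[0]
        triples.append(Triple(text, rule.label, rule.explanation))
    return triples


rules = [
    Rule("delicious", "positive", "The word 'delicious' signals praise."),
    Rule("terrible", "negative", "The word 'terrible' signals a complaint."),
]
unlabeled = [
    "The pasta was delicious.",
    "Service was terrible.",
    "We arrived at noon.",
    "Delicious food, terrible service.",
]
result = label_with_rules(rules, unlabeled)
print([t.label for t in result])  # prints ['positive', 'negative']
```

Skipping sentences where rules conflict is a deliberately conservative choice in this sketch. It trades coverage for precision, so the generated data adds fewer but cleaner examples to the training pool.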
