Towards Textual Out-of-Domain Detection without In-Domain Labels

Paper title

Towards Textual Out-of-Domain Detection without In-Domain Labels

Authors

Di Jin, Shuyang Gao, Seokhwan Kim, Yang Liu, Dilek Hakkani-Tur

Abstract

In many real-world settings, machine learning models need to identify user inputs that are out-of-domain (OOD) so as to avoid performing wrong actions. This work focuses on a challenging case of OOD detection in which no labels for the in-domain data are accessible (e.g., no intent labels for the intent classification task). To this end, we first evaluate different language-model-based approaches that predict the likelihood of a sequence of tokens. Furthermore, we propose a novel representation-learning-based method that combines unsupervised clustering and contrastive learning so that better data representations for OOD detection can be learned. Through extensive experiments, we demonstrate that this method significantly outperforms likelihood-based methods and is even competitive with state-of-the-art supervised approaches that use label information.
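
To make the likelihood-based baseline mentioned in the abstract concrete, here is a minimal sketch of scoring an input by its average token log-likelihood under a model fitted only on unlabeled in-domain text. It is an illustration under stated assumptions, not the authors' implementation: a toy add-one-smoothed unigram model stands in for the language models evaluated in the paper, and the function names, example sentences, and threshold are all hypothetical.

```python
import math
from collections import Counter


def fit_unigram(sentences):
    """Count tokens in unlabeled in-domain sentences (whitespace tokenization)."""
    return Counter(tok for s in sentences for tok in s.lower().split())


def avg_log_likelihood(sentence, counts):
    """Mean per-token log-probability under an add-one-smoothed unigram model."""
    tokens = sentence.lower().split()
    if not tokens:
        return float("-inf")
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen tokens
    log_probs = [math.log((counts[t] + 1) / (total + vocab)) for t in tokens]
    return sum(log_probs) / len(log_probs)


def is_ood(sentence, counts, threshold):
    """Flag an input as OOD when its likelihood score falls below the threshold."""
    return avg_log_likelihood(sentence, counts) < threshold


# Hypothetical in-domain data: no intent labels are used anywhere.
in_domain = [
    "book a flight to boston",
    "book a hotel in boston",
    "cancel my flight to denver",
    "find a hotel in denver",
]
counts = fit_unigram(in_domain)

score_in = avg_log_likelihood("book a flight to denver", counts)
score_ood = avg_log_likelihood("what is the capital of peru", counts)
```

In this toy setup the in-domain query receives a higher score than the unrelated question, so a threshold between the two scores (chosen here by hand, in practice tuned on held-out data) separates them. The representation-learning method the abstract proposes is not shown.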
