Paper Title
Towards Improving Selective Prediction Ability of NLP Systems
Paper Authors
Paper Abstract
It's better to say "I can't answer" than to answer incorrectly. This selective prediction ability is crucial for NLP systems to be reliably deployed in real-world applications. Prior work has shown that existing selective prediction techniques fail to perform well, especially in the out-of-domain setting. In this work, we propose a method that improves probability estimates of models by calibrating them using prediction confidence and difficulty score of instances. Using these two signals, we first annotate held-out instances and then train a calibrator to predict the likelihood of correctness of the model's prediction. We instantiate our method with Natural Language Inference (NLI) and Duplicate Detection (DD) tasks and evaluate it in both In-Domain (IID) and Out-of-Domain (OOD) settings. In (IID, OOD) settings, we show that the representations learned by our calibrator result in an improvement of (15.81%, 5.64%) and (6.19%, 13.9%) over 'MaxProb' -- a selective prediction baseline -- on NLI and DD tasks respectively.
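The calibration idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the synthetic data, the logistic-regression calibrator, and the two feature names (`maxprob`, `difficulty`) are all assumptions standing in for the model confidence and instance difficulty signals the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "held-out" annotations (illustrative, not real data). Each instance has:
#   maxprob    -- the model's softmax confidence in its own prediction
#   difficulty -- a per-instance difficulty score in [0, 1]
#   correct    -- whether the model's prediction was actually right (the label)
n = 2000
maxprob = rng.uniform(0.3, 1.0, n)
difficulty = rng.uniform(0.0, 1.0, n)
# Assumed generative story: high confidence and low difficulty => more often correct.
p_correct = 1 / (1 + np.exp(-(4 * maxprob - 3 * difficulty - 0.5)))
correct = (rng.uniform(0, 1, n) < p_correct).astype(float)

# Features for the calibrator: bias term plus the two signals.
X = np.column_stack([np.ones(n), maxprob, difficulty])

# Train a logistic-regression calibrator by gradient descent to predict
# the likelihood that the underlying model's prediction is correct.
w = np.zeros(3)
for _ in range(3000):
    pred = 1 / (1 + np.exp(-(X @ w)))
    grad = X.T @ (pred - correct) / n
    w -= 0.5 * grad

def calibrated_confidence(mp, diff):
    """Confidence used for selective prediction: abstain when it is low."""
    return 1 / (1 + np.exp(-(w[0] + w[1] * mp + w[2] * diff)))

# An easy, high-confidence instance should score above a hard, low-confidence one.
easy = calibrated_confidence(0.95, 0.1)
hard = calibrated_confidence(0.45, 0.9)
```

At inference time, the calibrated score replaces the raw MaxProb value as the abstention signal: predictions whose calibrated confidence falls below a threshold are answered with "I can't answer".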