Paper Title
Learning Dense Representations of Phrases at Scale
Paper Authors
Paper Abstract
Open-domain question answering can be reformulated as a phrase retrieval problem, without the need for processing documents on-demand during inference (Seo et al., 2019). However, current phrase retrieval models heavily depend on sparse representations and still underperform retriever-reader approaches. In this work, we show for the first time that we can learn dense representations of phrases alone that achieve much stronger performance in open-domain QA. We present an effective method to learn phrase representations from the supervision of reading comprehension tasks, coupled with novel negative sampling methods. We also propose a query-side fine-tuning strategy, which can support transfer learning and reduce the discrepancy between training and inference. On five popular open-domain QA datasets, our model DensePhrases improves over previous phrase retrieval models by 15%-25% absolute accuracy and matches the performance of state-of-the-art retriever-reader models. Our model is easy to parallelize due to pure dense representations and processes more than 10 questions per second on CPUs. Finally, we directly use our pre-indexed dense phrase representations for two slot filling tasks, showing the promise of utilizing DensePhrases as a dense knowledge base for downstream tasks.
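At its core, phrase retrieval reduces inference to maximum inner product search (MIPS) over a pre-built index of dense phrase vectors: the question is encoded once, and the highest-scoring phrase is returned directly, with no document reading at query time. Below is a minimal, hypothetical Python sketch of this loop using FAISS; the names (`phrase_vecs`, `answer`), the embedding dimension, and the random stand-in data are illustrative assumptions, not the actual DensePhrases API.

```python
# Minimal sketch of phrase retrieval as maximum inner product search (MIPS).
# All identifiers and data here are illustrative, not the DensePhrases API.
import numpy as np
import faiss  # https://github.com/facebookresearch/faiss

dim = 768  # hypothetical phrase/question embedding dimension

# Stand-in corpus: pre-computed dense phrase vectors and their surface forms.
# In the real system these come from a phrase encoder run offline over Wikipedia.
phrase_vecs = np.random.randn(10_000, dim).astype("float32")
phrases = [f"phrase_{i}" for i in range(10_000)]

# Inner-product index over all phrase vectors, built once before serving queries.
index = faiss.IndexFlatIP(dim)
index.add(phrase_vecs)

def answer(question_vec: np.ndarray, k: int = 5):
    """Return the top-k phrases whose vectors have the largest inner
    product with the question vector -- no documents are processed
    on-demand at inference time."""
    scores, ids = index.search(question_vec.reshape(1, -1).astype("float32"), k)
    return [(phrases[i], float(s)) for i, s in zip(ids[0], scores[0])]

# A trained question encoder would produce this vector; here it is random.
q = np.random.randn(dim).astype("float32")
print(answer(q))
```

Because the index is fixed after construction, the query-side fine-tuning described in the abstract amounts to updating only the question encoder so that gold answer phrases score highest under this same search, which is also what makes transfer to new tasks cheap.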