Paper Title
Multi-View Document Representation Learning for Open-Domain Dense Retrieval
Authors
Abstract
Dense retrieval has achieved impressive advances in first-stage retrieval from large-scale document collections, built on a bi-encoder architecture that produces a single vector representation for each query and document. However, a document can usually answer multiple potential queries from different views, so a single document vector is hard to match with multi-view queries and suffers from a semantic mismatch problem. This paper proposes a multi-view document representation learning framework that produces multi-view embeddings to represent a document and enforces them to align with different queries. First, we propose a simple yet effective method of generating multiple embeddings through viewers. Second, to prevent the multi-view embeddings from collapsing to the same one, we further propose a global-local loss with an annealed temperature that encourages the multiple viewers to better align with different potential queries. Experiments show that our method outperforms recent works and achieves state-of-the-art results.
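Below is a minimal sketch of how the multi-view scoring and the global-local loss described in the abstract could look in code. It assumes the document encoder emits K viewer embeddings per document (e.g., via prepended viewer tokens), aggregates query-document similarity by taking the best-matching viewer, and adds a local term over the positive document's viewers whose temperature would be annealed during training. The function names, the number of viewers, and the exact temperature schedule are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of multi-view matching with a global-local loss.
# Assumptions (not from the paper's code): dot-product similarity,
# in-batch negatives, and max-over-viewers aggregation.
import torch
import torch.nn.functional as F


def multi_view_scores(query_emb, doc_views):
    """query_emb: [B, H]; doc_views: [B, K, H] -> per-viewer scores [B, B, K]."""
    # Similarity between every query and every viewer embedding of every document.
    return torch.einsum("qh,dkh->qdk", query_emb, doc_views)


def global_local_loss(query_emb, doc_views, temperature=1.0):
    """Global contrastive loss over documents plus a local loss over viewers.

    The local term applies a temperature-scaled softmax over the K viewers of
    the positive document; annealing `temperature` toward a small value pushes
    different viewers to specialize on different queries instead of collapsing.
    """
    scores = multi_view_scores(query_emb, doc_views)      # [B, B, K]
    doc_scores, best_view = scores.max(dim=-1)            # aggregate by best viewer
    labels = torch.arange(query_emb.size(0), device=query_emb.device)

    # Global loss: contrast the positive document (diagonal) against in-batch negatives.
    global_loss = F.cross_entropy(doc_scores, labels)

    # Local loss: among the positive document's viewers, sharpen the distribution
    # toward the best-matching viewer as the temperature is annealed.
    pos_view_scores = scores[labels, labels]              # [B, K]
    local_loss = F.cross_entropy(pos_view_scores / temperature,
                                 best_view[labels, labels])
    return global_loss + local_loss
```

In this sketch, a training loop would gradually lower `temperature` across steps (e.g., from 1.0 toward a small value), so the local term starts soft and progressively forces each query to commit to one viewer of its positive document.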