Decoding a Neural Retriever's Latent Space for Query Suggestion

Title

Decoding a Neural Retriever's Latent Space for Query Suggestion

Authors

Adolphs, Leonard; Huebscher, Michelle Chen; Buck, Christian; Girgin, Sertan; Bachem, Olivier; Ciaramita, Massimiliano; Hofmann, Thomas

Abstract

Neural retrieval models have superseded classic bag-of-words methods such as BM25 as the retrieval framework of choice. However, neural systems lack the interpretability of bag-of-words models; it is not trivial to connect a query change to a change in the latent space that ultimately determines the retrieval results. To shed light on this embedding space, we learn a "query decoder" that, given a latent representation of a neural search engine, generates the corresponding query. We show that it is possible to decode a meaningful query from its latent representation and, when moving in the right direction in latent space, to decode a query that retrieves the relevant paragraph. In particular, the query decoder can be useful to understand "what should have been asked" to retrieve a particular paragraph from the collection. We employ the query decoder to generate a large synthetic dataset of query reformulations for MSMarco, leading to improved retrieval performance. On this data, we train a pseudo-relevance feedback (PRF) T5 model for the application of query suggestion that outperforms both query reformulation and PRF information retrieval baselines.
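
The idea of "moving in the right direction in latent space" can be illustrated with a toy sketch. The abstract does not specify the encoder, the decoder, or how the direction is chosen, so everything below is an assumption made for illustration: random unit vectors stand in for passage embeddings, relevance is scored by dot product, and the query embedding is moved toward a gold passage by linear interpolation. The helper names (`interpolate`, `rank_of`, `make_toy_data`) are hypothetical, and the query decoder itself, which would turn each shifted embedding back into text, is not modeled.

```python
import math
import random


def dot(u, v):
    """Dot product, used here as the (assumed) retrieval score."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def interpolate(q, p, alpha):
    """Move a fraction alpha of the way from q toward p."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(q, p)]


def rank_of(target_idx, query_vec, passages):
    """1-based rank of passages[target_idx] under dot-product scoring."""
    scores = [dot(query_vec, p) for p in passages]
    target = scores[target_idx]
    return 1 + sum(1 for s in scores if s > target)


def make_toy_data(n_passages=50, dim=16, seed=0):
    """Random unit vectors standing in for real passage/query embeddings."""
    rng = random.Random(seed)
    passages = [normalize([rng.gauss(0, 1) for _ in range(dim)])
                for _ in range(n_passages)]
    query = normalize([rng.gauss(0, 1) for _ in range(dim)])
    return query, passages


query, passages = make_toy_data()
gold = 0  # index of the passage we want the query to retrieve
alphas = (0.0, 0.25, 0.5, 0.75, 1.0)
ranks = [rank_of(gold, interpolate(query, passages[gold], a), passages)
         for a in alphas]

for a, r in zip(alphas, ranks):
    print(f"alpha={a:.2f}  rank of gold passage={r}")
```

At `alpha=1.0` the shifted query coincides with the gold passage embedding, so that passage ranks first. In the setting the abstract describes, each intermediate point would be passed to the query decoder to obtain a textual reformulation; that step needs a trained model and is left out here.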
