Paper title
Decoding a Neural Retriever's Latent Space for Query Suggestion
Paper authors
Paper abstract
Neural retrieval models have superseded classic bag-of-words methods such as BM25 as the retrieval framework of choice. However, neural systems lack the interpretability of bag-of-words models; it is not trivial to connect a query change to a change in the latent space that ultimately determines the retrieval results. To shed light on this embedding space, we learn a "query decoder" that, given a latent representation of a neural search engine, generates the corresponding query. We show that it is possible to decode a meaningful query from its latent representation and, when moving in the right direction in latent space, to decode a query that retrieves the relevant paragraph. In particular, the query decoder can be useful to understand "what should have been asked" to retrieve a particular paragraph from the collection. We employ the query decoder to generate a large synthetic dataset of query reformulations for MSMarco, leading to improved retrieval performance. On this data, we train a pseudo-relevance feedback (PRF) T5 model for the application of query suggestion that outperforms both query reformulation and PRF information retrieval baselines.
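The idea of "moving in the right direction in latent space" can be pictured with a small toy sketch. It assumes a dual-encoder retriever that scores passages by dot product and moves the query embedding toward a target passage embedding by linear interpolation. The abstract specifies neither choice. The 3-dimensional vectors and the helper names (`interpolate`, `rank_of`) are made up for illustration, and the query decoder itself, which would turn the moved embedding back into text, is omitted.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def interpolate(query_vec, passage_vec, alpha):
    """Move a fraction alpha of the way from the query embedding
    toward the passage embedding, then renormalize."""
    mixed = [(1 - alpha) * q + alpha * p for q, p in zip(query_vec, passage_vec)]
    return normalize(mixed)

def rank_of(target_idx, query_vec, passage_vecs):
    """1-based rank of the target passage under dot-product scoring."""
    scores = [dot(query_vec, p) for p in passage_vecs]
    target_score = scores[target_idx]
    return 1 + sum(1 for s in scores if s > target_score)

# Toy embeddings (hypothetical values, not from any real encoder).
passages = [normalize(v) for v in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])]
query = normalize([0.9, 0.4, 0.1])
target = 1  # index of the passage we would like the query to retrieve

# Rank of the target passage as the query embedding moves toward it.
ranks = [
    rank_of(target, interpolate(query, passages[target], alpha), passages)
    for alpha in (0.0, 0.5, 1.0)
]
print(ranks)
```

In this toy setup the target passage's rank improves as the embedding moves toward it. In the setting the abstract describes, each intermediate embedding would additionally be passed to the query decoder to obtain a textual reformulation.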