Query Understanding via Intent Description Generation

Paper Title

Query Understanding via Intent Description Generation

Authors

Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xueqi Cheng

Abstract

Query understanding is a fundamental problem in information retrieval (IR) that has attracted continuous attention over the past decades. Many different tasks have been proposed for understanding users' search queries, e.g., query classification or query clustering. However, understanding a search query at the intent class/cluster level is not very precise, because much detailed information is lost. As can be seen in many benchmark datasets, e.g., TREC and SemEval, queries are often associated with a detailed description provided by human annotators, which clearly describes the query's intent to help evaluate the relevance of the documents. If a system could automatically generate a detailed and precise intent description for a search query, as human annotators do, that would indicate that much better query understanding has been achieved. In this paper, therefore, we propose a novel Query-to-Intent-Description (Q2ID) task for query understanding. Unlike existing ranking tasks, which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task that aims to generate a natural language intent description based on both relevant and irrelevant documents of a given query. To address this new task, we propose a novel Contrastive Generation model, CtrsGen for short, which generates the intent description by contrasting the relevant documents with the irrelevant documents for a given query. We demonstrate the effectiveness of our model by comparing it with several state-of-the-art generation models on the Q2ID task. We discuss the potential usage of such a Q2ID technique through an example application.
