Paper Title
Questioning the Validity of Summarization Datasets and Improving Their Factual Consistency
Paper Authors
Abstract
The topic of summarization evaluation has recently attracted a surge of attention due to the rapid development of abstractive summarization systems. However, the formulation of the task is rather ambiguous: neither the linguistic nor the natural language processing community has succeeded in giving a mutually agreed-upon definition. Due to this lack of a well-defined formulation, a large number of popular abstractive summarization datasets are constructed in a manner that neither guarantees validity nor meets one of the most essential criteria of summarization: factual consistency. In this paper, we address this issue by combining state-of-the-art factual consistency models to identify the problematic instances present in popular summarization datasets. We release SummFC, a filtered summarization dataset with improved factual consistency, and demonstrate that models trained on this dataset achieve improved performance in nearly all quality aspects. We argue that our dataset should become a valid benchmark for developing and evaluating summarization systems.