Paper Title
ExtraPhrase: Efficient Data Augmentation for Abstractive Summarization
Authors
Abstract
Neural models trained with a large amount of parallel data have achieved impressive performance in abstractive summarization tasks. However, large-scale parallel corpora are expensive and challenging to construct. In this work, we introduce a low-cost and effective strategy, ExtraPhrase, to augment training data for abstractive summarization tasks. ExtraPhrase constructs pseudo training data in two steps: extractive summarization and paraphrasing. We extract the major parts of an input text in the extractive summarization step, and obtain diverse expressions of them in the paraphrasing step. Through experiments, we show that ExtraPhrase improves the performance of abstractive summarization tasks by more than 0.50 points in ROUGE scores compared to the setting without data augmentation. ExtraPhrase also outperforms existing methods such as back-translation and self-training. We further show that ExtraPhrase is especially effective when the amount of genuine training data is remarkably small, i.e., in a low-resource setting. Moreover, ExtraPhrase is more cost-efficient than the existing approaches.
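The two-step pipeline described in the abstract can be sketched in code. The following is a minimal, hypothetical simplification: the extractive step is reduced to lead-sentence extraction and the paraphrasing step to a placeholder rewrite rule, whereas the actual method operates on richer extraction and paraphrasing (e.g., round-trip translation). All function names here are illustrative, not from the paper.

```python
def extractive_summary(text: str, max_sentences: int = 1) -> str:
    """Step 1 (simplified): keep only the major part of the input,
    here approximated by taking the leading sentence(s)."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."

def paraphrase(summary: str) -> str:
    """Step 2 (placeholder): diversify the expression.
    A real system would use something like round-trip machine
    translation; this toy rule merely stands in for it."""
    return summary.replace("have achieved", "achieve")

def build_pseudo_pair(source_text: str) -> tuple[str, str]:
    """Combine both steps into one (source, pseudo-summary) pair
    usable as augmented training data."""
    return source_text, paraphrase(extractive_summary(source_text))

src = ("Neural models have achieved impressive results. "
       "However, parallel corpora are expensive to construct.")
pair = build_pseudo_pair(src)
print(pair[1])  # -> "Neural models achieve impressive results."
```

The key design point the sketch illustrates is that the pseudo target is derived from the source text itself, so no additional human-written summaries are needed, which is what makes the approach low-cost.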