Paper Title

Unsupervised Summarization by Jointly Extracting Sentences and Keywords

Authors

Zongyi Li, Xiaoqing Zheng, Jun He

Abstract

We present RepRank, an unsupervised graph-based ranking model for extractive multi-document summarization in which the similarities between words, between sentences, and between words and sentences can be estimated by the distances between their vector representations in a unified vector space. To obtain desirable representations, we propose a self-attention based learning method that represents a sentence by the weighted sum of its word embeddings, with the weights concentrated on the words that hopefully better reflect the content of a document. We show that salient sentences and keywords can be extracted in a joint and mutual reinforcement process using our learned representations, and prove that this process always converges to a unique solution, leading to improved performance. A variant of the absorbing random walk and the corresponding sampling-based algorithm are also described to avoid redundancy and increase diversity in the summaries. Experimental results on multiple benchmark datasets show that RepRank achieves the best or comparable performance in ROUGE.
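
The abstract does not give RepRank's actual update rules, so the following is only a minimal sketch of the general idea it describes: sentences are represented as attention-weighted sums of word embeddings, and sentence and keyword scores reinforce each other through word-to-sentence similarity in a shared vector space. The function names, the clipped-cosine affinity, the HITS-style normalization, and the toy embeddings and attention scores are all assumptions made for illustration; they are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sentence_vector(word_vecs, attn_scores):
    """Softmax the (hypothetical) attention scores, then take the
    weighted sum of the word embeddings."""
    m = max(attn_scores)
    exps = [math.exp(s - m) for s in attn_scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(word_vecs[0])
    return [sum(w * vec[i] for w, vec in zip(weights, word_vecs))
            for i in range(dim)]

def normalize(xs):
    """Scale non-negative scores so they sum to 1."""
    total = sum(xs)
    return [x / total for x in xs] if total else xs

def joint_rank(sent_vecs, word_vecs, iters=50):
    """Assumed mutual-reinforcement loop: a sentence is salient if it is
    close to salient words, and a word is a keyword if it is close to
    salient sentences."""
    # Non-negative word-to-sentence affinity (cosine clipped at zero).
    aff = [[max(cosine(s, w), 0.0) for w in word_vecs] for s in sent_vecs]
    n_s, n_w = len(sent_vecs), len(word_vecs)
    s_score = [1.0 / n_s] * n_s
    w_score = [1.0 / n_w] * n_w
    for _ in range(iters):
        s_score = normalize(
            [sum(a * w for a, w in zip(row, w_score)) for row in aff])
        w_score = normalize(
            [sum(aff[i][j] * s_score[i] for i in range(n_s))
             for j in range(n_w)])
    return s_score, w_score

# Toy 2-d embeddings (made up): two on-topic words and one off-topic word.
words = {"graph": [1.0, 0.0], "rank": [0.9, 0.1], "cat": [0.0, 1.0]}
word_vecs = list(words.values())

# Sentence 0 uses "graph" and "rank"; sentence 1 uses only "cat".
sent_vecs = [
    sentence_vector([words["graph"], words["rank"]], [0.0, 0.0]),
    sentence_vector([words["cat"]], [0.0]),
]

s_score, w_score = joint_rank(sent_vecs, word_vecs)
```

In this toy setup the sentence built from the two mutually similar words ends up with the higher score, and the off-topic word receives the lowest keyword score. A redundancy-aware selection step, such as the absorbing random walk variant the abstract mentions, would sit on top of scores like these and is not sketched here.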
