Paper Title
Towards Automatically Generating Release Notes using Extractive Summarization Technique
Authors
Abstract
Practitioners regard release notes as an essential document. They summarize the source code changes in a software release, such as issue fixes, new features, and performance improvements. Manually producing release notes is a time-consuming and challenging task, so developers sometimes neglect to write them. For example, in data we collected from GitHub covering over 1,900 releases, 37% of the release notes are empty. To mitigate this problem, we propose an approach that automatically generates release notes from commit messages and merge pull-request (PR) titles. We implement one of the popular extractive text summarization techniques, the TextRank algorithm. However, accurate keyword extraction is a vital issue in text processing, and the keyword matching and topic extraction process of TextRank ignores the semantic similarity among texts. To improve keyword extraction, we integrate the GloVe word embedding technique with TextRank. We build a dataset of 1,213 release notes (after null filtering) and evaluate the generated release notes using the ROUGE metric and human evaluation. We also compare the performance of our technique with another popular extractive algorithm, latent semantic analysis (LSA). Our evaluation results show that the improved TextRank method outperforms LSA.
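The general idea of combining TextRank with word embeddings can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each commit message is represented by the average of its word vectors, that edge weights are cosine similarities between those averages, and that ranking uses a standard damped power iteration. The `TOY_VECTORS` table is a made-up stand-in for pretrained GloVe vectors, and the names `sentence_vector`, `cosine_similarity_matrix`, `textrank_scores`, and `summarize` are hypothetical.

```python
import numpy as np

# Made-up 3-dimensional vectors standing in for pretrained GloVe embeddings.
# A real pipeline would load vectors from a GloVe file instead (assumption).
TOY_VECTORS = {
    "fix": [1.0, 0.1, 0.0],
    "bug": [0.9, 0.2, 0.0],
    "crash": [0.8, 0.3, 0.1],
    "add": [0.0, 1.0, 0.1],
    "feature": [0.1, 0.9, 0.2],
    "support": [0.1, 0.8, 0.3],
    "update": [0.2, 0.2, 1.0],
    "docs": [0.1, 0.1, 0.9],
}


def sentence_vector(sentence, vectors, dim=3):
    """Average the vectors of the known words; zero vector if none are known."""
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        return np.zeros(dim)
    return np.mean([vectors[w] for w in words], axis=0)


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity with self-loops removed and negatives clipped."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty sentences
    unit = embeddings / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    return np.clip(sim, 0.0, None)


def textrank_scores(sim, damping=0.85, iterations=100, tol=1e-6):
    """Damped power iteration over the row-normalized similarity graph."""
    n = sim.shape[0]
    row_sums = sim.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # Nodes without any edges spread their score uniformly.
    transition = np.where(row_sums > 0, sim / safe_sums, 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(iterations):
        updated = (1 - damping) / n + damping * (transition.T @ scores)
        converged = np.abs(updated - scores).sum() < tol
        scores = updated
        if converged:
            break
    return scores


def summarize(sentences, vectors, top_k=2):
    """Return the top_k highest-scoring sentences in their original order."""
    embeddings = np.array([sentence_vector(s, vectors) for s in sentences])
    scores = textrank_scores(cosine_similarity_matrix(embeddings))
    top = sorted(np.argsort(scores)[::-1][:top_k])
    return [sentences[i] for i in top]


messages = ["fix crash bug", "add feature support", "update docs", "fix bug"]
summary = summarize(messages, TOY_VECTORS, top_k=2)
print(summary)
```

Because similarity is computed on embeddings rather than on exact word overlap, two messages that share no tokens can still be linked in the graph, which is the gap in plain TextRank that the abstract points to. Averaging word vectors is only one simple way to obtain a message-level representation; the paper's abstract does not specify how the integration is done.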