Paper Title

Cross-Dataset Design Discussion Mining

Authors

Mahadi, Alvi; Tongay, Karan; Ernst, Neil A.

Abstract

Being able to identify software discussions that are primarily about design, which we call design mining, can improve documentation and maintenance of software systems. Existing design mining approaches have good classification performance using natural language processing (NLP) techniques, but the conclusion stability of these approaches is generally poor. A classifier trained on a given dataset of software projects has so far not worked well on different artifacts or different datasets. In this study, we replicate and synthesize these earlier results in a meta-analysis. We then apply recent work in transfer learning for NLP to the problem of design mining. However, for our datasets, these deep transfer learning classifiers perform no better than less complex classifiers. We conclude by discussing some reasons behind the transfer learning approach to design mining.
