Paper Title

Understanding the Automated Parameter Optimization on Transfer Learning for CPDP: An Empirical Study

Authors

Ke Li, Zilin Xiang, Tao Chen, Shuo Wang, Kay Chen Tan

Abstract

Data-driven defect prediction has become increasingly important in the software engineering process. Because data from a single software project is often insufficient for training a reliable defect prediction model, transfer learning that borrows data or knowledge from other projects to build the model for the current project, known as cross-project defect prediction (CPDP), is a natural choice. Most CPDP techniques involve two major steps, transfer learning and classification, each of which has at least one parameter that must be tuned to achieve its best performance. This makes CPDP a good fit for automated parameter optimization. However, the impact of automated parameter optimization on the various CPDP techniques is not yet thoroughly understood. In this paper, we present the first empirical study of this impact on 62 CPDP techniques, 13 of which are chosen from the existing CPDP literature, while the other 49 have not been explored before. We build defect prediction models over 20 real-world software projects of different scales and characteristics. Our findings demonstrate that: (1) Automated parameter optimization substantially improves the defect prediction performance of 77% of the CPDP techniques at a manageable computational cost. Thus, more effort on this aspect is required in future CPDP studies. (2) Transfer learning is of ultimate importance in CPDP. Given a tight computational budget, it is more cost-effective to focus on optimizing the parameter configuration of the transfer learning algorithms. (3) Research on CPDP is far from mature: it is "not difficult" to find a better alternative by combining existing transfer learning and classification techniques. This finding provides important insights for the future design of CPDP techniques.
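
To make the two-step structure concrete, the following is a minimal sketch, not any of the 62 techniques from the study. It assumes a toy transfer step (keeping the source instances nearest to the target centroid, controlled by a hypothetical `keep_ratio`), a k-nearest-neighbour classifier (controlled by `k`), and an exhaustive grid search as a stand-in for the automated parameter optimizer, which the abstract does not name. The data is synthetic, and scoring against labelled target data is done only for illustration.

```python
import itertools
import math
import random


def make_project(n, shift, seed):
    """Synthetic project with two metrics per module; label 1 = defective."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = int(rng.random() < 0.3)
        center = (2.0 if label else 0.0) + shift
        features = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((features, label))
    return data


def filter_source(source, target_X, keep_ratio):
    """Transfer step (hypothetical): keep source rows closest to the target centroid."""
    dim = len(target_X[0])
    centroid = [sum(x[d] for x in target_X) / len(target_X) for d in range(dim)]
    ranked = sorted(source, key=lambda item: math.dist(item[0], centroid))
    keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:keep]


def knn_predict(train, x, k):
    """Classification step: majority vote among the k nearest training rows."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > len(nearest))  # ties fall back to "not defective"


def accuracy(train, test, k):
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def grid_search(source, target, keep_ratios, ks):
    """Jointly tune one transfer parameter and one classifier parameter."""
    target_X = [x for x, _ in target]
    best = None
    for keep_ratio, k in itertools.product(keep_ratios, ks):
        train = filter_source(source, target_X, keep_ratio)
        score = accuracy(train, target, k)
        if best is None or score > best[0]:
            best = (score, keep_ratio, k)
    return best


# Source project is shifted relative to the target to mimic a distribution gap.
source = make_project(200, shift=1.0, seed=0)
target = make_project(80, shift=0.0, seed=1)
best_score, best_ratio, best_k = grid_search(
    source, target, keep_ratios=[0.25, 0.5, 1.0], ks=[1, 3, 5]
)
print(f"best accuracy={best_score:.3f} keep_ratio={best_ratio} k={best_k}")
```

Restricting the grid to `keep_ratios` alone (with `k` fixed) or to `ks` alone gives a rough way to compare tuning only the transfer step against tuning only the classifier, which is the kind of trade-off finding (2) refers to.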
