Title
Comparative Study of Machine Learning Test Case Prioritization for Continuous Integration Testing
Authors
Abstract
There is a growing body of research indicating the potential of machine learning to tackle complex software testing challenges. One such challenge pertains to continuous integration testing, which is highly time-constrained and generates large amounts of data from iterative code commits and test runs. In such a setting, plentiful test data can be used to train machine learning predictors that identify the test cases able to speed up the detection of regression bugs introduced during code integration. However, different machine learning models can have different fault prediction performance depending on the context and parameters of continuous integration testing, for example the variable time budget available for continuous integration cycles, or the size of the test execution history used for learning to prioritize failing test cases. Existing studies on test case prioritization rarely examine both of these factors, which are essential for continuous integration practice. In this study, we perform a comprehensive comparison of the fault prediction performance of the machine learning approaches that have shown the best performance on test case prioritization tasks in the literature. We evaluate the accuracy of the classifiers in predicting fault-detecting tests for different values of the continuous integration time budget and for different lengths of the test history used for training the classifiers. In the evaluation, we use real-world industrial datasets from a continuous integration practice. The results show that different machine learning models perform differently for different sizes of the test history used for model training and for different time budgets available for test case execution. Our results imply that machine learning approaches for test prioritization in continuous integration testing should be carefully configured to achieve optimal performance.
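To make the setting concrete, the sketch below illustrates the general shape of the problem the abstract describes: scoring test cases by their predicted likelihood of failure and greedily filling a CI cycle's time budget. All names and features here are illustrative assumptions, not from the paper, and the learned model is replaced by a simple failure-frequency score; in the study itself, a trained classifier would produce this score from the test execution history.

```python
from dataclasses import dataclass

# Hypothetical record of one test case's execution history
# (field names and features are illustrative, not from the paper).
@dataclass
class TestCase:
    name: str
    duration: float       # expected execution time in seconds
    recent_results: list  # 1 = failed, 0 = passed, most recent last

def failure_score(tc: TestCase) -> float:
    """Stand-in for a learned predictor: fraction of recent failures.
    A real approach would train a classifier on richer history features."""
    if not tc.recent_results:
        return 0.0
    return sum(tc.recent_results) / len(tc.recent_results)

def prioritize(tests, time_budget: float):
    """Rank tests by predicted failure likelihood, then greedily
    select as many as fit into the CI cycle's time budget."""
    ranked = sorted(tests, key=failure_score, reverse=True)
    selected, used = [], 0.0
    for tc in ranked:
        if used + tc.duration <= time_budget:
            selected.append(tc.name)
            used += tc.duration
    return selected

tests = [
    TestCase("t_parser", 30.0, [1, 1, 0, 1]),
    TestCase("t_io",     50.0, [0, 0, 0, 0]),
    TestCase("t_net",    20.0, [0, 1, 1, 1]),
]
# Likely-failing tests are scheduled first, within the budget.
print(prioritize(tests, time_budget=60.0))  # → ['t_parser', 't_net']
```

The two parameters the study varies map directly onto this sketch: the time budget is the `time_budget` argument, and the test history length is how many `recent_results` entries the predictor is allowed to learn from.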