On the Value of Oversampling for Deep Learning in Software Defect Prediction

Paper title

On the Value of Oversampling for Deep Learning in Software Defect Prediction

Authors

Rahul Yedida, Tim Menzies

Abstract

One truism of deep learning is that the automatic feature engineering (seen in the first layers of those networks) excuses data scientists from performing tedious manual feature engineering prior to running DL. For the specific case of deep learning for defect prediction, we show that this truism is false. Specifically, when we preprocess data with a novel oversampling technique called fuzzy sampling, as part of a larger pipeline called GHOST (Goal-oriented Hyper-parameter Optimization for Scalable Training), we can do significantly better than the prior DL state of the art in 14/20 defect data sets. Our approach yields state-of-the-art results with significantly faster deep learners. These results present a cogent case for the use of oversampling prior to applying deep learning on software defect prediction datasets.
