Paper Title
Predicting Health Indicators for Open Source Projects (using Hyperparameter Optimization)
Paper Authors
Paper Abstract
Software developed on public platforms is a source of data that can be used to make predictions about those projects. While individual development activity may be random and hard to predict, development behavior at the project level can be predicted with good accuracy when large groups of developers work together on a software project. To demonstrate this, we use 64,181 months of data from 1,159 GitHub projects to make various predictions about the recent status of those projects (as of April 2020). We find that traditional estimation algorithms, such as $k$-nearest neighbors (KNN), support vector regression (SVR), random forest (RFT), linear regression (LNR), and regression trees (CART), have high error rates. That error rate can be greatly reduced using hyperparameter optimization. To the best of our knowledge, this is the largest study yet conducted that uses recent data to predict multiple health indicators of open-source projects.
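To make the central claim concrete, the sketch below tunes a single hyperparameter (the neighbor count `k` of a KNN regressor) and compares its error with an untuned default. Everything in it is an illustrative assumption rather than the paper's setup: the monthly series is synthetic, the three-month feature window and the default `k = 5` are arbitrary, and exhaustive search over `k` stands in for whatever optimizer the study uses, which the abstract does not name.

```python
import random


def knn_predict(train_X, train_y, x, k):
    """Predict by averaging the targets of the k nearest training rows."""
    scored = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = scored[:k]
    return sum(target for _, target in nearest) / len(nearest)


def mean_abs_error(train_X, train_y, test_X, test_y, k):
    """Mean absolute error of KNN predictions on the given test rows."""
    errors = [
        abs(knn_predict(train_X, train_y, x, k) - target)
        for x, target in zip(test_X, test_y)
    ]
    return sum(errors) / len(errors)


def tune_k(train_X, train_y, valid_X, valid_y, candidates):
    """Exhaustive search: return (best_k, best_error) on the validation rows."""
    scored = [
        (mean_abs_error(train_X, train_y, valid_X, valid_y, k), k)
        for k in candidates
    ]
    best_error, best_k = min(scored)
    return best_k, best_error


# Synthetic stand-in for one monthly project indicator (hypothetical data).
random.seed(0)
series = [50 + 0.5 * month + random.gauss(0, 5) for month in range(120)]

# Features: the previous `window` months; target: the following month.
window = 3
X = [series[i:i + window] for i in range(len(series) - window)]
y = [series[i + window] for i in range(len(series) - window)]

# Chronological split: earlier months for fitting, later months for validation.
train_X, train_y = X[:80], y[:80]
valid_X, valid_y = X[80:], y[80:]

default_k = 5  # assumed "off-the-shelf" setting, for comparison only
default_error = mean_abs_error(train_X, train_y, valid_X, valid_y, default_k)
best_k, best_error = tune_k(train_X, train_y, valid_X, valid_y, range(1, 21))

print(f"default k={default_k}: MAE={default_error:.2f}")
print(f"tuned   k={best_k}: MAE={best_error:.2f}")
```

Because the default value is among the candidates, the tuned error on the validation rows can never exceed the default error. A real evaluation would report the tuned model on a separate held-out period, and would search over more than one hyperparameter.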