Paper Title
Adaptive XGBoost for Evolving Data Streams
Paper Authors
Paper Abstract
Boosting is an ensemble method that combines base models in a sequential manner to achieve high predictive accuracy. A popular learning algorithm based on this ensemble method is eXtreme Gradient Boosting (XGB). We present an adaptation of XGB for classification of evolving data streams. In this setting, new data arrives over time and the relationship between the class and the features may change in the process, thus exhibiting concept drift. The proposed method creates new members of the ensemble from mini-batches of data as new data becomes available. The maximum ensemble size is fixed, but learning does not stop when this size is reached because the ensemble is updated on new data to ensure consistency with the current concept. We also explore the use of concept drift detection to trigger a mechanism to update the ensemble. We test our method on real and synthetic data with concept drift and compare it against batch-incremental and instance-incremental classification methods for data streams.
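The mini-batch mechanism described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and rests on several assumptions: a depth-1 regression stump stands in for the XGB trees, the oldest member is dropped when the fixed ensemble size is reached (one possible update policy among several), no drift detector is used, and the task is binary classification with log-loss. The names `Stump`, `StreamingBoostedEnsemble`, and `drifting_stream` are hypothetical and exist only for this illustration.

```python
import math
import random
from collections import deque


class Stump:
    """Depth-1 regression tree; an assumed stand-in for the XGB trees."""

    def fit(self, X, residuals):
        best = None
        for j in range(len(X[0])):
            for t in sorted({x[j] for x in X}):
                left = [r for x, r in zip(X, residuals) if x[j] <= t]
                right = [r for x, r in zip(X, residuals) if x[j] > t]
                if not right:
                    continue  # degenerate split: nothing falls on the right
                lm, rm = sum(left) / len(left), sum(right) / len(right)
                sse = (sum((r - lm) ** 2 for r in left)
                       + sum((r - rm) ** 2 for r in right))
                if best is None or sse < best[0]:
                    best = (sse, j, t, lm, rm)
        if best is None:  # all feature values identical: predict the mean
            mean = sum(residuals) / len(residuals)
            best = (0.0, 0, math.inf, mean, mean)
        _, self.feature, self.threshold, self.left, self.right = best
        return self

    def predict(self, x):
        return self.left if x[self.feature] <= self.threshold else self.right


class StreamingBoostedEnsemble:
    """Boosted ensemble with a fixed maximum size, grown from mini-batches."""

    def __init__(self, max_size=10, batch_size=50, learning_rate=0.5):
        # A deque with maxlen drops the oldest member once the ensemble is
        # full, so learning continues after the size limit is reached.
        self.members = deque(maxlen=max_size)
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.buffer = []

    def score(self, x):
        return sum(self.learning_rate * m.predict(x) for m in self.members)

    def predict(self, x):
        return 1 if self.score(x) > 0 else 0

    def partial_fit(self, x, y):
        """Buffer one instance; add a new member when a mini-batch is full."""
        self.buffer.append((x, y))
        if len(self.buffer) < self.batch_size:
            return
        X = [xi for xi, _ in self.buffer]
        # Negative gradient of the log-loss w.r.t. the current ensemble score.
        residuals = [yi - 1.0 / (1.0 + math.exp(-self.score(xi)))
                     for xi, yi in self.buffer]
        self.members.append(Stump().fit(X, residuals))
        self.buffer = []


def drifting_stream(n, drift_at, seed=0):
    """Synthetic stream whose label rule flips abruptly at index `drift_at`."""
    rng = random.Random(seed)
    for i in range(n):
        x = [rng.random(), rng.random()]
        above = x[0] > 0.5
        yield x, int(above if i < drift_at else not above)
```

To evaluate the sketch in a test-then-train fashion, call `model.predict(x)` on each instance from `drifting_stream(...)` before passing the same instance to `model.partial_fit(x, y)`. Because members fitted before the drift are gradually pushed out by members fitted on newer mini-batches, accuracy should drop at the drift point and then recover. Note that dropping the oldest member also removes a term that later members were fitted against, which is a simplification accepted here for brevity.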