Paper Title


MementoML: Performance of selected machine learning algorithm configurations on OpenML100 datasets

Authors

Wojciech Kretowicz, Przemysław Biecek

Abstract


Finding optimal hyperparameters for a machine learning algorithm can often significantly improve its performance. But how can they be chosen in a time-efficient way? In this paper we present a protocol for generating benchmark data describing the performance of different ML algorithms with different hyperparameter configurations. Data collected in this way is used to study the factors influencing algorithm performance. This collection was prepared for the purposes of the research presented in the EPP study. We tested algorithm performance on a dense grid of hyperparameters. The tested datasets and hyperparameters were chosen before any algorithm was run and were not changed afterwards. This is a different approach from the one usually used in hyperparameter tuning, where the selection of candidate hyperparameters depends on previously obtained results. However, such a selection allows for a systematic analysis of performance sensitivity to individual hyperparameters. The result is a comprehensive dataset of such benchmarks that we would like to share. We hope that the computed and collected results may be helpful to other researchers. This paper describes the way the data was collected. Here you can find benchmarks of 7 popular machine learning algorithms on 39 OpenML datasets. The detailed data forming this benchmark are available at: https://www.kaggle.com/mi2datalab/mementoml.
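The key idea of the protocol — enumerating the full hyperparameter grid up front, before any model is trained — can be sketched as follows. This is a minimal illustration, not the authors' actual code; the grid values and the `evaluate` step mentioned in the comments are hypothetical.

```python
import itertools

# Hypothetical hyperparameter grid for one algorithm (e.g. gradient boosting).
# The whole grid is fixed before any training run -- unlike adaptive tuning,
# where each new candidate depends on previously obtained results.
grid = {
    "n_estimators": [100, 500, 1000],
    "learning_rate": [0.01, 0.1, 0.3],
    "max_depth": [2, 4, 8],
}

def enumerate_configs(grid):
    """Return every hyperparameter combination in the grid as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in itertools.product(*(grid[k] for k in keys))]

configs = enumerate_configs(grid)
print(len(configs))  # 3 * 3 * 3 = 27 configurations, known in advance

# Each (dataset, config) pair would then be evaluated independently, e.g.
#   results[(dataset_id, i)] = evaluate(dataset_id, configs[i])
# which makes per-hyperparameter sensitivity analysis straightforward,
# since every value of one hyperparameter appears with all values of the others.
```

Because the grid is a full Cartesian product fixed in advance, the benchmark can be resumed or parallelized trivially, and the marginal effect of any single hyperparameter can be read off by grouping results over the remaining ones.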
