Title
Benchmark and application of unsupervised classification approaches for univariate data
Authors
Abstract
Unsupervised machine learning, and in particular data clustering, is a powerful approach for analyzing datasets and identifying characteristic features that occur throughout a dataset. It is gaining popularity across scientific disciplines and is particularly useful in applications where the data structure is not known a priori. Here, we introduce an approach for the unsupervised classification of any dataset consisting of a series of univariate measurements, which makes it ideally suited to a wide range of measurement types. We apply it to the fields of nanoelectronics and spectroscopy to identify meaningful structures in datasets. We also provide guidelines for estimating the optimum number of clusters. In addition, we have performed an extensive benchmark of novel and existing machine learning approaches and observe significant performance differences between them. Careful selection of the feature-space construction method and the clustering algorithm for a specific measurement type can therefore greatly improve classification accuracy.
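The abstract does not say which feature-space construction or clustering algorithm performs best, so the following is only a minimal sketch of the general workflow it describes, not the authors' method. It assumes each univariate measurement is a list of floats in [0, 1]. Each measurement is mapped to a normalized value histogram (a hypothetical feature choice), the feature vectors are clustered with k-means using farthest-point initialization, and the number of clusters is chosen by the highest mean silhouette score. This is one common heuristic and not necessarily the guideline proposed in the paper. All function names are illustrative, and the demo data are synthetic.

```python
import math
import random


def histogram_features(trace, bins=10, lo=0.0, hi=1.0):
    """Map one univariate measurement (a list of floats) to a normalized histogram."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in trace:
        if lo <= x <= hi:  # values outside [lo, hi] are ignored
            counts[min(int((x - lo) / width), bins - 1)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point (maximin) initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point whose nearest existing center is farthest away.
        centers.append(list(max(points, key=lambda p: min(dist(p, c) for c in centers))))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; higher means tighter, better-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated here as the worst score
    total = 0.0
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            continue  # singleton cluster: its silhouette is taken as 0
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def estimate_k(points, k_values):
    """Pick the cluster count whose k-means partition has the highest mean silhouette."""
    return max(k_values, key=lambda k: silhouette(points, kmeans(points, k)))


# Synthetic demo data (invented for illustration): three kinds of trace whose
# values fluctuate around different levels, ten traces of each kind.
rng = random.Random(42)
levels = [0.15, 0.55, 0.85]
traces = [[rng.gauss(mu, 0.03) for _ in range(200)] for mu in levels for _ in range(10)]
features = [histogram_features(t) for t in traces]
k = estimate_k(features, range(2, 6))
labels = kmeans(features, k)
print(k, labels)
```

The histogram step discards the ordering of points within a measurement. Other feature spaces, for example ones built from the raw trace or from a dimensionality reduction, could be swapped in without changing the clustering and cluster-number steps. Because the abstract reports significant performance differences between approaches, this choice is what would need benchmarking for a given measurement type.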