Paper title
Multi-objective Semi-supervised Clustering for Finding Predictive Clusters
Paper authors
Paper abstract
This study addresses clustering problems and aims to find compact clusters that are informative about an outcome variable. The main goal is to partition data points so that observations within each cluster are similar and, at the same time, the clusters can be used to predict the outcome variable. We model this semi-supervised clustering problem as a multi-objective optimization problem in which the deviation of data points within clusters and the prediction error of the outcome variable are the two objective functions to be minimized. To find optimal clustering solutions, we employ the non-dominated sorting genetic algorithm II (NSGA-II) and apply local regression as the prediction method for the outcome variable. To assess the performance of the proposed model, we evaluate seven models on five real-world data sets. Furthermore, we investigate the impact of using local regression to predict the outcome variable in all models, and examine the performance of the multi-objective models relative to the single-objective models.
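The two objectives named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names are hypothetical, the within-cluster deviation is assumed to be a sum of squared distances to cluster centroids, and the prediction step uses each cluster's mean outcome as a simple stand-in for local regression. The `pareto_front` helper shows only the dominance test on which non-dominated sorting is built, not the full NSGA-II procedure.

```python
from statistics import mean


def within_cluster_deviation(points, labels):
    """Sum of squared distances from each point to its cluster centroid (assumed form)."""
    clusters = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)
    total = 0.0
    for members in clusters.values():
        centroid = [mean(col) for col in zip(*members)]
        for p in members:
            total += sum((a - c) ** 2 for a, c in zip(p, centroid))
    return total


def prediction_error(outcomes, labels):
    """Mean squared error when each point is predicted by its cluster's mean outcome.

    This is a stand-in for local regression, chosen only to keep the sketch short.
    """
    groups = {}
    for y, k in zip(outcomes, labels):
        groups.setdefault(k, []).append(y)
    cluster_means = {k: mean(v) for k, v in groups.items()}
    return mean((y - cluster_means[k]) ** 2 for y, k in zip(outcomes, labels))


def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores):
    """Indices of the objective vectors that no other vector dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Toy data: four 2-D points, an outcome per point, and three candidate partitions.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
outcomes = [1.0, 1.2, 3.0, 3.2]
candidates = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 1]]

scores = [
    (within_cluster_deviation(points, c), prediction_error(outcomes, c))
    for c in candidates
]
front = pareto_front(scores)
print(front)
```

On this toy data the first candidate partition is both the most compact and the most predictive, so it is the only non-dominated solution. On real data the two objectives usually conflict, which is why a set of trade-off solutions is sought rather than a single optimum.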