Paper Title
Pool-Based Unsupervised Active Learning for Regression Using Iterative Representativeness-Diversity Maximization (iRDM)
Authors
Abstract
Active learning (AL) selects the most beneficial unlabeled samples to label, and hence a better machine learning model can be trained from the same number of labeled samples. Most existing active learning for regression (ALR) approaches are supervised, which means the sampling process must use some label information, or an existing regression model. This paper considers completely unsupervised ALR, i.e., how to select the samples to label without knowing any true label information. We propose a novel unsupervised ALR approach, iterative representativeness-diversity maximization (iRDM), to optimally balance the representativeness and the diversity of the selected samples. Experiments on 12 datasets from various domains demonstrated its effectiveness. Our iRDM can be applied to both linear regression and kernel regression, and it even significantly outperforms supervised ALR when the number of labeled samples is small.
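The abstract does not spell out the selection procedure, so the following is only a minimal sketch of what an unsupervised, iterative representativeness-diversity selection could look like. It should not be read as the iRDM algorithm as published. Everything in it is an assumption made for illustration: the use of k-means clustering with farthest-point seeding, measuring representativeness as the mean distance to the other samples in the same cluster, measuring diversity as the minimum distance to the samples already chosen from other clusters, and the score `diversity - spread`. The function names (`select_to_label`, `kmeans_clusters`, `farthest_point_centers`) are hypothetical. No labels are used at any point.

```python
import math


def farthest_point_centers(X, m):
    """Deterministic seeding (assumption): start at X[0], then repeatedly
    add the point farthest from the centers chosen so far."""
    centers = [X[0]]
    while len(centers) < m:
        centers.append(max(X, key=lambda x: min(math.dist(x, c) for c in centers)))
    return [list(c) for c in centers]


def kmeans_clusters(X, m, iters=20):
    """Plain Lloyd's k-means; returns one index list per non-empty cluster."""
    centers = farthest_point_centers(X, m)
    for _ in range(iters):
        labels = [min(range(m), key=lambda c: math.dist(x, centers[c])) for x in X]
        for c in range(m):
            members = [X[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    clusters = [[i for i, lab in enumerate(labels) if lab == c] for c in range(m)]
    return [members for members in clusters if members]


def select_to_label(X, m, max_rounds=10):
    """Pick up to m sample indices to label, using features only."""
    clusters = kmeans_clusters(X, m)

    def spread(i, members):
        # Mean distance to the cluster's members: lower = more representative.
        return sum(math.dist(X[i], X[j]) for j in members) / len(members)

    # Start from the most representative sample of each cluster.
    chosen = [min(members, key=lambda i: spread(i, members)) for members in clusters]

    for _ in range(max_rounds):
        changed = False
        for c, members in enumerate(clusters):
            # Hold the picks from all other clusters fixed.
            others = [chosen[k] for k in range(len(clusters)) if k != c]

            def score(i):
                # Diversity: distance to the nearest pick from another cluster.
                diversity = min((math.dist(X[i], X[j]) for j in others), default=0.0)
                return diversity - spread(i, members)

            best = max(members, key=score)
            if best != chosen[c]:
                chosen[c], changed = best, True
        if not changed:  # selection is stable
            break
    return sorted(chosen)


# Toy pool: three well-separated groups of four 2-D points each.
X = [(0, 0), (0, 1), (1, 0), (1, 1),
     (10, 10), (10, 11), (11, 10), (11, 11),
     (20, 0), (20, 1), (21, 0), (21, 1)]
picked = select_to_label(X, 3)
print(picked)
```

On this toy pool the sketch returns one index per group. The selected samples would then be sent for labeling and used to fit a regression model, such as the linear or kernel regressors the abstract mentions.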