On the reusability of samples in active learning

Paper title

On the reusability of samples in active learning

Authors

Gijs van Tulder, Marco Loog

Abstract

An interesting but not extensively studied question in active learning is that of sample reusability: to what extent can samples selected for one learner be reused by another? This paper explains why sample reusability is of practical interest, why reusability can be a problem, how reusability could be improved by importance-weighted active learning, and which obstacles to universal reusability remain. With theoretical arguments and practical demonstrations, this paper argues that universal reusability is impossible. Because every active learning strategy must undersample some areas of the sample space, learners that depend on the samples in those areas will learn more from a random sample selection. This paper describes several experiments with importance-weighted active learning that show the impact of the reusability problem in practice. The experiments confirmed that universal reusability does not exist, although in some cases -- on some datasets and with some pairs of classifiers -- there is sample reusability. Finally, this paper explores the conditions that could guarantee the reusability between two classifiers.
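
As a rough illustration of the importance-weighting idea mentioned in the abstract, the sketch below assumes each sample in a small pool is queried independently with a known probability and, if queried, is counted with weight 1/p. The pool, the loss values, the query probabilities, and the function names are all hypothetical and are not taken from the paper; the exact enumeration is only feasible for tiny pools.

```python
from itertools import product


def weighted_estimate(losses, probs, queried):
    """Importance-weighted mean loss: each queried sample counts 1/p."""
    n = len(losses)
    return sum(l / p for l, p, q in zip(losses, probs, queried) if q) / n


def expected_estimate(losses, probs):
    """Exact expectation of the estimate over all query patterns."""
    total = 0.0
    for queried in product([0, 1], repeat=len(losses)):
        # Probability of this particular pattern of queried samples.
        pattern_prob = 1.0
        for p, q in zip(probs, queried):
            pattern_prob *= p if q else (1.0 - p)
        total += pattern_prob * weighted_estimate(losses, probs, queried)
    return total


# Hypothetical per-sample losses and query probabilities (assumptions).
losses = [0.0, 1.0, 1.0, 0.0]
probs = [0.1, 0.9, 0.9, 0.1]
true_mean = sum(losses) / len(losses)
```

Under these assumptions, `expected_estimate(losses, probs)` matches `true_mean` up to floating-point rounding, so the weighting removes the selection bias on average. It does not add information about the regions that were rarely queried, which is the situation the abstract describes for a learner that depends on those regions.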
