Paper Title
MixPUL: Consistency-based Augmentation for Positive and Unlabeled Learning
Paper Authors
Paper Abstract
Learning from positive and unlabeled data (PU learning) is prevalent in practical applications where only a handful of examples are positively labeled. Previous PU learning studies typically rely on the existing samples, so the data distribution is not extensively explored. In this work, we propose a simple yet effective data augmentation method, coined~\algo, based on \emph{consistency regularization}, which provides a new perspective on using PU data. In particular, the proposed~\algo~incorporates supervised and unsupervised consistency training to generate augmented data. To facilitate supervised consistency in the absence of negative samples, reliable negative examples are mined from the unlabeled data. Unsupervised consistency is further encouraged between unlabeled data points. In addition,~\algo~reduces a margin loss between positive and unlabeled pairs, which explicitly optimizes the AUC and yields faster convergence. Finally, we conduct a series of studies to demonstrate the effectiveness of consistency regularization and examine three kinds of reliable negative mining methods. We show that~\algo~reduces the average classification error from 16.49 to 13.09 on the CIFAR-10 dataset across different amounts of positive data.
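The consistency-based augmentation the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation; it is a hypothetical mixup-style helper (the function name `mixup_pu` and its signature are assumptions) showing the core idea: interpolate a positive and an unlabeled example and build a matching soft target, so the classifier is trained to be consistent on the augmented point.

```python
import numpy as np

def mixup_pu(x_pos, x_unl, p_pos, p_unl, alpha=1.0, rng=None):
    """Hypothetical mixup-style augmentation for PU data.

    Interpolates a positive input with an unlabeled input and mixes
    their soft labels (1.0 for the positive, the classifier's current
    prediction for the unlabeled point) with the same coefficient.
    Training the classifier to predict p_mix on x_mix enforces
    consistency between labeled and unlabeled regions.
    """
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)              # mixing coefficient in [0, 1]
    x_mix = lam * x_pos + (1.0 - lam) * x_unl  # interpolated input
    p_mix = lam * p_pos + (1.0 - lam) * p_unl  # interpolated soft target
    return x_mix, p_mix
```

With a fixed random seed the call is deterministic, e.g. `mixup_pu(np.ones(2), np.zeros(2), 1.0, 0.0, rng=np.random.default_rng(0))` returns a point on the segment between the two inputs, with a soft label equal to the interpolation weight.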