Title
Context-based Virtual Adversarial Training for Text Classification with Noisy Labels
Authors
Abstract
Deep neural networks (DNNs) have a high capacity to completely memorize noisy labels given sufficient training time, and this memorization, unfortunately, leads to performance degradation. Recently, virtual adversarial training (VAT) has attracted attention because it can further improve the generalization of DNNs in semi-supervised learning. The driving force behind VAT is to prevent models from overfitting to data points by enforcing consistency between predictions on the inputs and on perturbed inputs. This strategy could be helpful in learning from noisy labels if it prevents neural models from memorizing noisy samples while encouraging them to generalize to clean samples. In this paper, we propose context-based virtual adversarial training (ConVAT) to prevent a text classifier from overfitting to noisy labels. Unlike previous works, the proposed method performs adversarial training at the context level rather than on the inputs. It makes the classifier learn not only from each sample's label but also from its contextual neighbors, which mitigates learning from noisy labels by preserving contextual semantics at each data point. We conduct extensive experiments on four text classification datasets with two types of label noise. Comprehensive experimental results clearly show that the proposed method works well even in extremely noisy settings.
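The consistency objective behind VAT that the abstract describes can be sketched as follows. This is a minimal illustration only, assuming a toy linear classifier (`predict`, the weight matrix `W`, and the perturbation radius are hypothetical stand-ins, not the paper's method); real VAT finds the adversarial direction via power iteration on the gradient of the KL divergence, whereas this sketch substitutes a random unit direction for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))  # toy linear classifier (hypothetical)

def predict(x):
    # softmax over logits of the toy model
    z = x @ W
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # KL divergence between rows of two probability matrices
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def consistency_loss(x, radius=0.5):
    # Random unit perturbation as a stand-in for the virtual adversarial
    # direction (which VAT would estimate by power iteration); the loss
    # penalizes prediction changes between x and the perturbed x.
    d = rng.normal(size=x.shape)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    return kl(predict(x), predict(x + radius * d)).mean()

x = rng.normal(size=(4, 5))   # a small batch of feature vectors
loss = consistency_loss(x)    # non-negative consistency penalty
```

Minimizing such a term keeps the model's output distribution stable under small input perturbations, which is the mechanism ConVAT applies at the context level rather than on the raw inputs.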