Paper Title
Progressive Class Semantic Matching for Semi-supervised Text Classification
Authors
Abstract
Semi-supervised learning is a promising way to reduce the annotation cost of text classification. Combined with pre-trained language models (PLMs), e.g., BERT, recent semi-supervised learning methods have achieved impressive performance. In this work, we further investigate the marriage between semi-supervised learning and pre-trained language models. Unlike existing approaches that use PLMs only for model parameter initialization, we exploit the topic matching capability inherent in PLMs to build a more powerful semi-supervised learning approach. Specifically, we propose a joint semi-supervised learning process that progressively builds a standard $K$-way classifier and a matching network between the input text and the Class Semantic Representation (CSR). The CSR is initialized from the given labeled sentences and progressively updated during training. Through extensive experiments, we show that our method not only brings remarkable improvement over baselines but is also more stable overall, achieving state-of-the-art performance in semi-supervised text classification.
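To make the CSR idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes the CSR is a set of per-class prototype vectors initialized as the mean embedding of the labeled sentences of each class, that matching is scored by cosine similarity, and that the progressive update is an exponential moving average driven by confidently pseudo-labeled text. The function names, the EMA rule, and the use of generic embedding vectors are all illustrative assumptions.

```python
import numpy as np

def init_csr(embeddings, labels, num_classes):
    """Initialize the Class Semantic Representation (CSR) as the
    per-class mean of labeled sentence embeddings (illustrative choice)."""
    return np.stack(
        [embeddings[labels == k].mean(axis=0) for k in range(num_classes)]
    )

def match_scores(text_emb, csr):
    """Score how well one text embedding matches each class's CSR
    via cosine similarity; higher means a stronger match."""
    t = text_emb / np.linalg.norm(text_emb)
    c = csr / np.linalg.norm(csr, axis=1, keepdims=True)
    return c @ t

def update_csr(csr, text_emb, pseudo_label, momentum=0.9):
    """Progressively refresh one class's CSR with a confident example,
    using an exponential moving average (assumed update rule)."""
    csr = csr.copy()
    csr[pseudo_label] = momentum * csr[pseudo_label] + (1 - momentum) * text_emb
    return csr
```

In the actual method the embeddings would come from the PLM and the matching network is trained jointly with the $K$-way classifier; this sketch only shows the prototype-matching and progressive-update mechanics in isolation.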