Paper Title

LynyrdSkynyrd at WNUT-2020 Task 2: Semi-Supervised Learning for Identification of Informative COVID-19 English Tweets

Authors

Abhilasha Sancheti, Kushal Chawla, Gaurav Verma

Abstract

We describe our system for the WNUT-2020 shared task on the identification of informative COVID-19 English tweets. Our system is an ensemble of various machine learning methods, leveraging both traditional feature-based classifiers and recent advances in pre-trained language models that help capture syntactic, semantic, and contextual features from the tweets. We further employ pseudo-labelling to incorporate the unlabelled Twitter data released on the pandemic. Our best-performing model achieves an F1-score of 0.9179 on the provided validation set and 0.8805 on the blind test set.
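The abstract names two techniques without detailing them: an ensemble of classifiers and pseudo-labelling of unlabelled tweets. Below is a minimal Python sketch of how these ideas are commonly combined. It is not the authors' implementation. It assumes soft voting (averaging each model's probability that a tweet is INFORMATIVE), a hypothetical confidence threshold of 0.9 for accepting a pseudo-label, and an F1-score computed with INFORMATIVE as the positive class. The "models" in the usage section are toy keyword and length rules, included only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a "model" is any callable returning P(INFORMATIVE | tweet) in [0, 1].
Model = Callable[[str], float]


def ensemble_proba(models: Sequence[Model], tweet: str) -> float:
    """Soft voting: average the positive-class probabilities of all models."""
    return sum(model(tweet) for model in models) / len(models)


def pseudo_label(
    models: Sequence[Model], unlabelled: Sequence[str], threshold: float = 0.9
) -> List[Tuple[str, int]]:
    """Return (tweet, label) pairs for tweets the ensemble is confident about.

    Label 1 = INFORMATIVE, 0 = UNINFORMATIVE. Tweets whose averaged probability
    falls strictly between 1 - threshold and threshold stay unlabelled. The 0.9
    default is a hypothetical choice, not a value taken from the paper.
    """
    labelled = []
    for tweet in unlabelled:
        p = ensemble_proba(models, tweet)
        if p >= threshold:
            labelled.append((tweet, 1))
        elif p <= 1 - threshold:
            labelled.append((tweet, 0))
    return labelled


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class (assumed here to be INFORMATIVE)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy models standing in for trained classifiers.
    def keyword_model(tweet: str) -> float:
        return 0.95 if "cases" in tweet.lower() else 0.05

    def length_model(tweet: str) -> float:
        return 0.9 if len(tweet.split()) > 5 else 0.1

    pool = ["12 new cases confirmed in the county today", "stay safe everyone", "cases"]
    # The third tweet gets conflicting votes (average 0.525), so it is skipped.
    print(pseudo_label([keyword_model, length_model], pool))
    print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))  # precision 1.0, recall 2/3
```

In a full pipeline, the pseudo-labelled pairs would be added to the labelled training set and the models retrained. The abstract does not say how many rounds, which thresholds, or which ensemble members the authors used.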