Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts

Paper Title

Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts

Authors

Tonja, Atnafu Lambebo; Yigezu, Mesay Gemeda; Kolesnikova, Olga; Tash, Moein Shahiki; Sidorov, Grigori; Gelbukh, Alexander

Abstract

Code-mixed data currently receives considerable attention in natural language processing (NLP) research. Language identification of code-mixed social media text has been an interesting research problem in recent years, owing to the growth and influence of social media in communication. This paper is the system description of the Instituto Politécnico Nacional, Centro de Investigación en Computación (CIC) team for the CoLI-Kanglish shared task at ICON2022. We propose a Transformer-based model for word-level language identification in code-mixed Kannada-English texts. On the CoLI-Kenglish dataset, the proposed model achieves a weighted F1-score of 0.84 and a macro F1-score of 0.61.
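
The abstract reports two F1 averages that differ noticeably (0.84 weighted, 0.61 macro). The sketch below illustrates how such a gap can arise when word-level tags are imbalanced: macro F1 averages the per-class scores equally, while weighted F1 scales each class by its number of true instances. The tag names ("kn", "en", "mixed") and the toy labels are assumptions made for illustration only; they are not the shared task's tag set, data, or the authors' evaluation code.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's count in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Hypothetical toy word tags: one rare "mixed" word is mislabeled as "kn".
y_true = ["kn"] * 6 + ["en"] * 3 + ["mixed"]
y_pred = ["kn"] * 6 + ["en"] * 3 + ["kn"]

print(f"macro F1:    {macro_f1(y_true, y_pred):.2f}")     # 0.64
print(f"weighted F1: {weighted_f1(y_true, y_pred):.2f}")  # 0.85
```

In this toy case a single error on the rare class drives its F1 to zero, which lowers the macro average far more than the weighted one. A similar effect may contribute to the gap reported in the abstract, although the abstract itself does not give a per-class breakdown.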
