Paper Title

Cross-lingual Transfer of Sentiment Classifiers

Authors

Marko Robnik-Sikonja, Kristjan Reba, Igor Mozetic

Abstract

Word embeddings represent words in a numeric space so that semantic relations between words are represented as distances and directions in the vector space. Cross-lingual word embeddings transform the vector spaces of different languages so that similar words are aligned. This is done either by constructing a mapping between the vector spaces of two languages or by learning a joint vector space for multiple languages. Cross-lingual embeddings can be used to transfer machine learning models between languages, thereby compensating for insufficient data in less-resourced languages. We use cross-lingual word embeddings to transfer machine learning prediction models for Twitter sentiment between 13 languages. We focus on two transfer mechanisms that have recently shown superior transfer performance. The first mechanism uses trained models whose input is the joint numerical space for many languages, as implemented in the LASER library. The second mechanism uses large pretrained multilingual BERT language models. Our experiments show that the transfer of models between similar languages is sensible, even with no target language data. The performance of cross-lingual models obtained with multilingual BERT and the LASER library is comparable, and the differences are language-dependent. The transfer with CroSloEngual BERT, pretrained on only three languages, is superior on these and some closely related languages.
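The abstract mentions aligning two languages by constructing a mapping between their vector spaces. A common way to build such a mapping, shown here only as an illustration and not as the method used in the paper (which relies on LASER and multilingual BERT), is orthogonal Procrustes alignment: given embeddings of translation pairs from a seed dictionary, find the orthogonal matrix that best rotates the source space onto the target space. The sketch below assumes NumPy and uses synthetic vectors; the function name `procrustes_mapping` is hypothetical.

```python
import numpy as np

def procrustes_mapping(X_src: np.ndarray, Y_tgt: np.ndarray) -> np.ndarray:
    """Return the orthogonal matrix W minimising ||X_src @ W - Y_tgt||_F.

    Row i of X_src and row i of Y_tgt are assumed to be embeddings of a
    translation pair from a (hypothetical) bilingual seed dictionary.
    """
    # SVD of the cross-covariance X^T Y = U S V^T; the optimum is W = U V^T.
    u, _, vt = np.linalg.svd(X_src.T @ Y_tgt)
    return u @ vt

# Toy check: the "target language" space is a rotated copy of the source space.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                     # 50 synthetic source-language word vectors
rot, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # random orthogonal matrix
Y = X @ rot                                      # corresponding target-language vectors
W = procrustes_mapping(X, Y)
print(np.allclose(X @ W, Y))
```

Constraining W to be orthogonal preserves distances and angles, so the semantic relations encoded as distances and directions in the source space survive the mapping. Real embedding spaces are only approximately isomorphic, so in practice the fit is not exact.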
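The core idea of zero-shot transfer through a joint space is that a classifier trained on labelled tweets in one language can be applied directly to another language, provided both are encoded into the same space. The sketch below is a toy under explicit assumptions: the helper `fake_joint_embeddings` is hypothetical and merely stands in for a multilingual encoder such as LASER or multilingual BERT, and a nearest-centroid classifier replaces the models actually trained in the paper. It assumes NumPy and uses synthetic data only.

```python
import numpy as np

rng = np.random.default_rng(42)
DIM = 16

# Hypothetical stand-in for a multilingual encoder: in the joint space,
# sentiment lies along the same direction for every language, and each
# language adds only a small offset of its own.
sentiment_dir = rng.normal(size=DIM)
sentiment_dir /= np.linalg.norm(sentiment_dir)

def fake_joint_embeddings(labels, lang_shift):
    """Synthetic 'tweet embeddings' for labels in {-1, +1}."""
    labels = np.asarray(labels, dtype=float)
    noise = rng.normal(scale=0.3, size=(len(labels), DIM))
    return 2.0 * labels[:, None] * sentiment_dir + lang_shift + noise

def train_centroids(X, y):
    """Nearest-centroid classifier: one mean vector per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    """Assign each row of X to the class with the closest centroid."""
    classes = list(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes], axis=1)
    return np.array(classes)[dists.argmin(axis=1)]

# Train only on labelled source-language tweets ...
y_src = rng.choice([-1, 1], size=200)
X_src = fake_joint_embeddings(y_src, lang_shift=np.zeros(DIM))
model = train_centroids(X_src, y_src)

# ... then apply the model to target-language tweets with no target training data.
y_tgt = rng.choice([-1, 1], size=200)
X_tgt = fake_joint_embeddings(y_tgt, lang_shift=0.2 * rng.normal(size=DIM))
accuracy = (predict(model, X_tgt) == y_tgt).mean()
print(f"zero-shot target accuracy: {accuracy:.2f}")
```

The size of `lang_shift` plays the role of language distance in this toy: a small offset mimics transfer between similar languages, while a larger one degrades accuracy, loosely mirroring the observation that transfer works best between related languages.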