Paper title
Word-Emoji Embeddings from large scale Messaging Data reflect real-world Semantic Associations of Expressive Icons
Paper authors
Paper abstract
We train word-emoji embeddings on large-scale messaging data obtained from the Jodel online social network. Our data set contains more than 40 million sentences, of which 11 million are annotated with a subset of the emoji list of the Unicode 13.0 standard. We explore the semantic emoji associations contained in this embedding by analyzing associations between emojis, between emojis and text, and between text and emojis. Our investigations demonstrate anecdotally that word-emoji embeddings trained on large-scale messaging data can reflect real-world semantic associations. To enable further research, we release the Jodel Emoji Embedding Dataset (JEED1488), containing 1488 emojis and their embeddings along 300 dimensions.
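As an illustration of what exploring associations in a joint word-emoji embedding can look like, the following minimal sketch ranks the nearest neighbors of a token by cosine similarity. It is an assumption-laden toy, not the authors' method: the abstract does not state which similarity measure or tooling was used, the three-dimensional vectors are invented for this example (JEED1488 embeddings have 300 dimensions), and the names `TOY_EMBEDDINGS`, `cosine_similarity`, and `nearest_neighbors` are hypothetical.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# They are NOT taken from JEED1488, whose embeddings have 300 dimensions.
TOY_EMBEDDINGS = {
    "🍕": [0.90, 0.10, 0.00],
    "🍔": [0.80, 0.20, 0.10],
    "pizza": [0.85, 0.15, 0.05],
    "⚽": [0.00, 0.90, 0.30],
    "football": [0.05, 0.85, 0.35],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, embeddings, k=2):
    """Return the k tokens (words or emojis) most similar to `query`."""
    query_vec = embeddings[query]
    scored = [
        (token, cosine_similarity(query_vec, vec))
        for token, vec in embeddings.items()
        if token != query  # skip the query token itself
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Emoji-to-anything and text-to-anything lookups share one vector space.
    for query in ("🍕", "football"):
        print(query, nearest_neighbors(query, TOY_EMBEDDINGS))
```

Because words and emojis live in the same vector space, the same lookup covers all three directions named in the abstract (emoji to emoji, emoji to text, and text to emoji); restricting the candidate set to emojis or to words would be a simple filter on the dictionary keys.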