Paper Title

Generating Sense Embeddings for Syntactic and Semantic Analogy for Portuguese

Paper Authors

da Silva, Jessica Rodrigues; Caseli, Helena de Medeiros

Paper Abstract

Word embeddings are numerical vectors which can represent words or concepts in a low-dimensional continuous space. These vectors are able to capture useful syntactic and semantic information. Traditional approaches like Word2Vec, GloVe and FastText have a strict drawback: they produce a single vector representation per word, ignoring the fact that ambiguous words can assume different meanings. In this paper we use techniques to generate sense embeddings and present the first experiments carried out for Portuguese. Our experiments show that sense vectors outperform traditional word vectors in syntactic and semantic analogy tasks, proving that the language resource generated here can improve the performance of NLP tasks in Portuguese.
