Learning Meta Word Embeddings by Unsupervised Weighted Concatenation of Source Embeddings

Paper title

Learning Meta Word Embeddings by Unsupervised Weighted Concatenation of Source Embeddings

Author

Bollegala, Danushka

Abstract

Given multiple source word embeddings learnt using diverse algorithms and lexical resources, meta word embedding learning methods attempt to learn more accurate and wide-coverage word embeddings. Prior work on meta-embedding has repeatedly found that simple vector concatenation of the source embeddings is a competitive baseline. However, it remains unclear why and when simple vector concatenation can produce accurate meta-embeddings. We show that weighted concatenation can be seen as a spectrum matching operation between each source embedding and the meta-embedding, minimising the pairwise inner-product loss. Following this theoretical analysis, we propose two unsupervised methods to learn the optimal concatenation weights for creating meta-embeddings from a given set of source embeddings. Experimental results on multiple benchmark datasets show that the proposed weighted concatenated meta-embedding methods outperform previously proposed meta-embedding learning methods.
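
To make the idea of weighted concatenation concrete, the following is a minimal sketch, not the paper's method: it scales each source vector of a word by a per-source weight and concatenates the results. The vectors, the weights, and the helper names `weighted_concat` and `dot` are all made up for illustration; the paper learns its weights with unsupervised procedures that are not reproduced here. The sketch only checks a basic identity: the inner product of two weighted-concatenated vectors equals the sum of the source inner products, each scaled by its squared weight.

```python
def weighted_concat(sources, weights):
    """Concatenate one word's source vectors, scaling each by its weight."""
    if len(sources) != len(weights):
        raise ValueError("need exactly one weight per source embedding")
    meta = []
    for vec, w in zip(sources, weights):
        meta.extend(w * x for x in vec)
    return meta


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Toy data (hypothetical): two sources with different dimensionalities.
cat_sources = [[1.0, 0.0], [0.5, 0.5, 0.0]]
dog_sources = [[0.0, 2.0], [1.0, 1.0, 2.0]]
weights = [0.5, 2.0]  # assumed values, not learned weights

cat_meta = weighted_concat(cat_sources, weights)
dog_meta = weighted_concat(dog_sources, weights)

# Inner product in the meta-embedding space ...
meta_ip = dot(cat_meta, dog_meta)
# ... equals the squared-weight combination of per-source inner products.
weighted_sum = sum(
    w * w * dot(c, d) for w, c, d in zip(weights, cat_sources, dog_sources)
)
```

Because each weight enters the inner product squared, changing a weight rescales how much that source contributes to similarities in the meta-embedding space, which is why the choice of weights matters.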
