Paper Title

LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction Through Non-Linear Mapping in Latent Space

Authors

Tasnim Mohiuddin, M Saiful Bari, Shafiq Joty

Abstract

Most of the successful and predominant methods for bilingual lexicon induction (BLI) are mapping-based, where a linear mapping function is learned with the assumption that the word embedding spaces of different languages exhibit similar geometric structures (i.e., are approximately isomorphic). However, several recent studies have criticized this simplified assumption, showing that it does not hold in general even for closely related languages. In this work, we propose a novel semi-supervised method to learn cross-lingual word embeddings for BLI. Our model is independent of the isomorphic assumption and uses non-linear mapping in the latent space of two independently trained auto-encoders. Through extensive experiments on fifteen (15) different language pairs (in both directions) comprising resource-rich and low-resource languages from two different datasets, we demonstrate that our method outperforms existing models by a good margin. Ablation studies show the importance of different model components and the necessity of non-linear mapping.
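
For context, the linear mapping baseline that the abstract argues against is commonly solved as an orthogonal Procrustes problem over a small seed dictionary. The sketch below illustrates that baseline only; it is not the LNMap model, and the function names, toy dimensions, and synthetic data are assumptions made for illustration. The toy target space is built as an exact rotation of the source space, which is precisely the isomorphic setting that the abstract says does not hold in general.

```python
import numpy as np

def procrustes_map(X, Y):
    """Orthogonal W minimizing ||X @ W - Y||_F for row-aligned seed pairs."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def induce_translations(src, tgt, W):
    """Map source vectors with W and return the nearest target index by cosine."""
    mapped = src @ W
    mapped /= np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return np.argmax(mapped @ tgt_n.T, axis=1)

# Toy data (hypothetical): the target space is an exact rotation of the source
# space, so the two spaces are perfectly isomorphic by construction.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
tgt = src @ Q

W = procrustes_map(src[:20], tgt[:20])  # learn from 20 seed pairs only
pred = induce_translations(src, tgt, W)  # induce translations for all 50 words
```

On this synthetic data the learned `W` recovers the rotation and every word maps to its own counterpart. With real embeddings that are not related by a rotation, a single orthogonal map can only approximate the correspondence, which is the gap a non-linear mapping is meant to address.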
