Paper Title

Word-Embeddings Distinguish Denominal and Root-Derived Verbs in Semitic

Authors

Benbaji, Ido, Doron, Omri, Hénot-Mortier, Adèle

Abstract

Proponents of the Distributed Morphology framework have posited the existence of two levels of morphological word formation: a lower one, leading to loose input-output semantic relationships; and an upper one, leading to tight input-output semantic relationships. In this work, we propose to test the validity of this assumption in the context of Hebrew word embeddings. If the two-level hypothesis is borne out, we expect state-of-the-art Hebrew word embeddings to encode (1) a noun, (2) a denominal derived from it (via an upper-level operation), and (3) a verb related to the noun (via a lower-level operation on the noun's root), in such a way that the denominal (2) should be closer in the embedding space to the noun (1) than the related verb (3) is to the same noun (1). We report that this hypothesis is verified by four embedding models of Hebrew: fastText, GloVe, Word2Vec and AlephBERT. This suggests that word embedding models are able to capture complex and fine-grained semantic properties that are morphologically motivated.
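The comparison described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: it assumes that "closer in the embedding space" is measured with cosine similarity, and the helper names (`cosine_similarity`, `denominal_is_closer`, `fraction_supporting`) and the toy vectors are hypothetical placeholders rather than real Hebrew embeddings or reported results.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def denominal_is_closer(noun, denominal, root_verb):
    """True if the denominal (2) is closer to the noun (1) than the
    root-derived verb (3) is, under the assumed cosine metric."""
    return cosine_similarity(noun, denominal) > cosine_similarity(noun, root_verb)

def fraction_supporting(triplets):
    """Share of (noun, denominal, root_verb) triplets that fit the prediction."""
    hits = sum(denominal_is_closer(n, d, v) for n, d, v in triplets)
    return hits / len(triplets)

# Made-up 3-dimensional vectors for illustration only (assumption):
# each tuple is (noun, denominal, root-derived verb).
toy_triplets = [
    ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]),
    ([0.0, 1.0, 0.0], [0.1, 0.8, 0.1], [0.6, 0.4, 0.0]),
]

if __name__ == "__main__":
    print(fraction_supporting(toy_triplets))
```

In practice the toy vectors would be replaced by vectors looked up in each of the four models, and the fraction would be computed separately per model; how the triplets are selected and how significance is assessed is not specified in the abstract.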
