Paper title

Analogies minus analogy test: measuring regularities in word embeddings

Authors

Louis Fournier, Emmanuel Dupoux, Ewan Dunbar

Abstract

Vector space models of words have long been claimed to capture linguistic regularities as simple vector translations, but problems have been raised with this claim. We decompose and empirically analyze the classic arithmetic word analogy test, to motivate two new metrics that address the issues with the standard test, and which distinguish between class-wise offset concentration (similar directions between pairs of words drawn from different broad classes, such as France--London, China--Ottawa, ...) and pairing consistency (the existence of a regular transformation between correctly-matched pairs such as France:Paris::China:Beijing). We show that, while the standard analogy test is flawed, several popular word embeddings do nevertheless encode linguistic regularities.
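As a rough illustration of the ideas named in the abstract, the sketch below runs the classic arithmetic analogy test (pick the word closest to b - a + c) on a tiny hand-made vocabulary, then compares how similar the offset directions are for correctly matched country/capital pairs versus mismatched ones. The toy vectors, the function names `analogy` and `mean_offset_similarity`, and the use of mean pairwise cosine similarity are assumptions made for this example; they are not taken from the paper and should not be read as its exact metric definitions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def analogy(emb, a, b, c):
    """Classic arithmetic analogy test: a is to b as c is to ?

    Returns the vocabulary word whose vector is closest (by cosine)
    to b - a + c, excluding the three input words.
    """
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

def mean_offset_similarity(emb, pairs):
    """Mean pairwise cosine similarity between the offsets y - x of the pairs."""
    offsets = [[vy - vx for vx, vy in zip(emb[x], emb[y])] for x, y in pairs]
    sims = [
        cosine(offsets[i], offsets[j])
        for i in range(len(offsets))
        for j in range(i + 1, len(offsets))
    ]
    return sum(sims) / len(sims)

# Hypothetical 3-d toy embeddings: the third coordinate marks "capital".
emb = {
    "France": [1.0, 0.0, 0.0], "Paris":   [1.0, 0.0, 1.0],
    "China":  [0.0, 1.0, 0.0], "Beijing": [0.0, 1.0, 1.0],
    "Canada": [1.0, 1.0, 0.0], "Ottawa":  [1.0, 1.0, 1.0],
}

matched = [("France", "Paris"), ("China", "Beijing"), ("Canada", "Ottawa")]
mismatched = [("France", "Beijing"), ("China", "Ottawa"), ("Canada", "Paris")]

answer = analogy(emb, "France", "Paris", "China")
matched_sim = mean_offset_similarity(emb, matched)
mismatched_sim = mean_offset_similarity(emb, mismatched)

print(answer)                       # Beijing
print(matched_sim, mismatched_sim)  # matched offsets are more aligned
```

In this toy setup the matched pairs share one offset direction, while the mismatched pairs only partly do. The gap between the two numbers gives an informal feel for why the abstract separates class-level offset direction from the consistency of correct pairings.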
