Paper Title
Hierarchies over Vector Space: Orienting Word and Graph Embeddings
Paper Authors
Paper Abstract
Word and graph embeddings are widely used in deep learning applications. We present a data structure that captures inherent hierarchical properties of an unordered, flat embedding space, in particular a sense of direction between pairs of entities. Inspired by the notion of \textit{distributional generality}, our algorithm constructs an arborescence (a directed rooted tree) by inserting nodes in descending order of entity power (e.g., word frequency) and pointing each entity to the closest more powerful node as its parent. We evaluate the resulting tree structures on three tasks: hypernym relation discovery, least-common-ancestor (LCA) discovery among words, and Wikipedia page link recovery. We achieve averages of 8.98\% and 2.70\% for hypernym and LCA discovery across five languages, and 62.76\% accuracy on directed Wiki-page link recovery, all substantially above baselines. Finally, we investigate the effect of insertion order, the power/similarity trade-off, and various power sources in order to optimize parent selection.
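As a rough illustration of the insertion procedure described in the abstract, the following Python sketch builds an arborescence from pre-computed embeddings. The function name, the use of plain cosine similarity (via dot products on unit-normalized vectors), and the input format are assumptions made for readability; the paper's actual parent-selection rule also involves a power/similarity trade-off not modeled here.

```python
import numpy as np

def build_arborescence(entities, embeddings, power):
    """Hypothetical sketch: construct a directed rooted tree over an embedding space.

    entities   : list of entity names (e.g., words or graph nodes)
    embeddings : dict mapping entity -> unit-normalized vector (np.ndarray)
    power      : dict mapping entity -> scalar "power" (e.g., word frequency)

    Returns a dict mapping each entity to its parent; the root maps to None.
    """
    # Insert nodes in descending order of power; the most powerful entity becomes the root.
    order = sorted(entities, key=lambda e: power[e], reverse=True)
    parent = {order[0]: None}
    inserted = [order[0]]

    for e in order[1:]:
        # Point the new entity to the closest already-inserted (hence more powerful) node.
        sims = [float(np.dot(embeddings[e], embeddings[p])) for p in inserted]
        parent[e] = inserted[int(np.argmax(sims))]
        inserted.append(e)
    return parent
```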