Keywords lie far from the mean of all words in local vector space

Paper title

Keywords lie far from the mean of all words in local vector space

Authors

Eirini Papagiannopoulou, Grigorios Tsoumakas, Apostolos N. Papadopoulos

Abstract

Keyword extraction is an important document process that aims at finding a small set of terms that concisely describe a document's topics. The most popular state-of-the-art unsupervised approaches belong to the family of the graph-based methods that build a graph-of-words and use various centrality measures to score the nodes (candidate keywords). In this work, we follow a different path to detect the keywords from a text document by modeling the main distribution of the document's words using local word vector representations. Then, we rank the candidates based on their position in the text and the distance between the corresponding local vectors and the main distribution's center. We confirm the high performance of our approach compared to strong baselines and state-of-the-art unsupervised keyword extraction methods, through an extended experimental study, investigating the properties of the local representations.
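The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the local vectors are plain L2-normalized co-occurrence counts built from the single document (the abstract does not say how the local representations are trained), the center of the main distribution is taken as the arithmetic mean of all word vectors, and the position signal is a simple logarithmic discount on the first occurrence. The names `STOPWORDS`, `tokenize`, `local_vectors`, and `rank_keywords` are hypothetical.

```python
import math
import re

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "that",
             "for", "on", "with", "by", "we", "from"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def local_vectors(tokens, window=2):
    """Build one co-occurrence vector per word from this document alone.

    Stand-in for the paper's local word vectors (assumption).
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[w][index[tokens[j]]] += 1.0
    # L2-normalize so frequent words do not dominate by magnitude alone.
    for w, v in vectors.items():
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors[w] = [x / norm for x in v]
    return vectors


def rank_keywords(text, window=2, top_k=5):
    """Rank non-stopword candidates by distance from the mean vector,
    discounted by how late the word first appears (assumed weighting)."""
    tokens = tokenize(text)
    if not tokens:
        return []
    vectors = local_vectors(tokens, window)
    dim = len(vectors)  # one dimension per vocabulary word
    # Center of the distribution: mean over all words, stopwords included.
    center = [sum(v[d] for v in vectors.values()) / len(vectors)
              for d in range(dim)]
    first_pos = {}
    for i, w in enumerate(tokens):
        first_pos.setdefault(w, i)
    scores = {}
    for w, v in vectors.items():
        if w in STOPWORDS:
            continue
        distance = math.dist(v, center)
        scores[w] = distance / math.log(2 + first_pos[w])
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Calling `rank_keywords(document_text, top_k=10)` returns up to ten candidate words, farthest from the mean first. Dividing by `math.log(2 + first_pos[w])` is only one possible way to favor early words; the abstract states that position is used but not how it is combined with the distance.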
