Paper Title

Contextual Document Similarity for Content-based Literature Recommender Systems

Author

Ostendorff, Malte

Abstract

To cope with the ever-growing information overload, an increasing number of digital libraries employ content-based recommender systems. These systems traditionally recommend related documents with the help of similarity measures. However, current document similarity measures simply distinguish between similar and dissimilar documents. This simplification is especially consequential for extensive documents, which cover various facets of a topic and are often found in digital libraries. Such measures do not indicate which facet the similarity relates to, so the context of the similarity remains ill-defined. In this doctoral thesis, we explore contextual document similarity measures, i.e., methods that determine document similarity as a triple of two documents and the context of their similarity. Here, the context is a further specification of the similarity. For example, in the scientific domain, research papers can be similar with respect to their background, methodology, or findings. Measuring similarity with regard to one or more given contexts will enhance recommender systems. Specifically, users will be able to explore document collections by formulating queries in terms of documents and their contextual similarities. Thus, our research objective is the development and evaluation of a recommender system based on contextual similarity. The underlying techniques will apply established similarity measures as well as neural approaches while utilizing semantic features obtained from links between documents and from their text.
