SciEv: Finding Scientific Evidence Papers for Scientific News

Paper Title

SciEv: Finding Scientific Evidence Papers for Scientific News

Authors

Hoque, Md Reshad Ul, Li, Jiang, Wu, Jian

Abstract

Over the past decade, many scientific news outlets that report scientific breakthroughs and discoveries have emerged, bringing science and technology closer to the general public. However, not all scientific news articles cite proper sources, such as the original scientific papers. A portion of scientific news articles contain misinterpreted, exaggerated, or distorted information that deviates from the facts asserted in the original papers. Manually identifying proper citations is laborious and costly. Therefore, it is necessary to automatically search for pertinent scientific papers that could serve as evidence for a given piece of scientific news. We propose a system called SciEv that searches for scientific evidence papers given a scientific news article. The system employs a 2-stage query paradigm, with the first stage retrieving candidate papers and the second stage reranking them. The key feature of SciEv is that it uses domain knowledge entities (DKEs) to find candidates in the first stage, which proved to be more effective than regular keyphrases. In the reranking stage, we explore different document representations for news articles and candidate papers. To evaluate our system, we compiled a pilot dataset consisting of 100 manually curated (news, paper) pairs from ScienceAlert and similar websites. To the best of our knowledge, this is the first dataset of this kind. Our experiments indicate that the transformer model performs best for DKE extraction. The system achieves P@1 = 50%, P@5 = 71%, and P@10 = 74% when it uses a TFIDF-based text representation. The transformer-based reranker achieves comparable performance but takes twice as long. We will collect more data and test the system for user experience.
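
As a rough illustration of the second (reranking) stage, the sketch below orders a list of candidate papers by TF-IDF cosine similarity to a news article and then scores the rankings. It is not the authors' implementation. It assumes the candidates have already been retrieved by the first stage, uses a simple smoothed IDF, and reads P@k as the fraction of news articles whose evidence paper appears in the top k. That reading is an assumption, chosen because the reported values rise with k. All function names (`tokenize`, `tfidf_vectors`, `rerank`, `hit_rate_at_k`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on any non-alphanumeric character.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption; the abstract does not give the weighting).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: f * idf[t] for t, f in Counter(toks).items()}
            for toks in tokenized]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rerank(news, candidates):
    """Return candidate indices sorted by similarity to the news text."""
    vecs = tfidf_vectors([news] + candidates)
    scores = [cosine(vecs[0], v) for v in vecs[1:]]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def hit_rate_at_k(rankings, gold, k):
    """Fraction of queries whose gold candidate index is in the top k."""
    hits = sum(1 for r, g in zip(rankings, gold) if g in r[:k])
    return hits / len(gold)


if __name__ == "__main__":
    news = "New study finds coffee consumption lowers risk of heart disease"
    candidates = [
        "Deep learning for protein folding prediction",
        "Coffee consumption and risk of heart disease: a cohort study",
        "Quantum error correction in superconducting qubits",
    ]
    order = rerank(news, candidates)
    print(order, hit_rate_at_k([order], [1], k=1))
```

The toy texts in the usage example are invented for illustration only. In practice the vocabulary and IDF statistics would be fitted on a much larger collection of papers rather than on a single query and its candidates.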
