Paper title
A COVID-19 Search Engine (CO-SE) with Transformer-based Architecture
Paper authors
Paper abstract
Coronavirus disease (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus. Because the literature on COVID-19 is growing rapidly, it is hard to obtain precise, up-to-date information about the virus. Practitioners, front-line workers, and researchers need methods tailored to experts to stay current with scientific knowledge and research findings, yet the volume of papers published on the subject makes it difficult to keep up with the most recent work. This problem motivates the design of the COVID-19 Search Engine (CO-SE), an algorithmic system that finds relevant documents for each user query and answers complex questions by searching a large corpus of publications. CO-SE has a retriever component, built on a TF-IDF vectorizer, that retrieves the relevant documents from the corpus. It also has a reader component, a Transformer-based model that reads the paragraphs of the retrieved documents and finds the answers related to the query. The proposed model outperforms previous models, obtaining an exact match ratio of 71.45% and a semantic answer similarity score of 78.55%. It also outperforms prior models on other benchmark datasets, demonstrating the generalizability of the proposed approach.
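The retriever stage and the exact match metric described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `TfidfRetriever`, `tokenize`, and `exact_match_ratio` are hypothetical, the smoothed IDF formula and cosine ranking are assumed choices, and the exact match definition (normalized string equality, reported as a percentage) is an assumption about how the metric is computed. The Transformer-based reader is omitted because it requires a pretrained model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfRetriever:
    """Hypothetical TF-IDF retriever: ranks documents by cosine similarity."""

    def __init__(self, documents):
        self.documents = list(documents)
        tokenized = [tokenize(d) for d in self.documents]
        n_docs = len(self.documents)
        # Document frequency: number of documents containing each term.
        df = Counter(term for tokens in tokenized for term in set(tokens))
        # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1.
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
        self.doc_vectors = [self._vectorize(tokens) for tokens in tokenized]

    def _vectorize(self, tokens):
        """Build an L2-normalized sparse TF-IDF vector, dropping unknown terms."""
        counts = Counter(t for t in tokens if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def retrieve(self, query, top_k=2):
        """Return up to top_k (document, score) pairs, best match first."""
        q = self._vectorize(tokenize(query))
        scored = []
        for i, d in enumerate(self.doc_vectors):
            # Dot product of unit vectors equals cosine similarity.
            score = sum(w * d.get(t, 0.0) for t, w in q.items())
            scored.append((score, i))
        scored.sort(key=lambda p: (-p[0], p[1]))
        return [(self.documents[i], s) for s, i in scored[:top_k]]


def exact_match_ratio(predictions, references):
    """Percentage of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0

    def normalize(s):
        # Assumed normalization: lowercase and collapse whitespace.
        return " ".join(s.lower().split())

    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    corpus = [
        "SARS-CoV-2 is the virus that causes COVID-19.",
        "Transformers use self-attention to encode text.",
        "Vaccines reduce the risk of severe COVID-19 illness.",
    ]
    retriever = TfidfRetriever(corpus)
    for doc, score in retriever.retrieve("Which virus causes COVID-19?"):
        print(f"{score:.3f}  {doc}")
```

In a full pipeline of this kind, the top-ranked documents would be passed to the reader model, which extracts an answer span from each passage. The retriever keeps the expensive reader from having to scan the whole corpus.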