Biomedical Multi-hop Question Answering Using Knowledge Graph Embeddings and Language Models

Paper title

Biomedical Multi-hop Question Answering Using Knowledge Graph Embeddings and Language Models

Authors

Rao, Dattaraj J.; Mane, Shraddha S.; Paliwal, Mukta A.

Abstract

Biomedical knowledge graphs (KGs) are heterogeneous networks consisting of biological entities as nodes and the relations between them as edges. These entities and relations are extracted from millions of research papers and unified in a single resource. The goal of biomedical multi-hop question answering over knowledge graphs (KGQA) is to help biologists and scientists gain valuable insights by asking questions in natural language. Relevant answers can be found by first understanding the question and then querying the KG for the right set of nodes and relationships to arrive at an answer. To model the question, language models such as RoBERTa and BioBERT are used to understand context from the natural-language question. One of the challenges in KGQA is missing links in the KG. Knowledge graph embeddings (KGE) help to overcome this problem by encoding nodes and edges in a dense and more efficient way. In this paper, we use a publicly available KG called Hetionet, an integrative network of biomedical knowledge assembled from 29 different databases of genes, compounds, diseases, and more. We have enriched this KG dataset by creating a multi-hop biomedical question-answering dataset in natural language for testing the biomedical multi-hop question-answering system, and this dataset will be made available to the research community. The major contribution of this research is an integrated system that combines language models with KG embeddings to give highly relevant answers to free-form questions asked by biologists in an intuitive interface. The biomedical multi-hop question-answering system is tested on this data, and the results are highly encouraging.
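To make the role of embeddings concrete, here is a minimal sketch of how dense vectors can rank candidate answers for a multi-hop question even when the connecting edges are absent from the graph. It assumes a TransE-style translation score, which the abstract does not specify and which may differ from the authors' embedding model. The entity names, relation names, vectors, and the `answer_multi_hop` helper are all hypothetical, and the relation path is given explicitly here, whereas the described system derives it from the question with a language model.

```python
import math

# Hypothetical 2-D embeddings chosen by hand for readability; a real KGE model
# would learn higher-dimensional vectors from the graph's triples.
ENTITY_VECS = {
    "CompoundA": (0.0, 0.0),
    "GeneB": (1.0, 0.0),
    "DiseaseC": (1.0, 1.0),
    "DiseaseD": (-1.0, 2.0),
}
RELATION_VECS = {
    "binds": (1.0, 0.0),
    "associates": (0.0, 1.0),
}


def answer_multi_hop(start, relation_path, candidates):
    """Rank candidates by TransE-style plausibility for a multi-hop path.

    The start entity's vector is translated by each relation vector in turn;
    candidates closest to the resulting point come first.
    """
    point = list(ENTITY_VECS[start])
    for relation in relation_path:
        rel_vec = RELATION_VECS[relation]
        point = [p + r for p, r in zip(point, rel_vec)]

    # Smaller Euclidean distance means a more plausible answer.
    return sorted(candidates, key=lambda c: math.dist(point, ENTITY_VECS[c]))


if __name__ == "__main__":
    ranked = answer_multi_hop(
        "CompoundA",
        ["binds", "associates"],
        ["GeneB", "DiseaseC", "DiseaseD"],
    )
    print(ranked)
```

No explicit edge lookup happens in this sketch: the ranking depends only on vector geometry, which is why an embedding-based approach can still propose an answer when a link is missing from the KG.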
