Interpretable Disease Prediction based on Reinforcement Path Reasoning over Knowledge Graphs

Paper Title

Interpretable Disease Prediction based on Reinforcement Path Reasoning over Knowledge Graphs

Authors

Sun, Zhoujian; Dong, Wei; Shi, Jinlong; Huang, Zhengxing

Abstract

Objective: To combine medical knowledge and medical data to interpretably predict the risk of disease. Methods: We formulated the disease prediction task as a random walk along a knowledge graph (KG). Specifically, we built a KG that records relationships between diseases and risk factors according to validated medical knowledge. A mathematical object then walks along the KG. It starts at a patient entity, which is connected to the KG through the patient's current diseases or risk factors, and stops at a disease entity, which represents the predicted disease. The trajectory generated by the object represents an interpretable disease progression path for the given patient. The dynamics of the object are controlled by a policy-based reinforcement learning (RL) module, which is trained on electronic health records (EHRs). Experiments: We used two real-world EHR datasets to evaluate the performance of our model. In the disease prediction task, our model achieved a macro area under the curve (AUC) of 0.743 and 0.639 in predicting 53 circulatory system diseases on the two datasets, respectively. This performance is comparable to that of machine learning (ML) models commonly used in medical research. In the qualitative analysis, our clinical collaborator reviewed the disease progression paths generated by our model and endorsed their interpretability and reliability. Conclusion: Experimental results validate the proposed model for interpretably evaluating and optimizing disease prediction. Significance: Our work contributes to leveraging medical knowledge and medical data jointly for interpretable prediction tasks.
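The walk described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the entity names, the `walk` function, and the `edge_scores` table are all hypothetical. In particular, the scores stand in for the output of the policy-based RL module, whose training on EHRs is omitted here.

```python
import math
import random

# Hypothetical toy knowledge graph: entity -> entities reachable from it.
# Entities and edges are illustrative only, not taken from the paper.
KG = {
    "hypertension": ["atrial_fibrillation", "heart_failure"],
    "diabetes": ["coronary_artery_disease"],
    "atrial_fibrillation": ["stroke"],
    "coronary_artery_disease": ["heart_failure"],
}
# Entities at which the walk stops (the predicted diseases in this toy setup).
DISEASES = {"heart_failure", "stroke"}


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def walk(patient_conditions, edge_scores, rng, max_steps=5):
    """Sample one path from the patient entity to a disease entity.

    edge_scores maps (source, target) pairs to a preference score; it is an
    assumed stand-in for a learned policy. Missing pairs score 0.0.
    """
    path = ["patient"]
    # The patient entity connects to the KG through current conditions.
    candidates = list(patient_conditions)
    for _ in range(max_steps):
        if not candidates:
            break
        current = path[-1]
        probs = softmax([edge_scores.get((current, c), 0.0) for c in candidates])
        nxt = rng.choices(candidates, weights=probs, k=1)[0]
        path.append(nxt)
        if nxt in DISEASES:
            break  # reached a disease entity: this is the prediction
        candidates = KG.get(nxt, [])
    return path


rng = random.Random(0)
scores = {("patient", "hypertension"): 1.0}  # assumed preference, not learned
print(" -> ".join(walk(["hypertension", "diabetes"], scores, rng)))
```

Each sampled path reads as a candidate progression, from current conditions through intermediate entities to a predicted disease. This is the sense in which the trajectory itself serves as the explanation.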
