Paper Title

Patient Recruitment Using Electronic Health Records Under Selection Bias: a Two-phase Sampling Framework

Authors

Guanghao Zhang, Lauren J. Beesley, Bhramar Mukherjee, Xu Shi

Abstract

Electronic health records (EHRs) are increasingly recognized as a cost-effective resource for patient recruitment in clinical research. However, how to optimally select a cohort from millions of individuals to answer a scientific question of interest remains unclear. Consider a study to estimate the mean or mean difference of an expensive outcome. Inexpensive auxiliary covariates predictive of the outcome are often available in patients' health records, presenting an opportunity to recruit patients selectively, which may improve efficiency in downstream analyses. In this paper, we propose a two-phase sampling design that leverages available information on auxiliary covariates in EHR data. A key challenge in using EHR data for multi-phase sampling is potential selection bias, because EHR data are not necessarily representative of the target population. Extending the existing literature on two-phase sampling design, we derive an optimal two-phase sampling method that improves efficiency over random sampling while accounting for potential selection bias in EHR data. We demonstrate the efficiency gain from our sampling design via simulation studies and an application to evaluating the prevalence of hypertension among US adults, leveraging data from the Michigan Genomics Initiative, a longitudinal biorepository in Michigan Medicine.
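The general idea of two-phase sampling can be illustrated with a small simulation. The sketch below is not the authors' method: it uses textbook stratified sampling with Neyman allocation on a single binary auxiliary covariate, and it ignores selection bias entirely, which is the complication the paper addresses. The population model, the function names, and all numeric settings are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_population(n_total, rng):
    """Phase 1: every record has a cheap binary covariate x; y is the expensive outcome."""
    xs, ys = [], []
    for _ in range(n_total):
        x = 1 if rng.random() < 0.2 else 0
        # Hypothetical outcome model: stratum x=1 has a higher mean and more spread.
        y = rng.gauss(150.0, 25.0) if x == 1 else rng.gauss(120.0, 5.0)
        xs.append(x)
        ys.append(y)
    return xs, ys


def neyman_allocation(sizes, sds, n_sample):
    """Split n_sample across strata in proportion to N_h * S_h."""
    labels = list(sizes)
    weights = [sizes[h] * sds[h] for h in labels]
    total = sum(weights)
    alloc, remaining = {}, n_sample
    for h, w in zip(labels[:-1], weights[:-1]):
        alloc[h] = round(n_sample * w / total)
        remaining -= alloc[h]
    alloc[labels[-1]] = remaining  # last stratum absorbs rounding error
    return alloc


def stratified_mean(strata, alloc, n_total, rng):
    """Phase 2: measure y on a stratified subsample and weight by stratum share."""
    return sum(
        len(members) / n_total * statistics.fmean(rng.sample(members, alloc[h]))
        for h, members in strata.items()
    )


def compare_designs(n_total=5000, n_sample=200, n_reps=300, seed=0):
    rng = random.Random(seed)
    xs, ys = simulate_population(n_total, rng)
    strata = {h: [y for x, y in zip(xs, ys) if x == h] for h in (0, 1)}
    sizes = {h: len(members) for h, members in strata.items()}
    # Assumption: stratum SDs are known; in practice they would need to be estimated.
    sds = {h: statistics.pstdev(members) for h, members in strata.items()}
    alloc = neyman_allocation(sizes, sds, n_sample)
    two_phase = [stratified_mean(strata, alloc, n_total, rng) for _ in range(n_reps)]
    srs = [statistics.fmean(rng.sample(ys, n_sample)) for _ in range(n_reps)]
    return {
        "true_mean": statistics.fmean(ys),
        "alloc": alloc,
        "mean_two_phase": statistics.fmean(two_phase),
        "var_two_phase": statistics.variance(two_phase),
        "var_srs": statistics.variance(srs),
    }


if __name__ == "__main__":
    for key, value in compare_designs().items():
        print(key, value)
```

Under these assumed settings, the stratified design oversamples the high-variance stratum and should yield a noticeably smaller sampling variance than simple random sampling of the same size. Both estimators are unbiased here only because the simulated phase 1 population is itself the target population.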
