Paper title
Simulation Framework for Realistic Large-scale Individual-level Data Generation with an Application in the Health Domain
Paper authors
Paper abstract
We propose a framework for realistic data generation and simulation of complex systems and demonstrate its capabilities in the health domain. The main use cases of the framework are predicting the development of risk factors and disease occurrence, evaluating the impact of interventions and policy decisions, and developing statistical methods. We present the fundamentals of the framework using rigorous mathematical definitions. The framework supports calibration to a real population as well as various manipulations and data collection processes. The freely available open-source implementation in R uses efficient data structures, parallel computing, and fast random number generation, which ensure reproducibility and scalability. With the framework, it is possible to run daily-level simulations of populations of millions of individuals over decades of simulated time. An example on the occurrence of stroke, type 2 diabetes, and mortality illustrates the use of the framework in the Finnish context. In the example, we demonstrate the data-collection functionality by studying the impact of non-participation on the estimated risk models and on interventions related to controlling additional salt intake.
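The idea of daily-level simulation combined with a selective data-collection step can be illustrated with a toy sketch. This is not the authors' R implementation: the function names, the single binary risk factor, and every probability below are hypothetical values chosen only for illustration. Each simulated individual faces a constant daily event hazard that depends on the risk factor, and individuals with the risk factor are assumed to participate in data collection less often.

```python
import random


def simulate(n_people, n_days, seed):
    """Toy daily-level simulation with a selective participation step.

    All prevalences, hazards and participation probabilities are
    hypothetical and are not taken from the paper.
    """
    rng = random.Random(seed)  # seeded generator, so runs are reproducible
    people = []
    for _ in range(n_people):
        high_salt = rng.random() < 0.3                # hypothetical prevalence
        hazard = 0.0004 if high_salt else 0.0001      # hypothetical daily hazards
        event_day = None
        for day in range(n_days):                     # one step per simulated day
            if rng.random() < hazard:
                event_day = day
                break
        # Hypothetical selection: high-salt individuals participate less often.
        participates = rng.random() < (0.4 if high_salt else 0.8)
        people.append({
            "high_salt": high_salt,
            "event_day": event_day,
            "participates": participates,
        })
    return people


def event_share(people):
    """Share of individuals who had an event during follow-up."""
    return sum(p["event_day"] is not None for p in people) / len(people)


population = simulate(n_people=2000, n_days=365, seed=1)
participants = [p for p in population if p["participates"]]

print("full population:  ", event_share(population))
print("participants only:", event_share(participants))
```

Comparing the two printed shares shows how an estimate based only on participants can differ from the value in the full simulated population. A realistic study would replace the constant hazards with calibrated risk models.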