Targeted Learning: Robust Statistics for Reproducible Research

Paper title

Targeted Learning: Robust Statistics for Reproducible Research

Authors

Coyle, Jeremy R.; Hejazi, Nima S.; Malenica, Ivana; Phillips, Rachael V.; Arnold, Benjamin F.; Mertens, Andrew; Benjamin-Chung, Jade; Cai, Weixin; Dayal, Sonali; Colford Jr., John M.; Hubbard, Alan E.; van der Laan, Mark J.

Abstract

Targeted Learning is a subfield of statistics that unifies advances in causal inference, machine learning and statistical theory to help answer scientifically impactful questions with statistical confidence. Targeted Learning is driven by complex problems in data science and has been implemented in a diversity of real-world scenarios: observational studies with missing treatments and outcomes, personalized interventions, longitudinal settings with time-varying treatment regimes, survival analysis, adaptive randomized trials, mediation analysis, and networks of connected subjects. In contrast to the (mis)application of restrictive modeling strategies that dominate the current practice of statistics, Targeted Learning establishes a principled standard for statistical estimation and inference (i.e., confidence intervals and p-values). This multiply robust approach is accompanied by a guiding roadmap and a burgeoning software ecosystem, both of which provide guidance on the construction of estimators optimized to best answer the motivating question. The roadmap of Targeted Learning emphasizes tailoring statistical procedures so as to minimize their assumptions, carefully grounding them only in the scientific knowledge available. The end result is a framework that honestly reflects the uncertainty in both the background knowledge and the available data in order to draw reliable conclusions from statistical analyses - ultimately enhancing the reproducibility and rigor of scientific findings.
