Paper Title
Disparate Censorship & Undertesting: A Source of Label Bias in Clinical Machine Learning
Paper Authors
Paper Abstract
As machine learning (ML) models gain traction in clinical applications, understanding the impact of clinician and societal biases on ML models is increasingly important. While biases can arise in the labels used for model training, the many sources from which these biases arise are not yet well-studied. In this paper, we highlight disparate censorship (i.e., differences in testing rates across patient groups) as a source of label bias that clinical ML models may amplify, potentially causing harm. Many patient risk-stratification models are trained using the results of clinician-ordered diagnostic and laboratory tests as labels. Patients without test results are often assigned a negative label, which assumes that untested patients do not experience the outcome. Since orders are affected by clinical and resource considerations, testing may not be uniform across patient populations, giving rise to disparate censorship. Disparate censorship in patients of equivalent risk leads to undertesting in certain groups, and in turn, more biased labels for such groups. Using such biased labels in standard ML pipelines could contribute to gaps in model performance across patient groups. Here, we theoretically and empirically characterize conditions in which disparate censorship or undertesting affect model performance across subgroups. Our findings call attention to disparate censorship as a source of label bias in clinical ML models.
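The labeling mechanism described above can be illustrated with a minimal simulation (a sketch for intuition, not code from the paper): two groups share the same true risk, but one is tested less often, and untested patients receive a negative label. The group names, rates, and sample sizes below are illustrative assumptions.

```python
import random

random.seed(0)

def mislabel_rate(n, true_risk, test_rate):
    """Fraction of truly positive patients who receive a negative
    (censored) label because they were never tested."""
    positives = 0
    false_negatives = 0
    for _ in range(n):
        y_true = random.random() < true_risk       # true outcome
        tested = random.random() < test_rate       # clinician orders a test
        y_observed = y_true if tested else False   # untested -> negative label
        if y_true:
            positives += 1
            if not y_observed:
                false_negatives += 1
    return false_negatives / positives

# Two groups with identical true risk but disparate testing rates
# (illustrative numbers: 90% vs. 40% of patients tested).
rate_a = mislabel_rate(10_000, true_risk=0.2, test_rate=0.9)
rate_b = mislabel_rate(10_000, true_risk=0.2, test_rate=0.4)

# In expectation, the mislabel rate among true positives is
# 1 - test_rate, so the undertested group accumulates far more
# biased (falsely negative) labels despite equal underlying risk.
print(f"group A mislabel rate: {rate_a:.2f}")
print(f"group B mislabel rate: {rate_b:.2f}")
```

Training a standard classifier on `y_observed` from such data would penalize the undertested group, which is the performance-gap mechanism the abstract describes.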