Paper Title

Deep Learning Applied to Chest X-Rays: Exploiting and Preventing Shortcuts

Paper Authors

Sarah Jabbour, David Fouhey, Ella Kazerooni, Michael W. Sjoding, Jenna Wiens

Paper Abstract

While deep learning has shown promise in improving the automated diagnosis of disease based on chest X-rays, deep networks may exhibit undesirable behavior related to shortcuts. This paper studies the case of spurious class skew in which patients with a particular attribute are spuriously more likely to have the outcome of interest. For instance, clinical protocols might lead to a dataset in which patients with pacemakers are disproportionately likely to have congestive heart failure. This skew can lead to models that take shortcuts by heavily relying on the biased attribute. We explore this problem across a number of attributes in the context of diagnosing the cause of acute hypoxemic respiratory failure. Applied to chest X-rays, we show that i) deep nets can accurately identify many patient attributes including sex (AUROC = 0.96) and age (AUROC >= 0.90), ii) they tend to exploit correlations between such attributes and the outcome label when learning to predict a diagnosis, leading to poor performance when such correlations do not hold in the test population (e.g., everyone in the test set is male), and iii) a simple transfer learning approach is surprisingly effective at preventing the shortcut and promoting good generalization performance. On the task of diagnosing congestive heart failure based on a set of chest X-rays skewed towards older patients (age >= 63), the proposed approach improves generalization over standard training from 0.66 (95% CI: 0.54-0.77) to 0.84 (95% CI: 0.73-0.92) AUROC. While simple, the proposed approach has the potential to improve the performance of models across populations by encouraging reliance on clinically relevant manifestations of disease, i.e., those that a clinician would use to make a diagnosis.
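
The shortcut effect described in the abstract can be illustrated with a small synthetic sketch. It does not use chest X-rays, a deep network, or the paper's transfer learning approach. It is a hypothetical two-feature logistic regression in which one feature is a weak true disease signal and the other is an easily read patient attribute. All names (`make_data`, `fit_logreg`, `run_demo`) and all numeric settings (the 0.9/0.1 skew, the noise levels, the sample sizes) are assumptions chosen for illustration only.

```python
import numpy as np

def auroc(y_true, scores):
    """Rank-based AUROC (Mann-Whitney U). Assumes continuous scores (no ties)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def make_data(n, p_attr_given_pos, p_attr_given_neg, rng):
    """Synthetic cohort: binary outcome y plus a binary attribute whose
    prevalence may differ between positives and negatives (class skew)."""
    y = rng.integers(0, 2, size=n)
    p_attr = np.where(y == 1, p_attr_given_pos, p_attr_given_neg)
    attr = (rng.random(n) < p_attr).astype(float)
    signal = y + rng.normal(0.0, 2.0, size=n)        # weak, noisy disease signal
    attr_feat = attr + rng.normal(0.0, 0.1, size=n)  # attribute is easy to read
    return np.column_stack([signal, attr_feat]), y

def fit_logreg(X, y, lr=0.1, steps=2000):
    """Plain gradient descent on the mean logistic loss (bias as last weight)."""
    Xb = np.column_stack([X, np.ones(len(X))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    """Return logits; AUROC depends only on their ordering."""
    return np.column_stack([X, np.ones(len(X))]) @ w

def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    # Training set with spurious class skew: attribute tracks the outcome.
    X_tr, y_tr = make_data(5000, 0.9, 0.1, rng)
    # Test set with the same skew as training.
    X_skew, y_skew = make_data(5000, 0.9, 0.1, rng)
    # Test set where the attribute is independent of the outcome.
    X_bal, y_bal = make_data(5000, 0.5, 0.5, rng)
    w = fit_logreg(X_tr, y_tr)
    return auroc(y_skew, predict(w, X_skew)), auroc(y_bal, predict(w, X_bal))

if __name__ == "__main__":
    auc_skew, auc_bal = run_demo()
    print(f"AUROC, skewed test set:   {auc_skew:.2f}")
    print(f"AUROC, unskewed test set: {auc_bal:.2f}")
```

Under these assumptions, the model should score noticeably higher on the test set that shares the training skew than on the one where the attribute carries no information about the outcome, which is the qualitative pattern the abstract reports for attributes such as sex and age.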
