Uncertainty estimation methods for a deep learning model to aid in clinical decision-making: a clinician's perspective

Paper title

Uncertainty estimations methods for a deep learning model to aid in clinical decision-making -- a clinician's perspective

Authors

Michael Dohopolski, Kai Wang, Biling Wang, Ti Bai, Dan Nguyen, David Sher, Steve Jiang, Jing Wang

Abstract

Prediction uncertainty estimation has clinical significance because it can potentially quantify prediction reliability. Clinicians may trust 'black box' models more if robust reliability information is available, which may lead to more models being adopted into clinical practice. There are several deep learning-inspired uncertainty estimation techniques, but few have been implemented on medical datasets, and fewer still on single-institution datasets or models. We sought to compare dropout variational inference (DO), test-time augmentation (TTA), conformal predictions, and single deterministic methods for estimating uncertainty using our model trained to predict feeding tube placement for 271 head and neck cancer patients treated with radiation. We compared the trends in area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for each method at various cutoffs that sought to stratify patients into 'certain' and 'uncertain' cohorts. These cutoffs were obtained by calculating percentiles of the uncertainty estimates within the validation cohort and were then applied to the testing cohort. Broadly, the AUC, sensitivity, and NPV increased as the predictions became more 'certain', i.e., had lower uncertainty estimates. However, when a majority vote (meeting 2/3 criteria: DO, TTA, conformal predictions) or a stricter approach (3/3 criteria) was used, AUC, sensitivity, and NPV improved without a notable loss in specificity or PPV. Especially for smaller, single-institution datasets, it may be important to evaluate multiple estimation techniques before incorporating a model into clinical practice.
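
The cutoff and voting procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes each method (DO, TTA, conformal predictions) yields one scalar uncertainty score per patient, that a prediction counts as 'certain' when its score is at or below the validation-derived cutoff, and it uses random placeholder scores and an arbitrary 50th-percentile cutoff. The function names `percentile_cutoff`, `certain_mask`, and `vote` are hypothetical.

```python
import numpy as np

def percentile_cutoff(val_uncertainty, pct):
    """Cutoff = the pct-th percentile of the validation-cohort uncertainty scores."""
    return np.percentile(val_uncertainty, pct)

def certain_mask(test_uncertainty, cutoff):
    """Assumption: a prediction is 'certain' if its uncertainty is at or below the cutoff."""
    return np.asarray(test_uncertainty) <= cutoff

def vote(masks, min_votes):
    """Mark a patient 'certain' when at least min_votes methods call them 'certain'."""
    return np.sum(np.stack(masks), axis=0) >= min_votes

# Placeholder uncertainty scores (random numbers, not data from the paper).
rng = np.random.default_rng(0)
methods = ["DO", "TTA", "conformal"]
val = {m: rng.random(50) for m in methods}   # validation cohort
test = {m: rng.random(40) for m in methods}  # testing cohort

# Cutoffs come from the validation cohort only and are then applied to the test cohort.
masks = [certain_mask(test[m], percentile_cutoff(val[m], 50)) for m in methods]

majority = vote(masks, 2)  # 2/3 criteria
strict = vote(masks, 3)    # 3/3 criteria
```

Metrics such as AUC, sensitivity, and NPV would then be computed separately on the patients selected by `majority` or `strict`. By construction, every patient who passes the 3/3 criterion also passes the 2/3 criterion, so the strict cohort is a subset of the majority cohort.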
