Paper title
Collaborative Training of Medical Artificial Intelligence Models with non-uniform Labels
Authors
Abstract
Due to rapid advancements in recent years, medical image analysis is largely dominated by deep learning (DL). However, building powerful and robust DL models requires training with large multi-party datasets. While multiple stakeholders have provided publicly available datasets, the ways in which these data are labeled vary widely. For instance, one institution might provide a dataset of chest radiographs with labels denoting the presence of pneumonia, while another might focus on determining the presence of metastases in the lung. Training a single AI model on all these data is not feasible with conventional federated learning (FL). This prompts us to propose an extension to the widespread FL process, namely flexible federated learning (FFL), for collaborative training on such data. Using 695,000 chest radiographs from five institutions across the globe - each with differing labels - we demonstrate that, with heterogeneously labeled datasets, FFL-based training leads to a significant performance increase compared to conventional FL training, which can use only the uniformly annotated images. We believe that our proposed algorithm could accelerate the process of bringing collaborative training methods from the research and simulation phase to real-world applications in healthcare.
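
The abstract does not spell out how FFL accommodates differing labels. As an illustration only, the sketch below assumes one plausible arrangement: each institution keeps its own classification head for its own labels, and only a shared backbone is averaged across sites. The names `SiteModel`, `aggregate_backbones`, and `federated_round`, the toy parameter lists, and the sample counts are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SiteModel:
    """Hypothetical per-institution model: shared backbone plus local head."""
    name: str
    n_samples: int
    backbone: List[float]   # parameters with the same shape at every site
    head: Dict[str, float]  # toy stand-in: one weight per site-specific label


def aggregate_backbones(sites: List[SiteModel]) -> List[float]:
    """Sample-weighted average of backbone parameters (FedAvg-style)."""
    total = sum(s.n_samples for s in sites)
    size = len(sites[0].backbone)
    return [
        sum(s.backbone[i] * s.n_samples for s in sites) / total
        for i in range(size)
    ]


def federated_round(sites: List[SiteModel]) -> None:
    """Assumed scheme: share only the backbone; heads stay at their site."""
    shared = aggregate_backbones(sites)
    for s in sites:
        s.backbone = list(shared)


# Two toy sites with different label sets, as in the abstract's example.
sites = [
    SiteModel("site_a", 100, [1.0, 2.0], {"pneumonia": 0.5}),
    SiteModel("site_b", 300, [3.0, 6.0], {"lung_metastases": -0.2}),
]
federated_round(sites)
```

After the round, both sites hold the same backbone while their heads are untouched, so no site needs labels it does not have. Conventional FL, by contrast, averages the whole model and therefore requires every site to share one label set.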