Correcting Data Imbalance for Semi-Supervised Covid-19 Detection Using X-ray Chest Images

Paper Title

Correcting Data Imbalance for Semi-Supervised Covid-19 Detection Using X-ray Chest Images

Authors

Saul Calderon-Ramirez, Shengxiang Yang, Armaghan Moemeni, David Elizondo, Simon Colreavy-Donnelly, Luis Fernando Chavarria-Estrada, Miguel A. Molina-Cabello

Abstract

The Corona Virus (COVID-19) is an international pandemic that has quickly propagated throughout the world. The application of deep learning to the classification of chest X-ray images of COVID-19 patients could become a novel pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of the virus outbreak, where dealing with small labelled datasets is a challenge. Moreover, in the context of a new, highly infectious disease, the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch using a very limited number of labelled observations and a highly imbalanced labelled dataset. We propose a simple approach for correcting data imbalance: re-weighting each observation in the loss function, giving a higher weight to the observations corresponding to the under-represented class. For unlabelled observations, we propose using the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The MixMatch method combined with the proposed pseudo-label based balance correction improved classification accuracy by up to 10% with respect to the non-balanced MixMatch algorithm, with statistical significance. We tested our proposed approach on several available datasets using 10, 15 and 20 labelled observations. Additionally, a new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients.
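The re-weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so everything below is an assumption made for illustration: the inverse-frequency weights, the use of the arg-max of the pseudo-label to pick a weight for unlabelled observations, and all function names are hypothetical. The only parts taken from MixMatch itself are the general loss shapes (cross-entropy for labelled data, squared error against the guessed label for unlabelled data).

```python
import math


def class_weights(labels, num_classes):
    """Assumed scheme: weights proportional to inverse class frequency,
    normalised to sum to 1, so the rarer class gets the larger weight."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    inverse = [1.0 / c if c > 0 else 0.0 for c in counts]
    total = sum(inverse)
    return [v / total for v in inverse]


def weighted_labelled_loss(probs, labels, weights):
    """Cross-entropy on labelled observations, each term scaled by the
    weight of the observation's true class."""
    terms = [-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(terms) / len(terms)


def weighted_unlabelled_loss(probs, pseudo_labels, weights):
    """Squared error between predictions and soft pseudo-labels. The weight
    is chosen from the class the pseudo-label favours (its arg-max), which
    is one possible reading of 'using the pseudo-labels to choose the weight'."""
    total = 0.0
    for p, q in zip(probs, pseudo_labels):
        favoured = max(range(len(q)), key=q.__getitem__)
        squared_error = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
        total += weights[favoured] * squared_error
    return total / len(probs)


# Toy labelled set: three negatives (class 0) and one positive (class 1).
weights = class_weights([0, 0, 0, 1], num_classes=2)
```

With this toy set the positive class, being three times rarer, receives three times the weight of the negative class, so a misclassified positive contributes more to both loss terms than a misclassified negative.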
