Blended Multi-Modal Deep ConvNet Features for Diabetic Retinopathy Severity Prediction

Paper Title

Blended Multi-Modal Deep ConvNet Features for Diabetic Retinopathy Severity Prediction

Authors

Bodapati, J. D., Veeranjaneyulu, N., Shareef, S. N., Hakak, S., Bilal, M., Maddikunta, P. K. R., Jo, O.

Abstract

Diabetic retinopathy (DR) is one of the major causes of visual impairment and blindness worldwide. It is usually found in patients who have had diabetes for a long time. The main focus of this work is to derive an optimal representation of retinal images that helps improve the performance of DR recognition models. To extract this representation, features extracted from multiple pre-trained ConvNet models are blended using the proposed multi-modal fusion module. The resulting representations are used to train a deep neural network (DNN) for DR identification and severity level prediction. Because each ConvNet extracts different features, fusing them using 1D pooling and cross pooling yields a better representation than features extracted from a single ConvNet. Experimental studies on the benchmark Kaggle APTOS 2019 contest dataset reveal that the model trained on the proposed blended feature representations is superior to existing methods. In addition, we notice that cross average pooling based fusion of features from Xception and VGG16 is the most appropriate for DR recognition. With the proposed model, we achieve an accuracy of 97.41% and a kappa statistic of 94.82 for DR identification, and an accuracy of 81.7% and a kappa statistic of 71.1% for severity level prediction. Another interesting observation is that a DNN with dropout at the input layer converges more quickly when trained on blended features than the same model trained on uni-modal deep features.
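
The abstract names 1D pooling and cross pooling as the fusion operations but does not define them, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that 1D pooling means non-overlapping pooling along the feature axis of one model's feature vector, and that cross pooling means element-wise pooling across two feature vectors of equal width. The function names `pool_1d` and `cross_pool`, the feature widths (2048 and 512), and the random stand-in features are all hypothetical choices made for illustration.

```python
import numpy as np


def pool_1d(features, window, mode="avg"):
    """Non-overlapping 1D pooling along the feature axis (assumed definition)."""
    n_samples, width = features.shape
    if width % window != 0:
        raise ValueError("feature width must be divisible by the window size")
    # Group each row into blocks of `window` consecutive features.
    blocks = features.reshape(n_samples, width // window, window)
    return blocks.mean(axis=2) if mode == "avg" else blocks.max(axis=2)


def cross_pool(feats_a, feats_b, mode="avg"):
    """Element-wise pooling across two equal-width feature sets (assumed definition)."""
    if feats_a.shape != feats_b.shape:
        raise ValueError("both feature sets must have the same shape")
    stacked = np.stack([feats_a, feats_b], axis=0)
    return stacked.mean(axis=0) if mode == "avg" else stacked.max(axis=0)


# Random stand-ins for features from two pre-trained ConvNets (hypothetical widths).
rng = np.random.default_rng(0)
xception_feats = rng.normal(size=(4, 2048))
vgg16_feats = rng.normal(size=(4, 512))

# Bring both feature sets to the same width, then blend them element-wise.
xception_reduced = pool_1d(xception_feats, window=4)   # 2048 -> 512
blended = cross_pool(xception_reduced, vgg16_feats)    # cross average pooling
print(blended.shape)  # (4, 512)
```

In this sketch the blended matrix would then serve as the input to a downstream classifier; swapping `mode="avg"` for `mode="max"` gives the max-pooling variant of either step.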
