Understanding Transfer Learning for Chest Radiograph Clinical Report Generation with Modified Transformer Architectures

Paper Title

Understanding Transfer Learning for Chest Radiograph Clinical Report Generation with Modified Transformer Architectures

Authors

Edward Vendrow, Ethan Schonfeld

Abstract

The image captioning task is increasingly prevalent in artificial intelligence applications for medicine. One important application is clinical report generation from chest radiographs. Writing unstructured clinical reports is time-consuming and error-prone. An automated system would improve standardization, reduce errors and time spent, and broaden medical accessibility. In this paper we demonstrate the importance of domain-specific pre-training and propose a modified transformer architecture for the medical image captioning task. To accomplish this, we train a series of modified transformers to generate clinical reports from chest radiograph image input. These modified transformers include: a meshed-memory augmented transformer architecture with a visual extractor using ImageNet pre-trained weights, a meshed-memory augmented transformer architecture with a visual extractor using CheXpert pre-trained weights, and a meshed-memory augmented transformer whose encoder is passed the concatenated embeddings from both the ImageNet pre-trained weights and the CheXpert pre-trained weights. We use BLEU(1-4), ROUGE-L, CIDEr, and the clinical CheXbert F1 scores to validate our models and demonstrate scores competitive with state-of-the-art models. We provide evidence that ImageNet pre-training is ill-suited for the medical image captioning task, especially for less frequent conditions (e.g., enlarged cardiomediastinum, lung lesion, pneumothorax). Furthermore, we demonstrate that the double feature model improves performance for specific medical conditions (edema, consolidation, pneumothorax, support devices) and the overall CheXbert F1 score, and should be further developed in future work. Such a double feature model, including both ImageNet pre-training and domain-specific pre-training, could be used in a wide range of image captioning models in medicine.
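The "double feature" idea in the abstract, passing the encoder the concatenated embeddings from two visual extractors, can be illustrated with a minimal sketch. The abstract does not state the feature dimensions, the number of image regions, or the axis along which the embeddings are joined, so the function name `concat_region_features`, the toy values, and the choice to concatenate along the feature dimension region by region are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List

Vector = List[float]


def concat_region_features(
    imagenet_feats: List[Vector], chexpert_feats: List[Vector]
) -> List[Vector]:
    """Join two extractors' features region by region (hypothetical helper).

    Assumes both extractors return one feature vector per image region,
    in the same region order.
    """
    if len(imagenet_feats) != len(chexpert_feats):
        raise ValueError("both extractors must yield the same number of regions")
    # List addition appends the CheXpert vector to the ImageNet vector,
    # i.e. concatenation along the feature dimension.
    return [a + b for a, b in zip(imagenet_feats, chexpert_feats)]


# Toy input: 2 image regions, 3-dim "ImageNet" and 2-dim "CheXpert" features.
imagenet_feats = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
chexpert_feats = [[1.0, 2.0], [3.0, 4.0]]

# Each region now carries a 5-dim vector that an encoder could consume.
double_feats = concat_region_features(imagenet_feats, chexpert_feats)
```

Under these assumptions the encoder sees both general-purpose and domain-specific visual evidence for every region, at the cost of a wider input dimension.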
