Paper Title
Lung Swapping Autoencoder: Learning a Disentangled Structure-texture Representation of Chest Radiographs
Paper Authors
Paper Abstract
Well-labeled datasets of chest radiographs (CXRs) are difficult to acquire due to the high cost of annotation. Thus, it is desirable to learn a robust and transferable representation in an unsupervised manner to benefit tasks that lack labeled data. Unlike natural images, medical images have their own domain prior; e.g., we observe that many pulmonary diseases, such as COVID-19, manifest as changes in lung tissue texture rather than anatomical structure. Therefore, we hypothesize that studying texture alone, without the influence of structure variations, would be advantageous for downstream prognostic and predictive modeling tasks. In this paper, we propose a generative framework, the Lung Swapping Autoencoder (LSAE), that learns a factorized representation of a CXR to disentangle the texture factor from the structure factor. Specifically, through adversarial training, the LSAE is optimized to generate a hybrid image that preserves the lung shape of one image but inherits the lung texture of another. To demonstrate the effectiveness of the disentangled texture representation, we evaluate the texture encoder $Enc^t$ of LSAE on ChestX-ray14 (N=112,120) and on our own multi-institutional COVID-19 outcome prediction dataset, COVOC (N=340 (Subset-1) + 53 (Subset-2)). On both datasets, we reach or surpass the state of the art by fine-tuning $Enc^t$, which is 77% smaller than a baseline Inception v3. Additionally, in semi- and self-supervised settings with a similar model budget, $Enc^t$ is competitive with the state-of-the-art MoCo. By "re-mixing" the texture and shape factors, we generate meaningful hybrid images that can augment the training set; this data augmentation method further improves COVOC prediction performance. The improvement is consistent even when we directly evaluate the model trained on Subset-1 on Subset-2 without any fine-tuning.
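The abstract's core mechanism, a texture encoder $Enc^t$, a structure branch, and a generator that re-combines the two factors into a hybrid image, can be summarized in code. Below is a minimal PyTorch sketch of the lung-swapping forward pass; all module definitions, layer sizes, and names (`enc_t`, `enc_s`, `gen`, `swap`) are illustrative assumptions for exposition, not the authors' released architecture, and the adversarial losses that train the model are omitted.

```python
# Minimal sketch of the lung-swapping idea described in the abstract.
# All modules, names, and tensor shapes below are illustrative assumptions;
# the paper's actual architecture and training objectives differ in detail.
import torch
import torch.nn as nn

class LSAESketch(nn.Module):
    def __init__(self, texture_dim=256, struct_channels=64):
        super().__init__()
        # Texture encoder Enc^t: pools to one global texture code per image.
        self.enc_t = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, texture_dim),
        )
        # Structure encoder: keeps a spatial feature map, preserving lung shape.
        self.enc_s = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, struct_channels, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Generator: decodes the structure map conditioned on the texture code.
        self.gen = nn.Sequential(
            nn.ConvTranspose2d(struct_channels + texture_dim, 64,
                               4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1), nn.Tanh(),
        )

    def swap(self, x_structure, x_texture):
        """Hybrid image: lung shape from x_structure, lung texture from x_texture."""
        s = self.enc_s(x_structure)        # (B, C, H/4, W/4) spatial structure map
        t = self.enc_t(x_texture)          # (B, texture_dim) global texture code
        # Broadcast the texture code over the spatial grid and fuse the factors.
        t_map = t[:, :, None, None].expand(-1, -1, s.size(2), s.size(3))
        return self.gen(torch.cat([s, t_map], dim=1))

model = LSAESketch()
cxr_a = torch.randn(2, 1, 256, 256)   # provides anatomical structure
cxr_b = torch.randn(2, 1, 256, 256)   # provides lung texture
hybrid = model.swap(cxr_a, cxr_b)     # trained adversarially in the paper
print(hybrid.shape)                   # torch.Size([2, 1, 256, 256])
```

In the paper, adversarial training pushes the hybrid output to keep one image's lung shape while matching the other's texture; the same swap operation is what enables the "re-mixing" data augmentation described in the abstract, and $Enc^t$ alone is what gets fine-tuned for the downstream classification tasks.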