Paper Title
KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
Paper Authors
Paper Abstract
Deep learning networks are being developed for every stage of the MRI workflow and have provided state-of-the-art results. However, this has come at the cost of increased computational and storage requirements. Hence, replacing the networks at various stages of the MRI workflow with compact models can significantly reduce the required storage space and provide considerable speedup. In computer vision, knowledge distillation is a commonly used method for model compression. In our work, we propose a knowledge distillation (KD) framework for image-to-image problems in the MRI workflow in order to develop compact, low-parameter models without a significant drop in performance. We propose a combination of an attention-based feature distillation method and an imitation loss, and demonstrate its effectiveness on the popular MRI reconstruction architecture DC-CNN. We conduct extensive experiments using Cardiac, Brain, and Knee MRI datasets for 4x, 5x, and 8x acceleration factors. We observe that the student network trained with the teacher's assistance using our proposed KD framework shows significant improvement over the student network trained without assistance, across all datasets and acceleration factors. Specifically, for the Knee dataset, the student network achieves a $65\%$ parameter reduction, 2x faster CPU running time, and 1.5x faster GPU running time compared to the teacher. Furthermore, we compare our attention-based feature distillation method with other feature distillation methods. We also conduct an ablation study to understand the significance of attention-based distillation and imitation loss. Finally, we extend our KD framework to MRI super-resolution and show encouraging results.
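The combination of attention-based feature distillation and imitation loss described in the abstract can be illustrated with a short sketch. The following is a minimal PyTorch sketch, not the paper's implementation: the attention maps follow the common channel-pooling formulation of attention transfer, and the loss weights `alpha` and `beta`, the choice of L1/MSE losses, and the function names are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    """Collapse a feature map (B, C, H, W) into a normalized spatial
    attention map (B, H*W) by averaging squared activations over channels,
    in the style of attention-transfer distillation. The exact attention
    formulation in the paper may differ."""
    att = feat.pow(2).mean(dim=1).flatten(1)   # (B, H*W)
    return F.normalize(att, p=2, dim=1)

def kd_loss(student_feats, teacher_feats, student_out, teacher_out, target,
            alpha=0.5, beta=0.5):
    """Combined student training loss (illustrative weighting):
    reconstruction against ground truth + imitation of the frozen teacher's
    output + attention-based feature distillation over paired layers."""
    recon = F.l1_loss(student_out, target)
    # Imitation loss: student output mimics the teacher's reconstruction.
    imitation = F.l1_loss(student_out, teacher_out.detach())
    # Attention distillation: match spatial attention maps layer by layer.
    attention = sum(
        F.mse_loss(attention_map(s), attention_map(t.detach()))
        for s, t in zip(student_feats, teacher_feats)
    )
    return recon + alpha * imitation + beta * attention
```

In a training loop, `student_feats` and `teacher_feats` would be lists of intermediate feature maps captured (e.g., via forward hooks) from corresponding blocks of the student and teacher DC-CNN cascades, with the teacher kept frozen.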