Paper Title
Contrastive Transformer-based Multiple Instance Learning for Weakly Supervised Polyp Frame Detection
Paper Authors
Abstract
Current polyp detection methods for colonoscopy videos are trained exclusively on normal (i.e., healthy) images, and therefore i) ignore the importance of temporal information in consecutive video frames and ii) lack knowledge about polyps. Consequently, they often have high detection errors, especially on challenging polyp cases (e.g., small, flat, or partially visible polyps). In this work, we formulate polyp detection as a weakly-supervised anomaly detection task that uses training data labelled at the video level to detect polyps at the frame level. In particular, we propose a novel convolutional transformer-based multiple instance learning method designed to identify abnormal frames (i.e., frames with polyps) from anomalous videos (i.e., videos containing at least one frame with a polyp). In our method, local and global temporal dependencies are seamlessly captured while we simultaneously optimise video-level and snippet-level anomaly scores. A contrastive snippet mining method is also proposed to enable effective modelling of the challenging polyp cases. The resulting method achieves a detection accuracy that is substantially better than current state-of-the-art approaches on a new large-scale colonoscopy video dataset introduced in this work.
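To make the weakly-supervised setting concrete, the following is a minimal sketch of how snippet-level anomaly scores can be tied to a video-level label in a generic multiple instance learning setup. It is not the method proposed in the paper: the top-k mean aggregation, the hinge-style ranking loss, and the names `video_score_from_snippets`, `mil_ranking_loss`, `k`, and `margin` are all illustrative assumptions, and the convolutional transformer and contrastive snippet mining are not modelled here.

```python
def video_score_from_snippets(snippet_scores, k=3):
    """Aggregate snippet-level scores into one video-level score.

    Assumption: the video score is the mean of the k highest snippet
    scores, a common choice in multiple instance learning.
    """
    scores = sorted(float(s) for s in snippet_scores)
    k = max(1, min(k, len(scores)))
    top_k = scores[-k:]  # the k largest scores
    return sum(top_k) / k


def mil_ranking_loss(abnormal_scores, normal_scores, k=3, margin=1.0):
    """Hinge-style ranking loss using only video-level labels.

    Encourages the video-level score of an abnormal video (contains at
    least one polyp frame) to exceed that of a normal video by `margin`.
    """
    s_abnormal = video_score_from_snippets(abnormal_scores, k)
    s_normal = video_score_from_snippets(normal_scores, k)
    return max(0.0, margin - s_abnormal + s_normal)


# Hypothetical snippet scores for one abnormal and one normal video.
abnormal = [0.1, 0.9, 0.8, 0.2]
normal = [0.1, 0.2, 0.1, 0.0]
loss = mil_ranking_loss(abnormal, normal, k=2, margin=1.0)
```

Only the highest-scoring snippets contribute to the video-level score, so a video-level label can supervise snippet scores without any frame-level annotation.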