FreeSeg: Free Mask from Interpretable Contrastive Language-Image Pretraining for Semantic Segmentation

Paper Title

FreeSeg: Free Mask from Interpretable Contrastive Language-Image Pretraining for Semantic Segmentation

Authors

Yi Li, Huifeng Yao, Hualiang Wang, Xiaomeng Li

Abstract

Fully supervised semantic segmentation learns from dense masks, which requires a heavy annotation cost in the closed-set setting. In this paper, we use natural language as supervision for open-world segmentation, without any pixel-level annotation. We call the proposed framework FreeSeg, because the mask is freely available from the raw feature map of a pretrained model. Compared with zero-shot or open-set segmentation, FreeSeg does not require any annotated masks, and it predicts a much wider range of categories than class-agnostic unsupervised segmentation. Specifically, FreeSeg obtains the free mask from the Image-Text Similarity Map (ITSM) of Interpretable Contrastive Language-Image Pretraining (ICLIP). Our core improvements are smoothed min pooling for dense ICLIP, together with the partial label and pixel strategies for segmentation. Furthermore, FreeSeg is straightforward, with no complex design such as grouping, clustering, or retrieval. Beyond its simplicity, FreeSeg surpasses the previous state of the art by large margins, e.g. 13.4% higher mIoU on the VOC dataset under the same settings.
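The abstract describes reading a mask directly from an image-text similarity map. The sketch below illustrates only that general idea on toy data and is not the authors' implementation. It assumes per-patch image features and per-class text embeddings are already available in a shared embedding space. The function names (`similarity_map`, `free_mask`) and the toy vectors are hypothetical, and the smoothed min pooling, partial label, and pixel strategies named in the abstract are not reproduced because the abstract does not specify them.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def similarity_map(patch_features, text_embeddings):
    """Cosine similarity of every patch feature with every class text embedding.

    patch_features: H x W grid of D-dimensional vectors (assumed input).
    text_embeddings: list of C D-dimensional vectors, one per class prompt.
    Returns an H x W grid where each cell holds C similarity scores.
    """
    texts = [l2_normalize(t) for t in text_embeddings]
    result = []
    for row in patch_features:
        out_row = []
        for feat in row:
            f = l2_normalize(feat)
            out_row.append([sum(a * b for a, b in zip(f, t)) for t in texts])
        result.append(out_row)
    return result


def free_mask(sim_map):
    """Label each patch with the index of its most similar class (argmax)."""
    return [
        [max(range(len(scores)), key=scores.__getitem__) for scores in row]
        for row in sim_map
    ]


# Toy 2 x 2 grid of 2-D patch features and two class embeddings (made-up values).
patches = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.1, 0.9], [0.0, 1.0]],
]
texts = [[1.0, 0.0], [0.0, 1.0]]

sims = similarity_map(patches, texts)
mask = free_mask(sims)
print(mask)
```

In this toy case the top row of patches is assigned to class 0 and the bottom row to class 1, because each patch takes the label of the text embedding it is most aligned with. No annotated mask is involved at any step, which is the property the abstract emphasizes.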
