Paper Title
Dense Contrastive Learning for Self-Supervised Visual Pre-Training
Paper Authors
Paper Abstract
To date, most existing self-supervised learning methods are designed and optimized for image classification. These pre-trained models can be sub-optimal for dense prediction tasks due to the discrepancy between image-level prediction and pixel-level prediction. To fill this gap, we aim to design an effective, dense self-supervised learning method that directly works at the level of pixels (or local features) by taking into account the correspondence between local features. We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images. Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only <1% slower), but demonstrates consistently superior performance when transferring to downstream dense prediction tasks including object detection, semantic segmentation and instance segmentation; and outperforms the state-of-the-art methods by a large margin. Specifically, over the strong MoCo-v2 baseline, our method achieves significant improvements of 2.0% AP on PASCAL VOC object detection, 1.1% AP on COCO object detection, 0.9% AP on COCO instance segmentation, 3.0% mIoU on PASCAL VOC semantic segmentation and 1.8% mIoU on Cityscapes semantic segmentation. Code is available at: https://git.io/AdelaiDet
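To make the pixel-level objective concrete, the snippet below is a minimal PyTorch sketch of a dense InfoNCE-style loss between the dense feature maps of two augmented views. It is an illustration under simplifying assumptions, not the authors' exact implementation: correspondence between the views is approximated by greedy cosine-similarity matching of the projected features, and negatives are drawn from all locations in the current batch rather than from a MoCo-style momentum queue; the function name `dense_contrastive_loss` and the temperature value are placeholders.

```python
import torch
import torch.nn.functional as F

def dense_contrastive_loss(q, k, temperature=0.2):
    """Sketch of a pixel-level (dense) contrastive loss.

    q, k: dense projections of two augmented views, shape (B, C, H, W).
    Assumptions (not the paper's exact pipeline): correspondence is the
    most-similar key location for each query vector, and negatives are
    all other key locations in the batch instead of a momentum queue.
    """
    B, C, H, W = q.shape
    N = H * W
    # Flatten spatial dims to (B, N, C) and L2-normalize each local feature.
    q = F.normalize(q.flatten(2).transpose(1, 2), dim=-1)
    k = F.normalize(k.flatten(2).transpose(1, 2), dim=-1)

    # Establish correspondence: for each query location, take the key
    # location with the highest cosine similarity as its positive.
    sim = torch.bmm(q, k.transpose(1, 2))          # (B, N, N)
    match_idx = sim.argmax(dim=-1)                 # (B, N)

    # Contrast each query against every key location in the batch;
    # the target class is the global index of its matched key.
    key_bank = k.reshape(-1, C)                    # (B*N, C)
    logits = torch.einsum('bnc,mc->bnm', q, key_bank) / temperature
    offsets = (torch.arange(B, device=q.device) * N).view(B, 1)
    labels = (match_idx + offsets).flatten()       # (B*N,)
    return F.cross_entropy(logits.flatten(0, 1), labels)

# Usage sketch: dense projections from a backbone + dense projection head.
if __name__ == "__main__":
    q = torch.randn(2, 128, 7, 7)
    k = torch.randn(2, 128, 7, 7)
    print(dense_contrastive_loss(q, k).item())
```

In the paper's full method, this dense term is combined with the usual image-level (global) contrastive loss, and the key features come from a momentum-updated encoder with a queue of negatives, as inherited from MoCo-v2.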