Paper Title

Split-Merge Pooling

Authors

Omid Hosseini Jafari, Carsten Rother

Abstract

There are a variety of approaches to obtain a vast receptive field with convolutional neural networks (CNNs), such as pooling or striding convolutions. Most of these approaches were initially designed for image classification and later adapted to dense prediction tasks, such as semantic segmentation. However, the major drawback of this adaptation is the loss of spatial information. Even the popular dilated convolution approach, which in theory is able to operate with full spatial resolution, needs to subsample features for large image sizes in order to make the training and inference tractable. In this work, we introduce Split-Merge pooling to fully preserve the spatial information without any subsampling. By applying Split-Merge pooling to deep networks, we achieve, at the same time, a very large receptive field. We evaluate our approach for dense semantic segmentation of large image sizes taken from the Cityscapes and GTA-5 datasets. We demonstrate that by replacing max-pooling and striding convolutions with our split-merge pooling, we are able to improve the accuracy of different variations of ResNet significantly.
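The core idea described in the abstract — pooling that enlarges the receptive field while discarding no pixels — can be sketched as splitting a feature map into shifted sub-grids and later interleaving them back. The following NumPy sketch illustrates that split/merge round trip on a single-channel map; the function names `split_pool` and `merge_pool` are illustrative assumptions, not the authors' implementation, and the real method applies this inside a CNN across channels and batches.

```python
import numpy as np

def split_pool(x, s=2):
    """Split an (H, W) map into s*s shifted sub-maps of size (H/s, W/s).

    Each sub-map keeps every s-th pixel at a different (i, j) offset,
    so every input pixel survives -- unlike max-pooling, which keeps
    only one value per s x s window.
    """
    h, w = x.shape
    assert h % s == 0 and w % s == 0, "map size must be divisible by s"
    return [x[i::s, j::s] for i in range(s) for j in range(s)]

def merge_pool(subs, s=2):
    """Inverse of split_pool: interleave the sub-maps back to full resolution."""
    h, w = subs[0].shape
    out = np.empty((h * s, w * s), dtype=subs[0].dtype)
    for k, sub in enumerate(subs):
        i, j = divmod(k, s)   # recover the (row, col) offset of this sub-map
        out[i::s, j::s] = sub
    return out

# Round trip: splitting then merging reconstructs the input exactly,
# demonstrating that no spatial information is lost.
x = np.arange(16, dtype=np.float32).reshape(4, 4)
subs = split_pool(x)          # four 2x2 sub-maps
y = merge_pool(subs)
assert np.array_equal(x, y)
```

Because each split step halves the spatial size of every sub-map while retaining all pixels, stacking it through a deep network grows the receptive field the way strided pooling does, yet the final merge restores full-resolution predictions without upsampling.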
