TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization

Paper Title

TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization

Authors

Zhu, Sijie; Shah, Mubarak; Chen, Chen

Abstract

The dominant CNN-based methods for cross-view image geo-localization rely on polar transform and fail to model global correlation. We propose a pure transformer-based approach (TransGeo) to address these limitations from a different perspective. TransGeo takes full advantage of the strengths of transformer related to global information modeling and explicit position information encoding. We further leverage the flexibility of transformer input and propose an attention-guided non-uniform cropping method, so that uninformative image patches are removed with negligible drop on performance to reduce computation cost. The saved computation can be reallocated to increase resolution only for informative patches, resulting in performance improvement with no additional computation cost. This "attend and zoom-in" strategy is highly similar to human behavior when observing images. Remarkably, TransGeo achieves state-of-the-art results on both urban and rural datasets, with significantly less computation cost than CNN-based methods. It does not rely on polar transform and infers faster than CNN-based methods. Code is available at https://github.com/Jeff-Zilence/TransGeo2022.
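The attention-guided cropping idea in the abstract can be illustrated with a minimal Python sketch. It assumes each image patch is scored by the attention weight it receives from the class token and that a fixed fraction of the highest-scoring patches is kept. The function names and the keep_ratio parameter are hypothetical and are not taken from the TransGeo code. The second helper works out the arithmetic behind reallocating the saved computation: upscaling by a factor s per side multiplies the patch count of the kept regions by s², so with a keep ratio r the total token count is unchanged when s = 1/√r.

```python
import math
from typing import List, Sequence


def select_informative_patches(
    cls_attention: Sequence[float], keep_ratio: float
) -> List[int]:
    """Return indices of patches to keep, ranked by class-token attention.

    Assumption (illustrative only): cls_attention[i] is the attention weight
    from the class token to patch i, e.g. averaged over heads of the last
    layer. Patches outside the top keep_ratio fraction are treated as
    uninformative and dropped.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    num_keep = max(1, round(len(cls_attention) * keep_ratio))
    # Rank patch indices from most to least attended.
    ranked = sorted(
        range(len(cls_attention)), key=lambda i: cls_attention[i], reverse=True
    )
    # Re-sort the kept indices so their spatial order (and hence their
    # position embeddings) is preserved.
    return sorted(ranked[:num_keep])


def zoom_factor_for_same_budget(keep_ratio: float) -> float:
    """Per-side upscaling factor that keeps the token count unchanged.

    Upsampling by s along each side turns every kept patch into s**2 patches,
    so keep_ratio * s**2 == 1 restores the original number of tokens.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    return 1.0 / math.sqrt(keep_ratio)


if __name__ == "__main__":
    # Hypothetical attention scores for 8 patches.
    scores = [0.05, 0.30, 0.02, 0.20, 0.01, 0.25, 0.07, 0.10]
    print(select_informative_patches(scores, keep_ratio=0.5))
    print(zoom_factor_for_same_budget(0.25))
```

For example, keeping a quarter of the patches (keep_ratio=0.25) allows each side of the retained regions to be upscaled by 2.0, yielding four times as many patches there while the total token count, and hence roughly the transformer's compute, stays the same.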
