Paper Title


HiMODE: A Hybrid Monocular Omnidirectional Depth Estimation Model

Authors

Masum Shah Junayed, Arezoo Sadeghzadeh, Md Baharul Islam, Lai-Kuan Wong, Tarkan Aydin

Abstract

Monocular omnidirectional depth estimation is receiving considerable research attention due to its broad applications for sensing 360° surroundings. Existing approaches in this field suffer from limitations in recovering small object details and data lost during the ground-truth depth map acquisition. In this paper, a novel monocular omnidirectional depth estimation model, namely HiMODE, is proposed based on a hybrid CNN+Transformer (encoder-decoder) architecture whose modules are efficiently designed to mitigate distortion and computational cost without performance degradation. Firstly, we design a feature pyramid network based on the HNet block to extract high-resolution features near the edges. The performance is further improved by benefiting from self- and cross-attention layers and spatial/temporal patches in the Transformer encoder and decoder, respectively. In addition, a spatial residual block is employed to reduce the number of parameters. By jointly passing the deep features extracted from an input image at each backbone block, along with the raw depth maps predicted by the Transformer encoder-decoder, through a context adjustment layer, our model can produce resulting depth maps with better visual quality than the ground-truth. Comprehensive ablation studies demonstrate the significance of each individual module. Extensive experiments conducted on three datasets, Stanford3D, Matterport3D, and SunCG, demonstrate that HiMODE can achieve state-of-the-art performance for 360° monocular depth estimation.
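The core idea of a hybrid CNN+Transformer depth estimator, as described in the abstract, is that convolutions capture fine local detail while attention supplies global context over image patches. The toy NumPy sketch below illustrates only that general pipeline; it is not HiMODE's actual architecture, and all shapes, the edge kernel, and the per-patch depth head are illustrative assumptions:

```python
import numpy as np

def conv2d(x, kernel):
    """Naive 'valid' 2D convolution: stand-in for a CNN backbone stage."""
    kh, kw = kernel.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over patch tokens."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))              # toy single-channel input
edge_kernel = np.array([[1., 0., -1.],
                        [2., 0., -2.],
                        [1., 0., -1.]])            # Sobel-like local filter
features = conv2d(image, edge_kernel)              # (14, 14) local feature map

# Split the feature map into 2x2 patches, flatten each into a token,
# then let attention mix global context across the 49 tokens.
patches = features.reshape(7, 2, 7, 2).transpose(0, 2, 1, 3).reshape(49, 4)
context = self_attention(patches)
depth_tokens = context.mean(axis=-1)               # toy per-patch depth head
depth_map = depth_tokens.reshape(7, 7)             # coarse (7, 7) depth map
print(depth_map.shape)
```

In a real model the convolution stage would be a learned multi-scale backbone and the attention stage a full Transformer encoder-decoder; the sketch only shows how the two kinds of features are composed.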
