Paper Title
MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection
Paper Authors
Paper Abstract
Due to the inherently ill-posed nature of 2D-3D projection, monocular 3D object detection lacks accurate depth recovery ability. Although deep neural networks (DNNs) enable monocular depth sensing from high-level learned features, pixel-level cues are usually omitted due to the deep convolution mechanism. To benefit from both the powerful feature representations of DNNs and pixel-level geometric constraints, we reformulate monocular object depth estimation as a progressive refinement problem and propose a joint semantic and geometric cost volume to model the depth error. Specifically, we first leverage neural networks to learn the object position, dimensions, and dense normalized 3D object coordinates. Based on the object depth, the dense coordinate patch, together with the corresponding object features, is reprojected to the image space to build a cost volume that captures both semantic and geometric error. The final depth is obtained by feeding the cost volume to a refinement network, where the distribution of semantic and geometric error is regularized by direct depth supervision. By effectively mitigating the depth error through the refinement framework, we achieve state-of-the-art results on both the KITTI and Waymo datasets.
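To make the abstract's pipeline concrete, below is a minimal NumPy sketch of how a joint semantic and geometric cost volume over candidate object depths could be assembled. This is an illustrative assumption, not the paper's actual implementation: the function names, the nearest-neighbour feature sampling, and the simple L2 error terms are hypothetical choices, and the real method refines depth with a learned network rather than an argmin over a hand-built cost.

```python
import numpy as np

def project(K, pts):
    """Pinhole projection of (N, 3) camera-space points to (N, 2) pixels."""
    uv = (K @ pts.T).T
    return uv[:, :2] / uv[:, 2:3]

def build_cost_volume(noc, dims, center_xy, pix_uv, feat_map, K, depth_candidates):
    """Hypothetical sketch of a joint semantic-geometric cost volume.

    noc:              (N, 3) dense normalized object coordinates in [-0.5, 0.5]
    dims:             (3,) predicted object dimensions in metres
    center_xy:        (2,) predicted object center in camera X/Y (metres)
    pix_uv:           (N, 2) pixel locations where the coordinates were predicted
    feat_map:         (H, W, C) image feature map
    K:                (3, 3) camera intrinsics
    depth_candidates: (D,) candidate object depths in metres
    Returns (D,) costs; a lower cost means the candidate depth better explains
    both the geometric reprojection and the appearance features.
    """
    H, W, _ = feat_map.shape
    # Features at the source pixels (nearest-neighbour sampling for simplicity).
    src_feat = feat_map[pix_uv[:, 1].astype(int).clip(0, H - 1),
                        pix_uv[:, 0].astype(int).clip(0, W - 1)]
    costs = []
    for z in depth_candidates:
        # Lift normalized coordinates to 3D points at this candidate depth.
        pts = noc * dims + np.array([center_xy[0], center_xy[1], z])
        uv = project(K, pts)
        # Geometric error: pixel offset between reprojection and prediction.
        geo_err = np.linalg.norm(uv - pix_uv, axis=1)
        # Semantic error: feature mismatch at the reprojected location.
        u = uv[:, 0].astype(int).clip(0, W - 1)
        v = uv[:, 1].astype(int).clip(0, H - 1)
        sem_err = np.linalg.norm(feat_map[v, u] - src_feat, axis=1)
        costs.append((geo_err + sem_err).mean())
    return np.array(costs)
```

With synthetic data generated at a known depth, the cost attains its minimum at the correct candidate, which is the property the refinement network exploits when it is trained with direct depth supervision.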