Paper title
Learning Monocular 3D Vehicle Detection without 3D Bounding Box Labels
Paper authors
Paper abstract
The training of deep-learning-based 3D object detectors requires large datasets with 3D bounding box labels for supervision that have to be generated by hand-labeling. We propose a network architecture and training procedure for learning monocular 3D object detection without 3D bounding box labels. By representing the objects as triangular meshes and employing differentiable shape rendering, we define loss functions based on depth maps, segmentation masks, and ego- and object-motion, which are generated by pre-trained, off-the-shelf networks. We evaluate the proposed algorithm on the real-world KITTI dataset and achieve promising performance in comparison to state-of-the-art methods requiring 3D bounding box labels for training and superior performance to conventional baseline methods.
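To make the idea of supervising with masks and depth maps more concrete, the following is a minimal sketch of what two such loss terms could look like. It is not the paper's formulation: the function name `silhouette_and_depth_loss`, the soft-IoU mask term, and the mean absolute depth error are all assumptions chosen for illustration. The sketch also assumes the rendered mask and rendered depth have already been produced from the predicted mesh, and that the target mask and target depth come from pre-trained networks. In actual training these terms would be computed inside an automatic-differentiation framework so that gradients flow back through a differentiable renderer; plain Python is used here only to show the arithmetic.

```python
def silhouette_and_depth_loss(rendered_mask, rendered_depth,
                              target_mask, target_depth):
    """Illustrative (hypothetical) loss terms, not the paper's exact losses.

    All four inputs are equally sized 2D lists of floats.
    Masks hold values in [0, 1]; depths are in metres.
    Returns (mask_loss, depth_loss).
    """
    intersection = 0.0
    union = 0.0
    depth_error = 0.0
    overlap_pixels = 0

    rows = zip(rendered_mask, rendered_depth, target_mask, target_depth)
    for rm_row, rd_row, tm_row, td_row in rows:
        for rm, rd, tm, td in zip(rm_row, rd_row, tm_row, td_row):
            # Soft intersection-over-union between the two silhouettes.
            intersection += rm * tm
            union += rm + tm - rm * tm
            # Compare depth only where both masks cover the pixel.
            if rm > 0.5 and tm > 0.5:
                depth_error += abs(rd - td)
                overlap_pixels += 1

    # Assumed form: 1 - soft IoU for the mask term.
    mask_loss = 1.0 - intersection / union if union > 0 else 0.0
    # Assumed form: mean absolute error over overlapping pixels.
    depth_loss = depth_error / overlap_pixels if overlap_pixels else 0.0
    return mask_loss, depth_loss


if __name__ == "__main__":
    rendered_mask = [[1.0, 1.0], [0.0, 0.0]]
    target_mask = [[1.0, 0.0], [0.0, 0.0]]
    rendered_depth = [[10.0, 10.0], [0.0, 0.0]]
    target_depth = [[12.0, 12.0], [0.0, 0.0]]
    print(silhouette_and_depth_loss(rendered_mask, rendered_depth,
                                    target_mask, target_depth))
```

Restricting the depth comparison to pixels covered by both masks is one possible design choice; it keeps background depth from dominating the error when the rendered object is small in the image.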