Paper Title
Unsupervised Learning of Depth, Camera Pose and Optical Flow from Monocular Video
Paper Authors
Abstract
We propose DFPNet -- a joint unsupervised learning system for Depth, Optical Flow and egomotion (Camera Pose) estimation from monocular image sequences. Due to the nature of 3D scene geometry, these three components are coupled. We leverage this fact to jointly train all three components in an end-to-end manner. A single composite loss function -- which combines image reconstruction-based losses for depth and optical flow, bidirectional consistency checks, and smoothness loss components -- is used to train the network. Using hyperparameter tuning, we are able to reduce the model size to less than 5% (8.4M parameters) of state-of-the-art DFP models. Evaluation on the KITTI and Cityscapes driving datasets shows that our model achieves results comparable to the state of the art on all three tasks, even with the significantly smaller model size.
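The abstract describes a single composite loss combining photometric reconstruction, bidirectional consistency, and smoothness terms. The sketch below illustrates how such a composite loss might be assembled; the function names, the L1 error choice, and the weights `w_photo`, `w_cons`, and `w_smooth` are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def photometric_loss(target, reconstructed):
    # Image reconstruction error between the target frame and the frame
    # warped/reconstructed via predicted depth, flow, and camera pose.
    return np.mean(np.abs(target - reconstructed))

def consistency_loss(forward_flow, backward_flow):
    # Bidirectional check: forward and backward flow at corresponding
    # points should be (approximately) opposite vectors.
    return np.mean(np.abs(forward_flow + backward_flow))

def smoothness_loss(pred):
    # Penalize spatial gradients of a prediction (e.g. depth or flow);
    # a simple edge-unaware variant using finite differences.
    dx = np.abs(np.diff(pred, axis=1)).mean()
    dy = np.abs(np.diff(pred, axis=0)).mean()
    return dx + dy

def composite_loss(target, reconstructed, fwd_flow, bwd_flow, depth,
                   w_photo=1.0, w_cons=0.2, w_smooth=0.1):
    # Weighted sum of the three loss families; the weights here are
    # placeholders, not values from the paper.
    return (w_photo * photometric_loss(target, reconstructed)
            + w_cons * consistency_loss(fwd_flow, bwd_flow)
            + w_smooth * smoothness_loss(depth))
```

In an actual training setup each term would be computed on network outputs (and typically with more robust variants, e.g. SSIM-augmented photometric error and edge-aware smoothness), then backpropagated through all three sub-networks jointly.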