Improving Deep Stereo Network Generalization with Geometric Priors

Paper Title

Improving Deep Stereo Network Generalization with Geometric Priors

Authors

Jialiang Wang, Varun Jampani, Deqing Sun, Charles Loop, Stan Birchfield, Jan Kautz

Abstract

End-to-end deep learning methods have advanced stereo vision in recent years and obtain excellent results when the training and test data are similar. However, large datasets of diverse real-world scenes with dense ground truth are difficult to obtain and are currently not publicly available to the research community. As a result, many algorithms rely on small real-world datasets of similar scenes or on synthetic datasets, but end-to-end algorithms trained on such datasets often generalize poorly to the different images encountered in real-world applications. As a step towards addressing this problem, we propose to incorporate prior knowledge of scene geometry into an end-to-end stereo network to help it generalize better. For a given network, we explicitly add a gradient-domain smoothness prior and occlusion reasoning to network training, while the architecture remains unchanged during inference. Experimentally, we show consistent improvements when training on synthetic datasets and testing on the Middlebury (real images) dataset. Notably, we reduce the mean absolute error (MAE) of PSM-Net on Middlebury from 5.37 to 3.21 without sacrificing speed.
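The abstract names its ingredients but not their exact formulas. The following is a rough, hypothetical sketch of generic versions of them, not the authors' formulation or PSM-Net code. It assumes that (1) a gradient-domain smoothness prior can be approximated by penalizing second differences of disparity, which vanish on planar (including slanted) surfaces, (2) occlusion reasoning can be approximated by a standard left-right consistency check, and (3) MAE means the mean absolute disparity error. All function names are illustrative.

```python
"""Hypothetical sketch (not the paper's code): a second-order disparity
smoothness penalty, a left-right occlusion check, and disparity MAE."""

from typing import List, Optional

Grid = List[List[float]]


def second_order_smoothness(disp: Grid) -> float:
    """Mean absolute second difference of disparity along rows and columns.

    Second differences are zero on planar (slanted) surfaces, so this
    assumed penalty favors piecewise-planar rather than piecewise-constant
    disparity.
    """
    h, w = len(disp), len(disp[0])
    total, count = 0.0, 0
    for y in range(h):  # horizontal second differences
        for x in range(1, w - 1):
            total += abs(disp[y][x - 1] - 2 * disp[y][x] + disp[y][x + 1])
            count += 1
    for y in range(1, h - 1):  # vertical second differences
        for x in range(w):
            total += abs(disp[y - 1][x] - 2 * disp[y][x] + disp[y + 1][x])
            count += 1
    return total / count if count else 0.0


def occlusion_mask(disp_left: Grid, disp_right: Grid,
                   tol: float = 1.0) -> List[List[bool]]:
    """Left-right consistency check; True marks a likely occluded left pixel.

    A left pixel at column x with disparity d should correspond to column
    x - d in the right view, whose disparity should be close to d.
    """
    h, w = len(disp_left), len(disp_left[0])
    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            d = disp_left[y][x]
            xr = round(x - d)
            if xr < 0 or xr >= w:
                row.append(True)  # correspondence falls outside the right image
            else:
                row.append(abs(d - disp_right[y][xr]) > tol)
        mask.append(row)
    return mask


def mean_absolute_error(pred: Grid, gt: Grid,
                        valid: Optional[List[List[bool]]] = None) -> float:
    """MAE in disparity units over pixels where valid is True (all if None)."""
    errs = [abs(p - g)
            for y, (prow, grow) in enumerate(zip(pred, gt))
            for x, (p, g) in enumerate(zip(prow, grow))
            if valid is None or valid[y][x]]
    return sum(errs) / len(errs) if errs else 0.0


if __name__ == "__main__":
    ramp = [[float(x) for x in range(5)] for _ in range(3)]  # slanted plane
    print(second_order_smoothness(ramp))  # 0.0: planes are not penalized
```

The second-order penalty is chosen here because a first-order (gradient) penalty would also penalize slanted planes, which are common in real scenes; whether the paper uses exactly this form is an assumption.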
