Input Dropout for Spatially Aligned Modalities

Paper title

Input Dropout for Spatially Aligned Modalities

Authors

Sébastien de Blois, Mathieu Garon, Christian Gagné, Jean-François Lalonde

Abstract

Computer vision datasets containing multiple modalities such as color, depth, and thermal properties are now commonly accessible and useful for solving a wide array of challenging tasks. However, deploying multi-sensor heads is not possible in many scenarios. As such, many practical solutions tend to be based on simpler sensors, mostly for cost, simplicity, and robustness considerations. In this work, we propose a training methodology to take advantage of these additional modalities available in datasets, even if they are not available at test time. By assuming that the modalities have a strong spatial correlation, we propose Input Dropout, a simple technique that consists of stochastically hiding one or more input modalities at training time, while using only the canonical (e.g., RGB) modalities at test time. We demonstrate that Input Dropout trivially combines with existing deep convolutional architectures and improves their performance on a wide range of computer vision tasks such as dehazing, 6-DOF object tracking, pedestrian detection, and object classification.
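To make the idea in the abstract concrete, here is a minimal sketch of hiding an extra modality at training time. The abstract does not say how a modality is hidden, how often, or how the modalities are fed to the network, so everything specific here is an assumption: hiding is implemented as zero-filling, the modalities are concatenated along the channel axis, and the function name `input_dropout` and the parameters `p_drop` and `extra_channels` are hypothetical.

```python
import numpy as np

def input_dropout(rgb, extra=None, extra_channels=1, p_drop=0.5, rng=None):
    """Stack an RGB image with one extra, spatially aligned modality.

    rgb:   array of shape (H, W, 3), the canonical modality.
    extra: array of shape (H, W, extra_channels), e.g. depth or thermal,
           or None when the sensor is unavailable (test time).
    The extra modality is replaced by zeros with probability p_drop
    (assumed hiding scheme), and always when it is None.
    """
    if rng is None:
        rng = np.random.default_rng()
    height, width, _ = rgb.shape
    if extra is None or rng.random() < p_drop:
        # Hide the modality: keep the channel layout, drop the content.
        extra = np.zeros((height, width, extra_channels), dtype=rgb.dtype)
    return np.concatenate([rgb, extra], axis=-1)
```

During training one would call `input_dropout(rgb, depth)` for each sample; at test time `input_dropout(rgb)` produces an input with the same channel count from RGB alone, so the network architecture does not need to change.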
