Paper Title

Masked Event Modeling: Self-Supervised Pretraining for Event Cameras

Paper Authors

Simon Klenk, David Bonello, Lukas Koestler, Nikita Araslanov, Daniel Cremers

Paper Abstract

Event cameras asynchronously capture brightness changes with low latency, high temporal resolution, and high dynamic range. However, annotation of event data is a costly and laborious process, which limits the use of deep learning methods for classification and other semantic tasks with the event modality. To reduce the dependency on labeled event data, we introduce Masked Event Modeling (MEM), a self-supervised framework for events. Our method pretrains a neural network on unlabeled events, which can originate from any event camera recording. Subsequently, the pretrained model is finetuned on a downstream task, leading to a consistent improvement of the task accuracy. For example, our method reaches state-of-the-art classification accuracy across three datasets, N-ImageNet, N-Cars, and N-Caltech101, increasing the top-1 accuracy of previous work by significant margins. When tested on real-world event data, MEM is even superior to supervised RGB-based pretraining. The models pretrained with MEM are also label-efficient and generalize well to the dense task of semantic image segmentation.
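The abstract outlines the core recipe: accumulate unlabeled events into an image-like representation, mask out a large fraction of the input, train a network to reconstruct the masked regions, and then finetune the pretrained encoder on a labeled downstream task. Below is a minimal PyTorch sketch of that general idea, assuming a per-polarity event-histogram input and a SimMIM-style scheme where masked patches are replaced by a learned token. All names (EventMAE, make_event_histogram), the patch size, and the mask ratio are illustrative assumptions, not the paper's implementation; MEM's exact reconstruction target and architecture may differ.

```python
# Hypothetical sketch of masked pretraining on event histograms.
# Names and hyperparameters are illustrative, not taken from the MEM codebase.
import torch
import torch.nn as nn

def make_event_histogram(x, y, p, H=64, W=64):
    """Accumulate raw events (x, y, polarity) into a 2-channel count image."""
    hist = torch.zeros(2, H, W)
    hist.index_put_((p.long(), y.long(), x.long()), torch.ones(len(x)), accumulate=True)
    return hist

class EventMAE(nn.Module):
    def __init__(self, patch=8, dim=128, mask_ratio=0.75, H=64, W=64):
        super().__init__()
        self.patch, self.mask_ratio = patch, mask_ratio
        self.num_patches = (H // patch) * (W // patch)
        self.embed = nn.Conv2d(2, dim, kernel_size=patch, stride=patch)  # patchify
        self.pos = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=4)
        self.decoder = nn.Linear(dim, patch * patch * 2)  # reconstruct raw patch values
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, hist):  # hist: (B, 2, H, W)
        B = hist.size(0)
        tokens = self.embed(hist).flatten(2).transpose(1, 2) + self.pos  # (B, N, dim)
        target = nn.functional.unfold(hist, self.patch, stride=self.patch).transpose(1, 2)
        # Randomly mask a fraction of patches; replace them with a learned mask token.
        mask = torch.rand(B, self.num_patches, device=hist.device) < self.mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(tokens), tokens)
        pred = self.decoder(self.encoder(tokens))  # (B, N, patch*patch*2)
        # Reconstruction loss is computed only on the masked patches.
        return ((pred - target) ** 2).mean(dim=-1)[mask].mean()

# Toy usage: 1000 random events -> one histogram -> masked reconstruction loss.
events = [torch.randint(0, 64, (1000,)), torch.randint(0, 64, (1000,)), torch.randint(0, 2, (1000,))]
loss = EventMAE()(make_event_histogram(*events).unsqueeze(0))
loss.backward()
```

After pretraining in such a setup, the embedding, positional, and encoder weights would be kept and a task head attached for finetuning, mirroring the pretrain-then-finetune procedure described in the abstract.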
