Benchmarking Conventional Vision Models on Neuromorphic Fall Detection and Action Recognition Dataset

Paper Title


Benchmarking Conventional Vision Models on Neuromorphic Fall Detection and Action Recognition Dataset

Authors

Krishnan, Karthik Sivarama; Krishnan, Koushik Sivarama

Abstract


Neuromorphic vision sensors have gained popularity in recent years for their ability to capture spatio-temporal events with low-power sensing. Instead of recording full frames like traditional cameras, these sensors record events, or spikes, which helps preserve the privacy of the subject being recorded. Events are triggered by per-pixel brightness changes, and the output data stream encodes the time, location, and pixel intensity change of each event. This paper proposes and benchmarks fine-tuned conventional vision models on neuromorphic human action recognition and fall detection datasets. The spatio-temporal event streams from the Dynamic Vision Sensing (DVS) cameras are encoded into standard sequences of image frames, and these video frames are used to benchmark conventional deep learning architectures. In the proposed approach, we fine-tuned state-of-the-art vision models for this DVS application and named them DVS-R2+1D, DVS-CSN, DVS-C2D, DVS-SlowFast, DVS-X3D, and DVS-MViT. Comparing their performance, DVS-MViT, which is based on the current state-of-the-art MViT architecture, outperforms all other models with an accuracy of 0.958 and an F1 score of 0.958. The second best is DVS-C2D, with an accuracy of 0.916 and an F1 score of 0.916. Third and fourth are DVS-R2+1D and DVS-SlowFast, with accuracies of 0.875 and 0.833 and F1 scores of 0.875 and 0.861, respectively. DVS-CSN and DVS-X3D were the lowest-performing models, with accuracies of 0.708 and 0.625 and F1 scores of 0.722 and 0.625, respectively.
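The abstract states that the DVS event streams are encoded into standard image-frame sequences but does not say how. The sketch below shows one common way to do this, under stated assumptions: each event is a hypothetical `(timestamp_us, x, y, polarity)` tuple with polarity +1 or -1, events are grouped into fixed time windows, and each window's signed per-pixel counts are rendered as an 8-bit grayscale frame. The function name `events_to_frames` and the `window_us` parameter are illustrative and not taken from the paper, whose exact encoding may differ.

```python
from typing import List, Sequence, Tuple

# Hypothetical event record (assumption): (timestamp_us, x, y, polarity),
# where polarity is +1 for a brightness increase and -1 for a decrease.
Event = Tuple[int, int, int, int]


def events_to_frames(
    events: Sequence[Event],
    width: int,
    height: int,
    window_us: int,
) -> List[List[List[int]]]:
    """Bin an event stream into fixed-duration windows and render each window
    as an 8-bit grayscale frame indexed [y][x] (an assumed encoding)."""
    if not events:
        return []
    if window_us <= 0:
        raise ValueError("window_us must be positive")

    t0 = min(e[0] for e in events)
    t_end = max(e[0] for e in events)
    n_frames = (t_end - t0) // window_us + 1

    # One signed accumulator grid per time window.
    acc = [[[0] * width for _ in range(height)] for _ in range(n_frames)]
    for t, x, y, p in events:
        if 0 <= x < width and 0 <= y < height:  # drop out-of-sensor events
            acc[(t - t0) // window_us][y][x] += 1 if p > 0 else -1

    frames = []
    for grid in acc:
        peak = max((abs(v) for row in grid for v in row), default=0)
        if peak == 0:
            # No net change in this window: uniform mid-gray frame.
            frames.append([[128] * width for _ in range(height)])
            continue
        # Map signed counts to [1, 255]: 128 means no net change,
        # brighter pixels had more ON events, darker pixels more OFF events.
        frames.append([[round(128 + 127 * v / peak) for v in row] for row in grid])
    return frames
```

For example, `events_to_frames([(0, 0, 0, 1), (10, 1, 0, 1), (10, 1, 0, 1), (1500, 0, 1, -1)], width=2, height=2, window_us=1000)` returns two frames, `[[192, 255], [128, 128]]` and `[[128, 128], [1, 128]]`. A full pipeline would typically stack such frames into a clip, replicate or map them to the channel layout the pretrained video backbone expects, and resize them to its input resolution; those steps are omitted here.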
