Paper Title

CatNet: Class Incremental 3D ConvNets for Lifelong Egocentric Gesture Recognition

Paper Authors

Zhengwei Wang, Qi She, Tejo Chalasani, Aljosa Smolic

Paper Abstract

Egocentric gestures are the most natural form of communication for humans to interact with wearable devices such as VR/AR helmets and glasses. A major issue in such scenarios for real-world applications is that it may easily become necessary to add new gestures to the system, e.g., a proper VR system should allow users to customize gestures incrementally. Traditional deep learning methods require storing all previous class samples in the system and training the model again from scratch by incorporating previous and new samples, which costs enormous memory and significantly increases computation over time. In this work, we demonstrate a lifelong 3D convolutional framework -- c(C)la(a)ss increment(t)al net(Net)work (CatNet) -- which considers temporal information in videos and enables lifelong learning for egocentric gesture video recognition by learning the feature representation of an exemplar set selected from previous class samples. Importantly, we propose a two-stream CatNet, which deploys RGB and depth modalities to train two separate networks. We evaluate CatNets on a publicly available dataset -- the EgoGesture dataset -- and show that CatNets can learn many classes incrementally over a long period of time. Results also demonstrate that the two-stream architecture achieves the best performance on both joint training and class-incremental training compared to three one-stream architectures. The code and pre-trained models used in this work are provided at https://github.com/villawang/CatNet.
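
The abstract names two concrete mechanisms: rehearsal on an exemplar set drawn from previous class samples, and a two-stream design with separate RGB and depth 3D ConvNets. The PyTorch sketch below illustrates both under stated assumptions; it is not the authors' released code (see the GitHub link above for that), and the tiny backbone, feature dimension, herding-style exemplar selection, and score-averaging fusion are all illustrative choices.

```python
import torch
import torch.nn as nn


class Tiny3DConvNet(nn.Module):
    """Stand-in 3D ConvNet; the paper's backbones are larger pre-trained 3D CNNs."""

    def __init__(self, in_channels, num_classes, feat_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # pool over time and space
        )
        self.proj = nn.Linear(16, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def extract(self, clips):
        # clips: (N, C, T, H, W) -> (N, feat_dim) feature embedding
        return self.proj(self.features(clips).flatten(1))

    def forward(self, clips):
        return self.classifier(self.extract(clips))


def select_exemplars(model, clips, m):
    """Herding-style selection: greedily pick m clips whose running feature
    mean stays closest to the class mean (one common exemplar strategy)."""
    with torch.no_grad():
        feats = model.extract(clips)
    class_mean = feats.mean(dim=0)
    chosen, running = [], torch.zeros_like(class_mean)
    available = torch.ones(len(feats), dtype=torch.bool)
    for k in range(1, m + 1):
        # distance of each candidate's resulting running mean to the class mean
        gaps = ((class_mean - (running + feats) / k) ** 2).sum(dim=1)
        gaps[~available] = float("inf")  # never pick the same clip twice
        idx = int(gaps.argmin())
        chosen.append(idx)
        available[idx] = False
        running = running + feats[idx]
    return clips[chosen]


def fused_scores(rgb_net, depth_net, rgb_clips, depth_clips):
    """Late fusion of the two modality streams; averaging softmax scores
    is one plausible fusion rule, assumed here for illustration."""
    with torch.no_grad():
        return 0.5 * (rgb_net(rgb_clips).softmax(1)
                      + depth_net(depth_clips).softmax(1))


# Toy usage: two streams over random 16-frame clips from one gesture class.
rgb_net = Tiny3DConvNet(in_channels=3, num_classes=10)
depth_net = Tiny3DConvNet(in_channels=1, num_classes=10)
rgb_clips = torch.randn(8, 3, 16, 32, 32)    # (N, C, T, H, W)
depth_clips = torch.randn(8, 1, 16, 32, 32)
exemplars = select_exemplars(rgb_net, rgb_clips, m=4)  # keep 4 clips for rehearsal
scores = fused_scores(rgb_net, depth_net, rgb_clips, depth_clips)
```

When a new class arrives, only its samples plus the stored exemplars need revisiting, which is what keeps memory and compute bounded relative to full retraining from scratch.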
