Paper Title
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications
Paper Authors
Paper Abstract
Trained on large datasets, deep learning (DL) can accurately classify videos into hundreds of diverse classes. However, video data is expensive to annotate. Zero-shot learning (ZSL) proposes one solution to this problem. ZSL trains a model once, and generalizes to new tasks whose classes are not present in the training dataset. We propose the first end-to-end algorithm for ZSL in video classification. Our training procedure builds on insights from recent video classification literature and uses a trainable 3D CNN to learn the visual features. This is in contrast to previous video ZSL methods, which use pretrained feature extractors. We also extend the current benchmarking paradigm: Previous techniques aim to make the test task unknown at training time but fall short of this goal. We encourage domain shift across training and test data and disallow tailoring a ZSL model to a specific test dataset. We outperform the state-of-the-art by a wide margin. Our code, evaluation procedure and model weights are available at github.com/bbrattoli/ZeroShotVideoClassification.
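To make the end-to-end idea concrete, the sketch below shows one common way such a system can be wired up: a trainable 3D CNN maps a video clip directly into a semantic embedding space, and the whole network (including the convolutional backbone) receives gradients, rather than relying on a frozen, pretrained feature extractor. This is a minimal illustration under assumptions, not the authors' exact implementation: the r3d_18 backbone, the MSE regression loss, and the 300-dimensional class word embeddings are choices made here for demonstration.

```python
# Minimal sketch of end-to-end zero-shot video classification (illustrative only).
# Assumptions: torchvision's r3d_18 backbone, 300-d class word embeddings, MSE loss.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

EMB_DIM = 300  # assumed dimensionality of the class (word) embeddings


class E2EZeroShotVideoNet(nn.Module):
    """3D CNN whose final layer projects clips into the class-embedding space."""

    def __init__(self, emb_dim: int = EMB_DIM):
        super().__init__()
        backbone = r3d_18()  # randomly initialized, fully trainable 3D CNN
        backbone.fc = nn.Linear(backbone.fc.in_features, emb_dim)
        self.backbone = backbone

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, channels=3, frames, height, width)
        return self.backbone(clips)


def train_step(model, optimizer, clips, class_embeddings, labels):
    """One end-to-end update: push clip embeddings toward their class embeddings."""
    model.train()
    optimizer.zero_grad()
    pred = model(clips)                    # (batch, emb_dim)
    target = class_embeddings[labels]      # (batch, emb_dim)
    loss = nn.functional.mse_loss(pred, target)
    loss.backward()                        # gradients flow into the 3D CNN itself
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = E2EZeroShotVideoNet()
    opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    clips = torch.randn(2, 3, 16, 112, 112)       # two 16-frame RGB clips
    class_emb = torch.randn(10, EMB_DIM)          # embeddings of 10 training classes
    labels = torch.tensor([3, 7])
    print(train_step(model, opt, clips, class_emb, labels))
```

At test time, a clip from an unseen class would be embedded by the same network and assigned to the nearest embedding among the novel classes' names; this nearest-neighbor inference step is the standard ZSL recipe and is shown here only as an assumed usage pattern.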