Paper Title
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications
Paper Authors
Paper Abstract
Trained on large datasets, deep learning (DL) can accurately classify videos into hundreds of diverse classes. However, video data is expensive to annotate. Zero-shot learning (ZSL) proposes one solution to this problem. ZSL trains a model once, and generalizes to new tasks whose classes are not present in the training dataset. We propose the first end-to-end algorithm for ZSL in video classification. Our training procedure builds on insights from recent video classification literature and uses a trainable 3D CNN to learn the visual features. This is in contrast to previous video ZSL methods, which use pretrained feature extractors. We also extend the current benchmarking paradigm: Previous techniques aim to make the test task unknown at training time but fall short of this goal. We encourage domain shift across training and test data and disallow tailoring a ZSL model to a specific test dataset. We outperform the state-of-the-art by a wide margin. Our code, evaluation procedure and model weights are available at github.com/bbrattoli/ZeroShotVideoClassification.
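To make the end-to-end idea concrete, the sketch below shows one common way such a system can be wired up: a trainable 3D CNN maps a video clip directly into a semantic embedding space, and the whole network (including the convolutional backbone) receives gradients, rather than relying on a frozen, pretrained feature extractor. This is a minimal illustration under assumptions, not the authors' exact implementation: the r3d_18 backbone, the MSE regression loss, and the 300-dimensional class word embeddings are choices made here for demonstration.

```python
# Minimal sketch of end-to-end zero-shot video classification (illustrative only).
# Assumptions: torchvision's r3d_18 backbone, 300-d class word embeddings, MSE loss.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

EMB_DIM = 300  # assumed dimensionality of the class (word) embeddings


class E2EZeroShotVideoNet(nn.Module):
    """3D CNN whose final layer projects clips into the class-embedding space."""

    def __init__(self, emb_dim: int = EMB_DIM):
        super().__init__()
        backbone = r3d_18()  # randomly initialized, fully trainable 3D CNN
        backbone.fc = nn.Linear(backbone.fc.in_features, emb_dim)
        self.backbone = backbone

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, channels=3, frames, height, width)
        return self.backbone(clips)


def train_step(model, optimizer, clips, class_embeddings, labels):
    """One end-to-end update: push clip embeddings toward their class embeddings."""
    model.train()
    optimizer.zero_grad()
    pred = model(clips)                    # (batch, emb_dim)
    target = class_embeddings[labels]      # (batch, emb_dim)
    loss = nn.functional.mse_loss(pred, target)
    loss.backward()                        # gradients flow into the 3D CNN itself
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = E2EZeroShotVideoNet()
    opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    clips = torch.randn(2, 3, 16, 112, 112)       # two 16-frame RGB clips
    class_emb = torch.randn(10, EMB_DIM)          # embeddings of 10 training classes
    labels = torch.tensor([3, 7])
    print(train_step(model, opt, clips, class_emb, labels))
```

At test time, a clip from an unseen class would be embedded by the same network and assigned to the nearest embedding among the novel classes' names; this nearest-neighbor inference step is the standard ZSL recipe and is shown here only as an assumed usage pattern.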