Paper Title
Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification
Paper Authors
Paper Abstract
Most methods tackle zero-shot video classification by aligning visual-semantic representations within seen classes, which limits generalization to unseen classes. To enhance model generalizability, this paper presents an end-to-end framework that preserves alignment and uniformity properties for representations on both seen and unseen classes. Specifically, we formulate a supervised contrastive loss that simultaneously aligns visual-semantic features (i.e., alignment) and encourages the learned features to distribute uniformly (i.e., uniformity). Unlike existing methods that consider only alignment, we propose uniformity to preserve maximal information of existing features, which improves the probability that unobserved features fall around observed data. Further, we synthesize features of unseen classes by proposing a class generator that interpolates and extrapolates the features of seen classes. In addition, we introduce two metrics, closeness and dispersion, to quantify the two properties and serve as new measurements of model generalizability. Experiments show that our method significantly outperforms SoTA by relative improvements of 28.1% on UCF101 and 27.0% on HMDB51. Code is available.
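The alignment and uniformity properties and the mixing-based class generator mentioned in the abstract can be illustrated with a short sketch. The following is a minimal PyTorch sketch, assuming the common formulation of alignment/uniformity on L2-normalized embeddings and a simple interpolation/extrapolation generator; the function names (`alignment_loss`, `uniformity_loss`, `synthesize_class`) and the exact formulas are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (PyTorch): alignment/uniformity terms and a mixing-based
# class generator. Names and formulas are illustrative assumptions, not the
# paper's exact implementation.
import torch
import torch.nn.functional as F

def alignment_loss(visual, semantic):
    """Pull matched visual-semantic pairs together (alignment).

    visual, semantic: (N, D) embeddings of N matched pairs.
    """
    visual = F.normalize(visual, dim=-1)
    semantic = F.normalize(semantic, dim=-1)
    return (visual - semantic).pow(2).sum(dim=1).mean()

def uniformity_loss(features, t=2.0):
    """Spread features over the unit hypersphere (uniformity).

    Log of the mean Gaussian-kernel similarity over all feature pairs;
    lower values correspond to a more uniform distribution.
    """
    features = F.normalize(features, dim=-1)
    sq_dists = torch.pdist(features, p=2).pow(2)
    return sq_dists.mul(-t).exp().mean().log()

def synthesize_class(feat_a, feat_b, alpha):
    """Generate a pseudo-unseen class feature from two seen-class features.

    alpha in (0, 1) interpolates between the two; alpha outside [0, 1]
    extrapolates beyond them.
    """
    return alpha * feat_a + (1.0 - alpha) * feat_b

# Toy usage: 8 matched visual/semantic pairs in a 16-dim embedding space.
if __name__ == "__main__":
    v = torch.randn(8, 16)
    s = torch.randn(8, 16)
    loss = alignment_loss(v, s) + 0.5 * (uniformity_loss(v) + uniformity_loss(s))
    fake = synthesize_class(v[0], v[1], alpha=1.5)  # extrapolated pseudo-class
    print(loss.item(), fake.shape)
```

The weighting between the alignment and uniformity terms (0.5 above) is an arbitrary placeholder; the paper's supervised contrastive loss combines both effects in a single objective rather than as two separately weighted terms.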