Paper Title
Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision
Paper Authors
Paper Abstract
The goal of this work is to train discriminative cross-modal embeddings without access to manually annotated data. Recent advances in self-supervised learning have shown that effective representations can be learnt from natural cross-modal synchrony. We build on earlier work to train embeddings that are more discriminative for uni-modal downstream tasks. To this end, we propose a novel training strategy that not only optimises metrics across modalities, but also enforces intra-class feature separation within each of the modalities. The effectiveness of the method is demonstrated on two downstream tasks: lip reading using the features trained on audio-visual synchronisation, and speaker recognition using the features trained for cross-modal biometric matching. The proposed method outperforms state-of-the-art self-supervised baselines by a significant margin.
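The abstract describes the objective only at a high level: a cross-modal term that aligns paired audio and visual embeddings, combined with a term that enforces separation between distinct samples within each modality. The sketch below is one plausible PyTorch realisation of such a combined objective, not the paper's actual formulation; the InfoNCE-style cross-modal term, the cosine-margin separation term, and all names and hyper-parameters (`temperature`, `margin`, `weight`) are illustrative assumptions.

```python
# A minimal sketch (assumed, not the authors' released code) of a combined
# self-supervised objective: a cross-modal contrastive term plus an
# intra-modal separation term applied to each modality.

import torch
import torch.nn.functional as F


def cross_modal_loss(audio_emb, video_emb, temperature=0.07):
    """InfoNCE-style loss: the i-th audio clip should match the i-th video clip."""
    audio_emb = F.normalize(audio_emb, dim=1)
    video_emb = F.normalize(video_emb, dim=1)
    logits = audio_emb @ video_emb.t() / temperature          # (B, B) similarities
    targets = torch.arange(audio_emb.size(0), device=logits.device)
    # Symmetric: audio-to-video and video-to-audio retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


def intra_modal_separation(emb, margin=0.5):
    """Hinge penalty pushing apart distinct samples within one modality
    (each sample treated as its own class, as in self-supervised training)."""
    emb = F.normalize(emb, dim=1)
    sim = emb @ emb.t()                                       # (B, B) cosine similarities
    sim = sim - torch.diag_embed(sim.diagonal())              # zero out self-similarity
    # Penalise off-diagonal pairs whose similarity exceeds the margin.
    return F.relu(sim - margin).sum() / (emb.size(0) * (emb.size(0) - 1))


def total_loss(audio_emb, video_emb, weight=1.0):
    """Combined objective: cross-modal alignment + per-modality separation."""
    return (cross_modal_loss(audio_emb, video_emb)
            + weight * (intra_modal_separation(audio_emb)
                        + intra_modal_separation(video_emb)))
```

The `weight` factor balances how strongly within-modality separation is enforced relative to cross-modal alignment; in this reading of the abstract, the separation terms are what make the resulting embeddings more discriminative for uni-modal downstream tasks such as lip reading and speaker recognition.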