Paper Title
Webly Supervised Semantic Embeddings for Large Scale Zero-Shot Learning
Paper Authors
Paper Abstract
Zero-shot learning (ZSL) makes object recognition in images possible in the absence of visual training data for a subset of the classes in a dataset. When the number of classes is large, classes are usually represented by semantic class prototypes learned automatically from unannotated text collections. This typically leads to much lower performance than with manually designed semantic prototypes such as attributes. While most ZSL works focus on the visual aspect and reuse standard semantic prototypes learned from generic text collections, we focus on the problem of semantic class prototype design for large scale ZSL. More specifically, we investigate the use of noisy textual metadata associated with photos as text collections, as we hypothesize that, if exploited appropriately, they are likely to provide more plausible semantic embeddings for visual classes. We thus make use of a source-based voting strategy to improve the robustness of semantic prototypes. Evaluation on the large scale ImageNet dataset shows a significant improvement in ZSL performance over two strong baselines, and over the usual semantic embeddings used in previous works. We show that this improvement is obtained for several embedding methods, leading to state-of-the-art results when automatically created visual and text features are used.
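To make the source-based voting idea concrete, below is a minimal Python sketch of one plausible way to aggregate per-source class embeddings into a more robust prototype. The function name `class_prototype`, the cosine-agreement voting rule, and the median threshold are illustrative assumptions, not the paper's exact procedure.

```python
# A minimal sketch of source-based voting for semantic class prototypes,
# assuming one class-name embedding is available per noisy metadata source.
# The voting rule (cosine agreement with the consensus, median threshold)
# is an assumption for illustration.
import numpy as np

def class_prototype(per_source_embeddings):
    """Aggregate one embedding per metadata source into a class prototype.

    per_source_embeddings: list of 1-D arrays, one embedding of the class
    name per text source (e.g., per photo-metadata corpus).
    """
    E = np.stack(per_source_embeddings)               # (n_sources, dim)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize rows
    # "Voting": keep only sources whose embedding agrees with the consensus,
    # i.e., whose cosine similarity to the mean direction is at least median.
    consensus = E.mean(axis=0)
    consensus /= np.linalg.norm(consensus)
    scores = E @ consensus
    kept = E[scores >= np.median(scores)]             # majority-agreeing sources
    proto = kept.mean(axis=0)
    return proto / np.linalg.norm(proto)

# Usage: embeddings of one class from three noisy metadata sources
# (random vectors stand in for real text embeddings here).
rng = np.random.default_rng(0)
sources = [rng.normal(size=300) for _ in range(3)]
prototype = class_prototype(sources)
print(prototype.shape)  # (300,)
```

The design intuition, as the abstract suggests, is that metadata sources which disagree with the majority are likely noisy, so discounting them before averaging yields a more reliable prototype than pooling all text indiscriminately.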