Label2Label: A Language Modeling Framework for Multi-Attribute Learning

Paper Title

Label2Label: A Language Modeling Framework for Multi-Attribute Learning

Authors

Li, Wanhua; Cao, Zhexuan; Feng, Jianjiang; Zhou, Jie; Lu, Jiwen

Abstract

Objects are usually associated with multiple attributes, and these attributes often exhibit high correlations. Modeling complex relationships between attributes poses a great challenge for multi-attribute learning. This paper proposes a simple yet generic framework named Label2Label to exploit the complex attribute correlations. Label2Label is the first attempt at multi-attribute prediction from the perspective of language modeling. Specifically, it treats each attribute label as a "word" describing the sample. As each sample is annotated with multiple attribute labels, these "words" will naturally form an unordered but meaningful "sentence", which depicts the semantic information of the corresponding sample. Inspired by the remarkable success of pre-trained language models in NLP, Label2Label introduces an image-conditioned masked language model, which randomly masks some of the "word" tokens from the label "sentence" and aims to recover them based on the masked "sentence" and the context conveyed by image features. Our intuition is that the instance-wise attribute relations are well grasped if the neural net can infer the missing attributes based on the context and the remaining attribute hints. Label2Label is conceptually simple and empirically powerful. Without incorporating task-specific prior knowledge and highly specialized network designs, our approach achieves state-of-the-art results on three different multi-attribute learning tasks, compared to highly customized domain-specific methods. Code is available at https://github.com/Li-Wanhua/Label2Label.
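To make the "label sentence" idea concrete, the sketch below shows only the masking step that an image-conditioned masked language model would be trained on: binary attribute labels become "word" tokens, and some tokens are randomly replaced by a mask token whose original value becomes the recovery target. This is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository for that). The vocabulary scheme (attribute `i` with value `v` maps to token `2*i + v`), the mask id `2*N`, the helper names `build_label_sentence` and `mask_label_sentence`, and the `-100` ignore value are all hypothetical choices made here; the image-feature context and the recovery network itself are omitted.

```python
import random
from typing import List, Tuple

IGNORE = -100  # hypothetical "not scored" target value for unmasked positions


def build_label_sentence(labels: List[int]) -> List[int]:
    """Turn binary attribute labels into "word" tokens.

    Hypothetical vocabulary: attribute i with value v -> token 2*i + v, so each
    token already encodes which attribute it describes. The sentence content
    therefore does not depend on word order.
    """
    if any(v not in (0, 1) for v in labels):
        raise ValueError("labels must be binary (0 or 1)")
    return [2 * i + v for i, v in enumerate(labels)]


def mask_label_sentence(
    tokens: List[int], mask_prob: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Randomly mask label tokens; return (masked sentence, recovery targets).

    Masked positions get the hypothetical [MASK] id 2*N and keep the original
    token as their target; unmasked positions get IGNORE as target, so a loss
    would only score the "words" the model must recover.
    """
    if not 0.0 <= mask_prob <= 1.0:
        raise ValueError("mask_prob must be in [0, 1]")
    mask_id = 2 * len(tokens)  # first id after all attribute tokens
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_id)
            targets.append(tok)
        else:
            masked.append(tok)
            targets.append(IGNORE)
    return masked, targets


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = build_label_sentence([1, 0, 1, 1])  # [1, 2, 5, 7]
    masked, targets = mask_label_sentence(sentence, mask_prob=0.5, rng=rng)
    print(sentence, masked, targets)
```

The `-100` value was chosen because it matches the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so the targets could be passed to such a loss directly; in the framework described above, the model predicting those targets would additionally attend to image features.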
