Paper Title
Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition
Authors
Abstract
Human fashion understanding is a crucial computer vision task because it provides comprehensive information for real-world applications. This work focuses on joint human fashion segmentation and attribute recognition. In contrast to previous works that model each task separately as a multi-head prediction problem, our insight is to bridge the two tasks with one unified model via vision transformer modeling so that each task benefits. In particular, we introduce the object query for segmentation and the attribute query for attribute prediction. Both queries and their corresponding features can be linked via mask prediction. We then adopt a two-stream query learning framework to learn decoupled query representations. We design a novel Multi-Layer Rendering module for the attribute stream to explore more fine-grained features. The decoder design shares the same spirit as DETR. Thus we name the proposed method \textit{Fashionformer}. Extensive experiments on three human fashion datasets illustrate the effectiveness of our approach. In particular, with the same backbone, our method achieves a \textbf{relative 10\% improvement} over previous works on \textit{a joint metric (AP$^{\text{mask}}_{\text{IoU+F}_1}$) for both segmentation and attribute recognition}. To the best of our knowledge, this is the first unified end-to-end vision transformer framework for human fashion analysis. We hope this simple yet effective method can serve as a new flexible baseline for fashion analysis. Code is available at https://github.com/xushilin1/FashionFormer.
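The abstract states that object queries and attribute queries are linked to image features through mask prediction. The toy sketch below illustrates one way such a link could work; it is not the authors' implementation. The function names, the dot-product mask scoring, the mask-weighted pooling, the additive query update, and the random `attr_weights` classifier are all assumptions made for illustration, and the Multi-Layer Rendering module is not modeled.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_masks(object_queries, pixel_features):
    """Soft masks: sigmoid of the dot product between each query and each pixel feature."""
    logits = matmul(object_queries, transpose(pixel_features))
    return [[sigmoid(v) for v in row] for row in logits]


def pool_by_mask(masks, pixel_features):
    """Mask-weighted average of pixel features, one pooled vector per query."""
    pooled = matmul(masks, pixel_features)
    return [[v / (sum(mask) + 1e-6) for v in row] for row, mask in zip(pooled, masks)]


def predict_attributes(attribute_queries, pooled, attr_weights):
    """Assumed update: add pooled features to the attribute queries, then score attributes."""
    updated = [[q + p for q, p in zip(q_row, p_row)]
               for q_row, p_row in zip(attribute_queries, pooled)]
    logits = matmul(updated, attr_weights)
    return [[sigmoid(v) for v in row] for row in logits]


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 3 query pairs, 4-dim features, 6 pixels, 5 attributes.
random.seed(0)
num_queries, dim, num_pixels, num_attrs = 3, 4, 6, 5
object_queries = rand_matrix(num_queries, dim)
attribute_queries = rand_matrix(num_queries, dim)
pixel_features = rand_matrix(num_pixels, dim)
attr_weights = rand_matrix(dim, num_attrs)

masks = predict_masks(object_queries, pixel_features)          # segmentation stream
pooled = pool_by_mask(masks, pixel_features)                   # mask links queries to features
attr_probs = predict_attributes(attribute_queries, pooled, attr_weights)  # attribute stream
```

Here each object query yields one soft mask over the pixels, and the same mask selects the features that its paired attribute query uses, so the two streams stay decoupled while sharing the mask as their common link.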