YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos

Paper Title

YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos

Authors

Shizhe Chen, Weiying Wang, Ludan Ruan, Linli Yao, Qin Jin

Abstract

The goal of the YouMakeup VQA Challenge 2020 is to provide a common benchmark for fine-grained action understanding in domain-specific videos, e.g., makeup instructional videos. We propose two novel question-answering tasks to evaluate models' fine-grained action understanding abilities. The first task is Facial Image Ordering, which aims to understand the visual effects that different actions, expressed in natural language, have on the facial object. The second task is Step Ordering, which aims to measure cross-modal semantic alignment between untrimmed videos and multi-sentence texts. In this paper, we present the challenge guidelines, the dataset used, and the performance of baseline models on the two proposed tasks. The baseline code and models are released at https://github.com/AIM3-RUC/YouMakeup_Baseline.
