MovieNet: A Holistic Dataset for Movie Understanding

Paper Title

MovieNet: A Holistic Dataset for Movie Understanding

Authors

Qingqiu Huang, Yu Xiong, Anyi Rao, Jiaze Wang, Dahua Lin

Abstract

Recent years have seen remarkable advances in visual understanding. However, understanding a long, story-based video with artistic style, such as a movie, remains challenging. In this paper, we introduce MovieNet, a holistic dataset for movie understanding. MovieNet contains 1,100 movies with a large amount of multi-modal data, e.g. trailers, photos, and plot descriptions. In addition, MovieNet provides manual annotations covering several aspects: 1.1M characters with bounding boxes and identities, 42K scene boundaries, 2.5K aligned description sentences, 65K tags of place and action, and 92K tags of cinematic style. To the best of our knowledge, MovieNet is the largest dataset with the richest annotations for comprehensive movie understanding. Based on MovieNet, we set up several benchmarks for movie understanding from different angles. Extensive experiments on these benchmarks show the immeasurable value of MovieNet and the gap between current approaches and comprehensive movie understanding. We believe that such a holistic dataset will promote research on story-based long video understanding and beyond. MovieNet will be published in compliance with regulations at https://movienet.github.io.
