Paper Title
Spatiotemporal Action Recognition in Restaurant Videos
Authors
Abstract
Spatiotemporal action recognition is the task of locating and classifying actions in videos. Our project applies this task to video footage of restaurant workers preparing food, with potential applications including automated checkout and inventory management. Such videos differ markedly from the standardized datasets that researchers are used to, as they involve small objects, rapid actions, and notoriously unbalanced data classes. We explore two approaches. The first builds on the familiar object detector You Only Look Once (YOLO); the second applies a recently proposed analogue for action recognition, You Only Watch Once (YOWO). In the first, we design and implement a novel recurrent modification of YOLO using convolutional LSTMs and explore the various subtleties of training such a network. In the second, we study the ability of YOWO's three-dimensional convolutions to capture the spatiotemporal features of our unique dataset.