Paper Title
Learning to Abstract and Predict Human Actions
Authors
Abstract
Human activities are naturally structured as hierarchies that unroll over time. For action prediction, current methods widely exploit temporal relations in event sequences, but the semantic coherence of those events across different levels of abstraction has not been well explored. In this work, we model the hierarchical structure of human activities in videos and demonstrate the power of such structure in action prediction. We propose the Hierarchical Encoder-Refresher-Anticipator, a multi-level neural machine that learns the structure of human activities by observing a partial hierarchy of events and rolls out that structure into a future prediction at multiple levels of abstraction. We also introduce a new coarse-to-fine action annotation on the Breakfast Actions videos to create a comprehensive, consistent, and cleanly structured hierarchical video activity dataset. Through our experiments, we examine and rethink the settings and metrics of activity prediction tasks toward unbiased evaluation of prediction systems, and demonstrate the role of hierarchical modeling in reliable and detailed long-term action forecasting.