Paper Title
Unsupervised Gaze Prediction in Egocentric Videos by Energy-based Surprise Modeling
Paper Authors
Paper Abstract
Egocentric perception has grown rapidly with the advent of immersive computing devices. Human gaze prediction is an important problem in analyzing egocentric videos and has primarily been tackled through either saliency-based modeling or highly supervised learning. We quantitatively analyze the generalization capabilities of supervised, deep learning models on the egocentric gaze prediction task on unseen, out-of-domain data. We find that their performance is highly dependent on the training data and is restricted to the domains specified in the training annotations. In this work, we tackle the problem of jointly predicting human gaze points and temporal segmentation of egocentric videos without using any training data. We introduce an unsupervised computational model that draws inspiration from cognitive psychology models of event perception. We use Grenander's pattern theory formalism to represent spatial-temporal features and model surprise as a mechanism to predict gaze fixation points. Extensive evaluation on two publicly available datasets, GTEA and GTEA+, shows that the proposed model significantly outperforms all unsupervised baselines and some supervised gaze prediction baselines. Finally, we show that the model can also temporally segment egocentric videos with performance comparable to more complex, fully supervised deep learning baselines.
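To make the surprise mechanism concrete, here is a minimal, hypothetical sketch, not the authors' pattern-theory formulation: surprise is scored as an energy (the mismatch between observed spatio-temporal features and the model's running prediction for each spatial cell), and the cell with maximum energy is taken as the predicted gaze fixation. The function names, grid layout, and squared-error energy are all illustrative assumptions.

```python
import numpy as np

def surprise_map(features, predictions):
    """Hypothetical energy-based surprise: per-cell squared prediction error.

    features, predictions: (H, W, D) grids of spatio-temporal feature
    vectors (e.g., pooled appearance/motion descriptors per spatial cell).
    Returns an (H, W) energy map; high values mark poorly predicted cells.
    """
    return np.sum((features - predictions) ** 2, axis=-1)

def predict_fixation(features, predictions):
    """Pick the grid cell with maximum surprise as the gaze fixation point."""
    energy = surprise_map(features, predictions)
    y, x = np.unravel_index(np.argmax(energy), energy.shape)
    return x, y, energy

# Toy usage: random grids stand in for real video features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 8, 16))
preds = rng.normal(size=(8, 8, 16))
x, y, energy = predict_fixation(feats, preds)
print(f"predicted fixation at cell ({x}, {y}), surprise = {energy[y, x]:.2f}")
```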