Paper Title
Weakly Supervised Attended Object Detection Using Gaze Data as Annotations
Paper Authors
Paper Abstract
We consider the problem of detecting and recognizing the objects observed by visitors (i.e., attended objects) in cultural sites from egocentric vision. A standard approach to the problem involves detecting all objects and selecting the one which best overlaps with the gaze of the visitor, measured through a gaze tracker. Since labeling large amounts of data to train a standard object detector is expensive in terms of costs and time, we propose a weakly supervised version of the task which relies only on gaze data and a frame-level label indicating the class of the attended object. To study the problem, we present a new dataset composed of egocentric videos and gaze coordinates of subjects visiting a museum. We then compare three different baselines for weakly supervised attended object detection on the collected data. Results show that the considered approaches achieve satisfactory performance in a weakly supervised manner, which allows for significant time savings with respect to a fully supervised detector based on Faster R-CNN. To encourage research on the topic, we publicly release the code and the dataset at the following URL: https://iplab.dmi.unict.it/WS_OBJ_DET/
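To make the fully supervised pipeline mentioned above concrete, the following is a minimal sketch of the gaze-based selection step: given the bounding boxes produced by an object detector and a gaze point from the tracker, pick the detection the gaze falls inside, or the nearest one otherwise. The function name, the [x1, y1, x2, y2] box format, and the point-based gaze representation are illustrative assumptions, not the released implementation.

```python
import numpy as np

def select_attended_object(boxes, labels, gaze_xy):
    """Pick the detection that best matches the gaze point.

    boxes:   iterable of N boxes in [x1, y1, x2, y2] image coordinates (assumed format)
    labels:  list of N class names, one per box
    gaze_xy: (gx, gy) gaze coordinates in the same image space
    Returns (index, label) of the selected box, or (None, None) if no boxes.
    """
    gx, gy = gaze_xy
    best_idx, best_dist = None, float("inf")
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        # If the gaze point lies inside the box, treat the distance as zero;
        # otherwise use the distance from the gaze point to the box center.
        inside = x1 <= gx <= x2 and y1 <= gy <= y2
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        dist = 0.0 if inside else np.hypot(gx - cx, gy - cy)
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return (best_idx, labels[best_idx]) if best_idx is not None else (None, None)

# Example usage with dummy detections and a gaze sample:
boxes = [(10, 10, 100, 120), (200, 50, 320, 220)]
labels = ["statue", "painting"]
print(select_attended_object(boxes, labels, gaze_xy=(250, 130)))  # -> (1, 'painting')
```

The weakly supervised setting studied in the paper replaces the box annotations required to train such a detector with only the gaze coordinates and a frame-level label of the attended object class.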