What am I Searching for: Zero-shot Target Identity Inference in Visual Search

Paper title

What am I Searching for: Zero-shot Target Identity Inference in Visual Search

Authors

Zhang, Mengmi; Kreiman, Gabriel

Abstract

Can we infer intentions from a person's actions? As an example problem, here we consider how to decipher what a person is searching for by decoding their eye movement behavior. We conducted two psychophysics experiments where we monitored eye movements while subjects searched for a target object. We defined the fixations falling on non-target objects as "error fixations". Using those error fixations, we developed a model (InferNet) to infer what the target was. InferNet uses a pre-trained convolutional neural network to extract features from the error fixations and computes a similarity map between the error fixations and all locations across the search image. The model consolidates the similarity maps across layers and integrates these maps across all error fixations. InferNet successfully identifies the subject's goal and outperforms competitive null models, even without any object-specific training on the inference task.
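
The pipeline described in the abstract (features at error fixations, similarity maps over the search image, consolidation across layers, integration across fixations) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes hand-made feature grids in place of real CNN activations, the same spatial grid at every layer, cosine similarity as the similarity measure, min-max normalization, plain averaging across layers, and masking of already-fixated locations. All function and variable names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_map(feature_map, fixation):
    """Compare the feature vector at the fixated cell with every cell of the grid."""
    probe = feature_map[fixation[0]][fixation[1]]
    return [[cosine(probe, vec) for vec in row] for row in feature_map]

def normalize(grid):
    """Min-max normalize a 2D grid to [0, 1] (assumption: puts layers on one scale)."""
    flat = [x for row in grid for x in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return [[(x - lo) / span if span else 0.0 for x in row] for row in grid]

def infer_target_map(layers, fixations):
    """Average similarity maps across layers, then sum them across error fixations."""
    rows, cols = len(layers[0]), len(layers[0][0])
    total = [[0.0] * cols for _ in range(rows)]
    for fixation in fixations:
        for feature_map in layers:
            sim = normalize(similarity_map(feature_map, fixation))
            for r in range(rows):
                for c in range(cols):
                    total[r][c] += sim[r][c] / len(layers)
    # Assumption: a fixated non-target cannot be the target, so mask it out.
    for r, c in fixations:
        total[r][c] = float("-inf")
    return total

def argmax2d(grid):
    """Return the (row, col) of the largest value in a 2D grid."""
    cells = ((r, c) for r in range(len(grid)) for c in range(len(grid[0])))
    return max(cells, key=lambda rc: grid[rc[0]][rc[1]])

# Toy 2x3 "search image" with two made-up feature layers.
# The cell at (1, 2) resembles the fixated non-target at (0, 0) in both layers.
LAYERS = [
    [[[1.0, 0.2], [0.0, 1.0], [-1.0, 0.0]],
     [[0.0, 1.0], [-1.0, 0.1], [1.0, 0.3]]],
    [[[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0], [0.6, 0.5]]],
]
FIXATIONS = [(0, 0)]  # one error fixation on a non-target object

inferred = argmax2d(infer_target_map(LAYERS, FIXATIONS))
print("inferred target cell:", inferred)
```

The sketch relies on the premise that error fixations tend to land on objects that share features with the target, so locations similar to the fixated non-targets are likely target candidates. With real data, the feature grids would come from a pre-trained network and would need resizing to a common resolution before being combined.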

Paper title