Paper Title
Human Eyes Inspired Recurrent Neural Networks are More Robust Against Adversarial Noises
Paper Authors
Paper Abstract
Humans actively observe their visual surroundings by focusing on salient objects and ignoring trivial details. However, computer vision models based on convolutional neural networks (CNNs) often analyze visual input all at once through a single feed-forward pass. In this study, we designed a dual-stream vision model inspired by the human brain. The model features retina-like input layers and includes two streams: one determines the next point of focus (the fixation), while the other interprets the visual content surrounding the fixation. Trained on image recognition, the model examines an image through a sequence of fixations, each time focusing on different parts, thereby progressively building a representation of the image. We evaluated this model against various benchmarks in terms of object recognition, gaze behavior, and adversarial robustness. Our findings suggest that the model can attend and gaze in ways similar to humans without being explicitly trained to mimic human attention, and that it gains robustness against adversarial attacks from its retinal sampling and recurrent processing. In particular, the model can correct its perceptual errors by taking more glances, setting itself apart from all feed-forward-only models. In conclusion, the interactions of retinal sampling, eye movement, and recurrent dynamics are important to human-like visual exploration and inference.
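The mechanism the abstract describes (foveated retinal sampling, a "where" stream that picks the next fixation, and a recurrent loop that accumulates evidence across glances) can be illustrated with a toy numpy sketch. All function names, the contrast-based saliency heuristic, and the averaged-evidence "what" feature below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def retinal_sample(image, fix, radius=4):
    """Retina-like sampling: a full-resolution crop (fovea) around the
    fixation, plus a coarse downsampled view of the whole image (periphery)."""
    h, w = image.shape
    y = int(np.clip(fix[0], radius, h - radius))
    x = int(np.clip(fix[1], radius, w - radius))
    fovea = image[y - radius:y + radius, x - radius:x + radius]
    periphery = image[::4, ::4]  # coarse global context
    return fovea, periphery

def next_fixation(saliency, visited, radius=4):
    """Toy 'where' stream: pick the most salient location not yet visited
    (inhibition of return, implemented by masking visited neighborhoods)."""
    s = saliency.astype(float).copy()
    for vy, vx in visited:
        s[max(vy - radius, 0):vy + radius, max(vx - radius, 0):vx + radius] = -np.inf
    return np.unravel_index(np.argmax(s), s.shape)

def glance_loop(image, n_glances=3):
    """Recurrent loop: a sequence of fixations, each glance adding evidence
    to a running representation (here, a simple average of foveal features)."""
    saliency = np.abs(image - image.mean())  # toy saliency: local contrast
    fix = np.unravel_index(np.argmax(saliency), saliency.shape)
    visited, evidence = [], []
    for _ in range(n_glances):
        fovea, _periphery = retinal_sample(image, fix)
        evidence.append(fovea.mean())  # toy 'what' feature per glance
        visited.append(fix)
        fix = next_fixation(saliency, visited)
    return visited, float(np.mean(evidence))

# Example: a 32x32 image with two bright blobs; successive glances land
# on the blobs rather than the empty background.
img = np.zeros((32, 32))
img[5:9, 5:9] = 1.0
img[20:24, 20:24] = 0.8
fixations, score = glance_loop(img, n_glances=2)
```

The key property echoed here is that each additional glance refines the accumulated representation, which is what lets a recurrent model revise an early misperception in a way a single feed-forward pass cannot.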