Paper Title
A Spiking Neural Network Structure Implementing Reinforcement Learning
Paper Authors
Paper Abstract
At present, the implementation of learning mechanisms in spiking neural networks (SNNs) cannot be considered a solved scientific problem, despite the many SNN learning algorithms that have been proposed. The same is true of SNN implementations of reinforcement learning (RL), even though RL is especially important for SNNs because of its close relationship to the most promising SNN application domains, such as robotics. In the present paper, I describe an SNN structure that appears to be usable in a wide range of RL tasks. The distinctive feature of my approach is that all signals involved are represented exclusively as spikes: sensory input streams, output signals sent to actuators, and reward/punishment signals. In addition, in selecting the neuron and plasticity models, I was guided by the requirement that they be easy to implement on modern neurochips. The SNN structure considered in the paper includes spiking neurons described by a generalization of the LIFAT (leaky integrate-and-fire neuron with adaptive threshold) model and a simple spike-timing-dependent synaptic plasticity model (a generalization of dopamine-modulated plasticity). My concept is based on very general assumptions about RL task characteristics and has no apparent limitations on its applicability. To test it, I selected a simple but non-trivial task: training the network to keep a chaotically moving light spot in the field of view of an emulated DVS camera. The successful solution of this RL problem by the described SNN can be considered evidence in favor of the efficiency of my approach.
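For readers unfamiliar with the LIFAT model named in the abstract, the following is a minimal sketch of a textbook-style leaky integrate-and-fire neuron with an adaptive threshold in discrete time. It is not the generalization used in the paper, which the abstract does not specify. The function name `simulate_lifat` and all parameter names and values (`leak`, `base_threshold`, `threshold_jump`, `threshold_decay`) are assumptions chosen for illustration only.

```python
def simulate_lifat(input_current, leak=0.9, base_threshold=1.0,
                   threshold_jump=0.5, threshold_decay=0.95):
    """Simulate one illustrative LIFAT neuron and return its 0/1 spike train.

    All parameters are hypothetical defaults, not values from the paper.
    """
    potential = 0.0    # membrane potential
    adaptation = 0.0   # extra threshold accumulated from recent spikes
    spikes = []
    for current in input_current:
        # Leaky integration: the potential decays, then the input is added.
        potential = leak * potential + current
        # The adaptive part of the threshold relaxes back toward zero.
        adaptation *= threshold_decay
        if potential >= base_threshold + adaptation:
            spikes.append(1)
            potential = 0.0               # reset after firing
            adaptation += threshold_jump  # firing raises the threshold
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    # Constant drive: compare an adapting neuron with a non-adapting one.
    print(simulate_lifat([0.6] * 12))
    print(simulate_lifat([0.6] * 12, threshold_jump=0.0))
```

Under a constant input, the adaptive threshold in this sketch makes successive inter-spike intervals lengthen, whereas setting `threshold_jump=0.0` reduces it to a plain LIF neuron that fires at a fixed rate.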