Title
Predictive Coding beyond Gaussian Distributions
Authors
Abstract
A large amount of recent research has the far-reaching goal of finding training methods for deep neural networks that can serve as alternatives to backpropagation (BP). A prominent example is predictive coding (PC), which is a neuroscience-inspired method that performs inference on hierarchical Gaussian generative models. These methods, however, fail to keep up with modern neural networks, as they are unable to replicate the dynamics of complex layers and activation functions. In this work, we solve this problem by generalizing PC to arbitrary probability distributions, enabling the training of architectures, such as transformers, that are hard to approximate with only Gaussian assumptions. We perform three experimental analyses. First, we study the gap between our method and the standard formulation of PC on multiple toy examples. Second, we test the reconstruction quality on variational autoencoders, where our method reaches the same reconstruction quality as BP. Third, we show that our method allows us to train transformer networks and achieve a performance comparable with BP on conditional language models. More broadly, this method allows neuroscience-inspired learning to be applied to multiple domains, since the internal distributions can be flexibly adapted to the data, tasks, and architectures used.
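To make the generalization described in the abstract concrete, the following is a minimal sketch of the kind of objective involved, written in our own notation rather than the paper's. In standard PC, each value node \(x_l\) is predicted by \(\mu_l = f_l(x_{l-1}; \theta_l)\) under a Gaussian assumption, and both inference (over the value nodes \(x_l\)) and learning (over the weights \(\theta_l\)) perform gradient descent on the energy
\[
\mathcal{F}_{\mathrm{Gauss}} \;=\; \sum_{l=1}^{L} \frac{1}{2\sigma_l^{2}}\,\lVert x_l - \mu_l \rVert^{2} \;=\; -\sum_{l=1}^{L} \log \mathcal{N}\!\bigl(x_l \,;\, \mu_l,\ \sigma_l^{2} I\bigr) + \mathrm{const}.
\]
One natural reading of the generalization is to replace each Gaussian term with an arbitrary layer-wise log-density \(p_l\), for instance a categorical likelihood after a softmax layer:
\[
\mathcal{F} \;=\; -\sum_{l=1}^{L} \log p_l\bigl(x_l \mid \mu_l\bigr).
\]
This keeps the local, layer-wise structure of the PC updates while letting layers whose outputs are poorly modelled by a Gaussian (e.g. attention followed by softmax) use a likelihood suited to them.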