Pyramidal Predictive Network: A Model for Visual-frame Prediction Based on Predictive Coding Theory

Paper title

Pyramidal Predictive Network: A Model for Visual-frame Prediction Based on Predictive Coding Theory

Authors

Chaofan Ling, Junpei Zhong, Weihua Li

Abstract

Visual-frame prediction is a pixel-dense prediction task that infers future frames from past frames. A lack of appearance detail, low prediction accuracy, and high computational overhead remain major problems for current models and methods. In this paper, we propose a novel neural network model, inspired by the well-known predictive coding theory, to address these problems. Predictive coding provides an interesting and reliable computational framework, which we combine with other theories, such as the idea that different levels of the cerebral cortex oscillate at different frequencies, to design an efficient and reliable predictive network model for visual-frame prediction. Specifically, the model is composed of a series of recurrent and convolutional units that form the top-down and bottom-up streams, respectively. The update frequency of the neural units in each layer decreases as the network level increases, so that higher-level neurons can capture information over longer time scales. According to the experimental results, the model is more compact than existing works while showing comparable predictive performance, implying lower computational cost and higher prediction accuracy. Code is available at https://github.com/Ling-CF/PPNet.
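
The level-dependent update frequency described in the abstract can be illustrated with a small scheduling sketch. This is not the authors' implementation: the abstract does not state the actual update rule, so the example assumes, purely for illustration, that level l updates once every base**l time steps. The function names levels_to_update and update_counts and the parameter base are hypothetical.

```python
def levels_to_update(t, num_levels, base=2):
    """Return the indices of the levels that update at time step t.

    Assumption (not taken from the paper): level l updates once every
    base**l steps, so higher levels update less often.
    """
    return [level for level in range(num_levels) if t % (base ** level) == 0]


def update_counts(num_steps, num_levels, base=2):
    """Count how many times each level updates over num_steps time steps."""
    counts = [0] * num_levels
    for t in range(num_steps):
        for level in levels_to_update(t, num_levels, base):
            counts[level] += 1
    return counts


if __name__ == "__main__":
    # Print the update schedule for a hypothetical 3-level network.
    for t in range(8):
        print(t, levels_to_update(t, num_levels=3))
    print(update_counts(num_steps=8, num_levels=3))
```

Under this assumed schedule, the lowest level updates at every frame while each higher level updates half as often as the one below it, which is one simple way higher levels could integrate information over longer time spans.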
