Paper Title
Spherical Convolution empowered FoV Prediction in 360-degree Video Multicast with Limited FoV Feedback
Paper Authors
Paper Abstract
Field of view (FoV) prediction is critical in 360-degree video multicast, a key component of emerging Virtual Reality (VR) and Augmented Reality (AR) applications. Most current prediction methods that combine saliency detection with FoV information neither account for the fact that the distortion of projected 360-degree videos invalidates the weight sharing of traditional convolutional networks, nor adequately consider the difficulty of obtaining complete multi-user FoV information, both of which degrade prediction performance. This paper proposes a spherical convolution-empowered FoV prediction method: a multi-source prediction framework that combines salient features extracted from the 360-degree video with limited FoV feedback information. A spherical convolutional neural network (CNN) is used instead of a traditional two-dimensional CNN to eliminate the weight-sharing failure caused by video projection distortion. Specifically, salient spatiotemporal features are extracted by a spherical convolution-based saliency detection model, after which the limited FoV feedback is modeled as a time series by a spherical convolution-empowered gated recurrent unit (GRU) network. Finally, the extracted salient video features and the modeled FoV feedback are combined to predict future user FoVs. Experimental results show that the proposed method outperforms other prediction methods.
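
To make the pipeline described in the abstract concrete, here is a minimal sketch of the multi-source architecture in PyTorch (an assumption; the paper does not name a framework). All module names, tensor shapes, and hyperparameters are illustrative, and ordinary Conv2d layers stand in for the paper's spherical convolutions, which would instead sample kernel locations on the sphere to compensate for equirectangular projection distortion.

    import torch
    import torch.nn as nn

    class SaliencyEncoder(nn.Module):
        """Stand-in for the spherical convolution-based saliency detection
        model. The paper extracts spatiotemporal saliency; this toy encoder
        is spatial only and uses ordinary (non-spherical) convolutions."""
        def __init__(self, in_ch=3, feat_ch=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            )

        def forward(self, frames):                 # (B, 3, H, W)
            return self.net(frames)                # (B, feat_ch, H, W)

    class ConvGRUCell(nn.Module):
        """Convolutional GRU cell modeling the limited FoV feedback as a
        time series (again, Conv2d replaces the spherical convolutions)."""
        def __init__(self, in_ch, hid_ch, k=3):
            super().__init__()
            self.zr = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=k // 2)
            self.hc = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)

        def forward(self, x, h):
            z, r = torch.sigmoid(self.zr(torch.cat([x, h], 1))).chunk(2, 1)
            h_tilde = torch.tanh(self.hc(torch.cat([x, r * h], 1)))
            return (1 - z) * h + z * h_tilde

    class FoVPredictor(nn.Module):
        """Fuses saliency features with the GRU state over the feedback FoV
        maps and outputs a per-pixel probability of the future FoV."""
        def __init__(self, feat_ch=32, hid_ch=32):
            super().__init__()
            self.hid_ch = hid_ch
            self.saliency = SaliencyEncoder(feat_ch=feat_ch)
            self.gru = ConvGRUCell(in_ch=1, hid_ch=hid_ch)  # binary FoV maps
            self.head = nn.Conv2d(feat_ch + hid_ch, 1, 1)

        def forward(self, frames, fov_masks):
            # frames: (B, T, 3, H, W); fov_masks: (B, T, 1, H, W)
            B, T, _, H, W = frames.shape
            h = frames.new_zeros(B, self.hid_ch, H, W)
            for t in range(T):                      # roll GRU over feedback
                h = self.gru(fov_masks[:, t], h)
            sal = self.saliency(frames[:, -1])      # saliency of latest frame
            return torch.sigmoid(self.head(torch.cat([sal, h], 1)))

    # Usage on random data: 4 equirectangular frames plus 4 feedback maps
    model = FoVPredictor()
    pred = model(torch.rand(2, 4, 3, 64, 128), torch.rand(2, 4, 1, 64, 128))
    print(pred.shape)                               # torch.Size([2, 1, 64, 128])

The fuse-at-the-end design mirrors the abstract's multi-source framing: saliency and FoV feedback are encoded independently and only combined in the prediction head, so either branch can be swapped (for example, replacing Conv2d with a genuine spherical convolution operator) without touching the other.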