Paper Title
Mixed Neural Voxels for Fast Multi-view Video Synthesis
Paper Authors
Paper Abstract
Synthesizing high-fidelity videos from real-world multi-view input is challenging because of the complexity of real-world environments and of highly dynamic motions. Previous works based on neural radiance fields have demonstrated high-quality reconstruction of dynamic scenes. However, training such models on real-world scenes is time-consuming, usually taking days or weeks. In this paper, we present a novel method named MixVoxels that represents dynamic scenes with fast training speed and competitive rendering quality. MixVoxels represents a 4D dynamic scene as a mixture of static and dynamic voxels and processes them with different networks. In this way, the quantities required for static voxels can be computed by a lightweight model, which substantially reduces computation, especially for the many everyday dynamic scenes dominated by a static background. To separate the two kinds of voxels, we propose a novel variation field that estimates the temporal variance of each voxel. For the dynamic voxels, we design an inner-product time query method to efficiently query multiple time steps at once, which is essential for recovering highly dynamic motions. As a result, with 15 minutes of training on dynamic scenes with 300-frame video inputs, MixVoxels achieves better PSNR than previous methods. Code and trained models are available at https://github.com/fengres/mixvoxels.
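To make the abstract's two key ideas concrete, below is a minimal, illustrative PyTorch sketch of (a) a variation field that splits voxels into static and dynamic sets by thresholding a per-voxel temporal-variance estimate, and (b) an inner-product time query that evaluates all time steps of a dynamic voxel with a single matrix product. All names (`MixedVoxelSketch`, `time_basis`, `variance_threshold`, `static_head`), the dense grid layout, and the scalar-density output are hypothetical simplifications for exposition, not the actual MixVoxels implementation.

```python
import torch
import torch.nn as nn

class MixedVoxelSketch(nn.Module):
    """Toy split of a voxel grid into static and dynamic voxels, with an
    inner-product time query for the dynamic ones. Hypothetical sketch."""

    def __init__(self, grid_size=64, feat_dim=32, num_frames=300,
                 variance_threshold=0.5):
        super().__init__()
        G = grid_size
        # Per-voxel feature vectors (a dense grid, for simplicity).
        self.features = nn.Parameter(torch.randn(G, G, G, feat_dim) * 0.1)
        # Variation field: one scalar per voxel estimating temporal variance.
        self.variation = nn.Parameter(torch.randn(G, G, G))
        # Shared time embeddings, one per frame, used by the time query.
        self.time_basis = nn.Parameter(torch.randn(num_frames, feat_dim) * 0.1)
        # Lightweight, time-independent head for static voxels.
        self.static_head = nn.Linear(feat_dim, 1)
        self.variance_threshold = variance_threshold

    def forward(self, voxel_idx):
        """voxel_idx: (N, 3) integer voxel indices.
        Returns densities of shape (N, T) and the dynamic mask of shape (N,)."""
        i, j, k = voxel_idx.unbind(-1)
        feat = self.features[i, j, k]                 # (N, C)
        var = torch.sigmoid(self.variation[i, j, k])  # (N,), in (0, 1)
        dynamic = var > self.variance_threshold       # (N,) boolean mask

        T = self.time_basis.shape[0]
        density = feat.new_empty(feat.shape[0], T)
        # Static voxels: one cheap evaluation, broadcast over all frames.
        density[~dynamic] = self.static_head(feat[~dynamic])    # (Ns, 1) -> (Ns, T)
        # Dynamic voxels: one matmul answers all T time steps at once,
        # instead of T separate per-time-step network queries.
        density[dynamic] = feat[dynamic] @ self.time_basis.t()  # (Nd, T)
        return density, dynamic


# Example: query 1024 random voxels over all 300 frames in one call.
model = MixedVoxelSketch()
idx = torch.randint(0, 64, (1024, 3))
density, dynamic = model(idx)
print(density.shape, dynamic.float().mean())  # torch.Size([1024, 300]), dynamic fraction
```

Two caveats on this sketch: the hard threshold makes the static/dynamic routing non-differentiable (the abstract does not detail how the variation field is supervised in the actual method), and the sketch returns a single density scalar per time step, whereas a full model would also produce appearance features. It is meant only to show why the mixture saves computation: static voxels pay one evaluation regardless of video length, while each dynamic voxel recovers all time steps with a single inner product.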