Paper Title
VIDM: Video Implicit Diffusion Models
Paper Authors
Abstract
Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse images. In this paper, we propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicitly conditional manner, i.e., plausible video motions can be sampled according to the latent features of frames. We improve the quality of the generated videos by proposing multiple strategies, such as sampling space truncation, robustness penalty, and positional group normalization. Various experiments are conducted on datasets consisting of videos with different resolutions and different numbers of frames. Results show that the proposed method outperforms state-of-the-art generative adversarial network-based methods by a significant margin in terms of FVD scores as well as perceptible visual quality.
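The abstract names sampling space truncation but does not define it; as a hedged illustration of the general idea (not necessarily the paper's exact procedure), the initial noise of a diffusion sampler can be drawn from a truncated Gaussian so that extreme latent values, which tend to produce low-quality samples, are resampled:

```python
import numpy as np

def truncated_gaussian(shape, threshold=2.0, rng=None):
    """Sample standard Gaussian noise, resampling any entries whose
    magnitude exceeds `threshold` until all values lie in [-t, t].
    This is a generic sketch of sampling space truncation; the
    threshold value and procedure here are illustrative assumptions."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(shape)
    mask = np.abs(x) > threshold
    while mask.any():
        # Redraw only the out-of-range entries.
        x[mask] = rng.standard_normal(int(mask.sum()))
        mask = np.abs(x) > threshold
    return x

# Example: initial noise for a batch of 4 latent frames of shape 3x8x8.
noise = truncated_gaussian((4, 3, 8, 8), threshold=2.0)
```

Truncation trades sample diversity for fidelity: a tighter threshold keeps the sampler in the high-density region of the latent prior.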