Paper Title
VIDM: Video Implicit Diffusion Models
Paper Authors
Abstract
Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse images. In this paper, we propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicitly conditional manner, i.e., plausible video motions can be sampled according to the latent features of frames. We improve the quality of the generated videos by proposing multiple strategies, such as sampling space truncation, robustness penalty, and positional group normalization. Various experiments are conducted on datasets consisting of videos with different resolutions and different numbers of frames. Results show that the proposed method outperforms state-of-the-art generative adversarial network-based methods by a significant margin in terms of FVD scores as well as perceptible visual quality.
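The abstract names sampling space truncation but does not define it; as a hedged illustration of the general idea (not necessarily the paper's exact procedure), the initial noise of a diffusion sampler can be drawn from a truncated Gaussian so that extreme latent values, which tend to produce low-quality samples, are resampled:

```python
import numpy as np

def truncated_gaussian(shape, threshold=2.0, rng=None):
    """Sample standard Gaussian noise, resampling any entries whose
    magnitude exceeds `threshold` until all values lie in [-t, t].
    This is a generic sketch of sampling space truncation; the
    threshold value and procedure here are illustrative assumptions."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(shape)
    mask = np.abs(x) > threshold
    while mask.any():
        # Redraw only the out-of-range entries.
        x[mask] = rng.standard_normal(int(mask.sum()))
        mask = np.abs(x) > threshold
    return x

# Example: initial noise for a batch of 4 latent frames of shape 3x8x8.
noise = truncated_gaussian((4, 3, 8, 8), threshold=2.0)
```

Truncation trades sample diversity for fidelity: a tighter threshold keeps the sampler in the high-density region of the latent prior.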