Paper Title


A Structure-Guided Diffusion Model for Large-Hole Image Completion

Authors

Daichi Horita, Jiaolong Yang, Dong Chen, Yuki Koyama, Kiyoharu Aizawa, Nicu Sebe

Abstract


Image completion techniques have made significant progress in filling missing regions (i.e., holes) in images. However, large-hole completion remains challenging due to limited structural information. In this paper, we address this problem by integrating explicit structural guidance into diffusion-based image completion, forming our structure-guided diffusion model (SGDM). It consists of two cascaded diffusion probabilistic models: structure and texture generators. The structure generator generates an edge image representing plausible structures within the holes, which is then used for guiding the texture generation process. To train both generators jointly, we devise a novel strategy that leverages optimal Bayesian denoising, which denoises the output of the structure generator in a single step and thus allows backpropagation. Our diffusion-based approach enables a diversity of plausible completions, while the editable edges allow for editing parts of an image. Our experiments on natural scene (Places) and face (CelebA-HQ) datasets demonstrate that our method achieves a superior or comparable visual quality compared to state-of-the-art approaches. The code is available for research purposes at https://github.com/UdonDa/Structure_Guided_Diffusion_Model.
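The abstract's key training trick is denoising the structure generator's output in a single step so that gradients can flow to the texture generator. Under the standard DDPM noise parameterization, such a one-step (Tweedie-style) estimate of the clean sample has a closed form. The sketch below is a minimal illustration of that formula, not the paper's implementation; the function name and toy shapes are hypothetical.

```python
import numpy as np

def one_step_denoise(x_t, eps_pred, alpha_bar_t):
    """One-step estimate of the clean sample x0 from a noisy sample x_t and
    the model's noise prediction eps_pred (standard DDPM parameterization:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps).
    A single closed-form step like this is differentiable, which is what
    allows backpropagation through a generator's denoised output."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

# Toy check: if the predicted noise equals the true noise, x0 is recovered exactly.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))        # stand-in for a clean edge image
eps = rng.standard_normal((4, 4))       # true forward-process noise
alpha_bar = 0.5                         # cumulative noise schedule at step t
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

x0_hat = one_step_denoise(x_t, eps, alpha_bar)
print(np.allclose(x0_hat, x0))  # True
```

In the paper's setting, `eps_pred` would come from the trained structure generator rather than the true noise, so `x0_hat` is an approximation of the clean edge map that the texture generator can be conditioned on during joint training.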
