Paper Title
CHIMLE: Conditional Hierarchical IMLE for Multimodal Conditional Image Synthesis
Paper Authors
Paper Abstract
A persistent challenge in conditional image synthesis has been to generate diverse output images from the same input image, despite only one output image being observed per input image. GAN-based methods are prone to mode collapse, which leads to low diversity. To get around this, we leverage Implicit Maximum Likelihood Estimation (IMLE), which can fundamentally overcome mode collapse. IMLE uses the same generator as GANs but trains it with a different, non-adversarial objective that ensures each observed image has a generated sample nearby. Unfortunately, to generate high-fidelity images, prior IMLE-based methods require a large number of samples, which is expensive. In this paper, we propose a new method that gets around this limitation, which we dub Conditional Hierarchical IMLE (CHIMLE); it can generate high-fidelity images without requiring many samples. We show that CHIMLE significantly outperforms the prior best IMLE-, GAN- and diffusion-based methods in terms of image fidelity and mode coverage across four tasks, namely night-to-day, 16x single image super-resolution, image colourization and image decompression. Quantitatively, our method improves Fréchet Inception Distance (FID) by 36.9% on average compared to the prior best IMLE-based method, and by 27.5% on average compared to the best non-IMLE-based general-purpose methods.
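The core IMLE objective mentioned in the abstract, where each observed image must have a generated sample nearby, can be sketched as follows. This is a minimal illustrative sketch: the toy generator, the squared-distance metric, and the sampling details are simplifying assumptions, not the paper's actual implementation.

```python
import numpy as np

def imle_nearest_sample_loss(generator, xs, ys, n_samples=8, seed=0):
    """Conditional IMLE objective (illustrative sketch, not the paper's code).

    For each input/observation pair (x_i, y_i): draw several latent codes,
    generate candidate outputs conditioned on x_i, and take the squared
    distance from y_i to its NEAREST generated sample. Minimizing this
    pulls some generated sample close to every observed image, which is
    how IMLE avoids mode collapse.
    """
    rng = np.random.default_rng(seed)
    losses = []
    for xi, yi in zip(xs, ys):
        zs = rng.standard_normal((n_samples, yi.size))      # latent codes
        samples = np.stack([generator(xi, z) for z in zs])  # candidate outputs
        dists = ((samples - yi) ** 2).reshape(n_samples, -1).sum(axis=1)
        losses.append(dists.min())                          # nearest sample only
    return float(np.mean(losses))

# Toy "generator" (hypothetical): adds scaled latent noise to its input.
toy_gen = lambda x, z: x + 0.1 * z
xs = np.zeros((4, 3))
loss = imle_nearest_sample_loss(toy_gen, xs, xs)  # here y_i == x_i
```

Note that only the nearest sample per observation contributes to the loss, so unlike a GAN discriminator there is no adversary that a collapsed generator could fool; every observation exerts pressure on the generator. The cost is that many latent samples per observation are needed for a near sample to exist, which is the expense CHIMLE's hierarchical sampling is designed to reduce.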