Reliable Fidelity and Diversity Metrics for Generative Models

Paper title

Reliable Fidelity and Diversity Metrics for Generative Models

Authors

Muhammad Ferjad Naeem, Seong Joon Oh, Youngjung Uh, Yunjey Choi, Jaejun Yoo

Abstract

Devising indicative evaluation metrics for the image generation task remains an open problem. The most widely used metric for measuring the similarity between real and generated images has been the Fréchet Inception Distance (FID) score. Because it does not differentiate the fidelity and diversity aspects of the generated images, recent papers have introduced variants of precision and recall metrics to diagnose those properties separately. In this paper, we show that even the latest versions of the precision and recall metrics are not yet reliable. For example, they fail to detect the match between two identical distributions, they are not robust against outliers, and the evaluation hyperparameters are selected arbitrarily. We propose density and coverage metrics that solve the above issues. We analytically and experimentally show that density and coverage provide more interpretable and reliable signals for practitioners than the existing metrics. Code: https://github.com/clovaai/generative-evaluation-prdc.
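The abstract names the four metrics without defining them. In the k-nearest-neighbour formulation, each sample's ball has a radius equal to the distance to its k-th nearest neighbour within its own set. Precision is the fraction of generated samples inside at least one real ball, and recall swaps the roles of real and generated samples. Density counts how many real balls contain each generated sample, divided by k, so it can exceed 1. Coverage is the fraction of real samples whose ball contains at least one generated sample. The following is a minimal NumPy sketch, not the code in the linked repository. The helper names (`pairwise_distances`, `kth_nn_radius`, `prdc_sketch`) are hypothetical. The Euclidean distance, the strict inequality for ball membership, k = 5, and the 2-D Gaussian toy data are all assumptions chosen for illustration. In practice the inputs would be feature embeddings of real and generated images.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distance between every row of `a` and every row of `b`."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def kth_nn_radius(x, k):
    """Distance from each row of `x` to its k-th nearest neighbour in `x`, excluding itself."""
    d = pairwise_distances(x, x)
    # After sorting each row, column 0 is the zero self-distance, so column k is the k-th neighbour.
    return np.sort(d, axis=1)[:, k]


def prdc_sketch(real, fake, k=5):
    """Hypothetical helper: k-NN precision/recall and density/coverage on feature vectors."""
    real_radii = kth_nn_radius(real, k)    # shape (N,)
    fake_radii = kth_nn_radius(fake, k)    # shape (M,)
    d_rf = pairwise_distances(real, fake)  # shape (N, M)

    # inside[i, j] is True when fake sample j lies inside the k-NN ball of real sample i
    # (strict inequality is an assumption of this sketch).
    inside = d_rf < real_radii[:, None]

    precision = inside.any(axis=0).mean()                     # fakes inside at least one real ball
    recall = (d_rf < fake_radii[None, :]).any(axis=1).mean()  # reals inside at least one fake ball
    density = inside.sum() / (k * fake.shape[0])              # real balls per fake, divided by k
    coverage = inside.any(axis=1).mean()                      # real balls holding at least one fake
    return {"precision": float(precision), "recall": float(recall),
            "density": float(density), "coverage": float(coverage)}


rng = np.random.default_rng(0)
real = rng.normal(size=(200, 2))

# Case 1: the generated set is an exact copy of the real set.
same = prdc_sketch(real, real.copy(), k=5)

# Case 2: one far-away real outlier, and a generated set that misses the real data entirely.
real_with_outlier = np.vstack([real, [[100.0, 100.0]]])
fake_shifted = rng.normal(size=(200, 2)) + 50.0
shifted = prdc_sketch(real_with_outlier, fake_shifted, k=5)
```

With an exact copy, all four values come out as 1.0 (assuming no tied distances). In the second case, the single real outlier at (100, 100) has a k-NN ball wide enough to enclose every generated sample. Precision is therefore 1.0 even though the generated set misses the real data, and recall is 0.0. Density drops to 1/k = 0.2 and coverage to 1/201, because each generated sample sits inside only one real ball. This toy case illustrates the kind of outlier sensitivity the abstract describes.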
