AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation

Paper Title

AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation

Authors

Yiyang Ma, Huan Yang, Bei Liu, Jianlong Fu, Jiaying Liu

Abstract

AI Illustrator aims to automatically design visually appealing images for books to provoke rich thoughts and emotions. To achieve this goal, we propose a framework for translating raw descriptions with complex semantics into semantically corresponding images. The main challenge lies in the complexity of the semantics of raw descriptions, which may be hard to visualize (e.g., "gloomy" or "Asian"). Existing methods usually struggle to handle such descriptions. To address this issue, we propose a Prompt-based Cross-Modal Generation Framework (PCM-Frame) that leverages two powerful pre-trained models, CLIP and StyleGAN. Our framework consists of two components: a prompt-based projection module from Text Embeddings to Image Embeddings, and an adapted image generation module built on StyleGAN, which takes Image Embeddings as inputs and is trained with combined semantic consistency losses. To bridge the gap between realistic images and illustration designs, we further adopt a stylization model as post-processing in our framework for better visual effects. Benefiting from the pre-trained models, our method can handle complex descriptions and does not require external paired data for training. Furthermore, we have built a benchmark that consists of 200 raw descriptions. We conduct a user study to demonstrate our superiority over competing methods on complicated texts. We release our code at https://github.com/researchmm/AI_Illustrator.
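
The abstract names two components but does not spell out their form, so the following is only a toy sketch of the data flow, not the authors' implementation. It assumes (hypothetically) that the text-to-image-embedding projection can be stood in for by a linear map, and that a "semantic consistency loss" can be illustrated as cosine distance between embeddings. The names `project`, `consistency_loss`, and the weight matrix are illustrative and do not come from the paper or its repository.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so embeddings are compared by direction."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def project(text_emb: Vector, weights: Sequence[Vector]) -> List[float]:
    """Hypothetical stand-in for the projection module.

    Computes normalize(W @ text_emb). The real module is prompt-based;
    a plain linear map is an assumption made only for illustration.
    """
    return normalize([sum(w * x for w, x in zip(row, text_emb)) for row in weights])


def consistency_loss(a: Vector, b: Vector) -> float:
    """Cosine distance: 0 when the embeddings align, 2 when they are opposite."""
    ua, ub = normalize(a), normalize(b)
    return 1.0 - sum(x * y for x, y in zip(ua, ub))


# Toy usage: project a 2-D "text embedding" and compare it with a target
# "image embedding". All values here are made up for the example.
toy_weights = [[0.0, 1.0], [1.0, 0.0]]  # swaps the two coordinates
projected = project([1.0, 0.0], toy_weights)
loss = consistency_loss(projected, [0.0, 1.0])
```

In an actual pipeline the embeddings would come from CLIP's text and image encoders, and the projected embedding would condition a StyleGAN-based generator; neither is modeled here.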
