Scene Aware Person Image Generation through Global Contextual Conditioning

Paper Title

Scene Aware Person Image Generation through Global Contextual Conditioning

Authors

Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein

Abstract

Person image generation is an intriguing yet challenging problem, and it becomes even more difficult under constrained situations. In this work, we propose a novel pipeline to generate and insert contextually relevant person images into an existing scene while preserving the global semantics. More specifically, we aim to insert a person such that the location, pose, and scale of the inserted person blend in with the existing persons in the scene. Our method uses three individual networks in a sequential pipeline. First, we predict the potential location and the skeletal structure of the new person by conditioning a Wasserstein Generative Adversarial Network (WGAN) on the existing human skeletons present in the scene. Next, the predicted skeleton is refined through a shallow linear network to achieve higher structural accuracy in the generated image. Finally, the target image is generated from the refined skeleton using another generative network conditioned on a given image of the target person. In our experiments, we achieve high-resolution photo-realistic generation results while preserving the general context of the scene. We conclude the paper with multiple qualitative and quantitative benchmarks on the results.
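
The data flow of the sequential pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`predict_skeleton`, `refine_skeleton`, `placement_box`, `plan_insertion`), the 2D joint representation, and the stand-in computations are all hypothetical. Stage 1 replaces the trained WGAN generator with a placeholder that offsets the joint-wise mean of the existing skeletons. Stage 2 is a single linear map. Stage 3, the image generator conditioned on the target person, is omitted; only the region it would render into is computed.

```python
from typing import List, Tuple

Joint = Tuple[float, float]   # assumed (x, y) pixel coordinates
Skeleton = List[Joint]


def predict_skeleton(existing: List[Skeleton], noise: List[float]) -> Skeleton:
    """Stage 1 stand-in (hypothetical). A trained WGAN generator conditioned
    on the scene's skeletons would go here; this placeholder simply offsets
    the joint-wise mean of the existing skeletons by the noise vector."""
    n = len(existing)
    out = []
    for j, joints in enumerate(zip(*existing)):
        mean_x = sum(p[0] for p in joints) / n
        mean_y = sum(p[1] for p in joints) / n
        out.append((mean_x + noise[2 * j], mean_y + noise[2 * j + 1]))
    return out


def refine_skeleton(skeleton: Skeleton,
                    weight: List[List[float]],
                    bias: List[float]) -> Skeleton:
    """Stage 2 stand-in: one linear map over the flattened joint coordinates."""
    flat = [c for joint in skeleton for c in joint]
    refined = [sum(w * x for w, x in zip(row, flat)) + b
               for row, b in zip(weight, bias)]
    return list(zip(refined[0::2], refined[1::2]))


def placement_box(skeleton: Skeleton) -> Tuple[float, float, float, float]:
    """Bounding box (x_min, y_min, x_max, y_max) that stage 3 would render into."""
    xs, ys = zip(*skeleton)
    return min(xs), min(ys), max(xs), max(ys)


def plan_insertion(existing, noise, weight, bias):
    """Run stages 1 and 2 in sequence and report where the person would go."""
    coarse = predict_skeleton(existing, noise)
    refined = refine_skeleton(coarse, weight, bias)
    return refined, placement_box(refined)


# Toy usage: two existing people with two joints each, zero noise,
# and an identity refinement so the numbers are easy to follow.
existing = [[(10.0, 20.0), (12.0, 30.0)], [(40.0, 22.0), (42.0, 32.0)]]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
refined, box = plan_insertion(existing, [0.0] * 4, identity, [0.0] * 4)
```

Keeping the three stages as separate functions mirrors the sequential design in the abstract: each stage consumes only the previous stage's output, so any stand-in can be swapped for a trained network without touching the others.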
