Paper Title
One-shot Implicit Animatable Avatars with Model-based Priors
Paper Authors
Paper Abstract
Existing neural rendering methods for creating human avatars typically either require dense input signals such as video or multi-view images, or leverage a learned prior from large-scale specific 3D human datasets such that reconstruction can be performed with sparse-view inputs. Most of these methods fail to achieve realistic reconstruction when only a single image is available. To enable the data-efficient creation of realistic animatable 3D humans, we propose ELICIT, a novel method for learning human-specific neural radiance fields from a single image. Inspired by the fact that humans can effortlessly estimate the body geometry and imagine full-body clothing from a single image, we leverage two priors in ELICIT: a 3D geometry prior and a visual semantic prior. Specifically, ELICIT utilizes the 3D body shape geometry prior from a skinned vertex-based template model (i.e., SMPL) and implements the visual clothing semantic prior with CLIP-based pretrained models. Both priors are used to jointly guide the optimization for creating plausible content in the invisible areas. Taking advantage of the CLIP models, ELICIT can use text descriptions to generate text-conditioned unseen regions. To further improve visual details, we propose a segmentation-based sampling strategy that locally refines different parts of the avatar. Comprehensive evaluations on multiple popular benchmarks, including ZJU-MoCap, Human3.6M, and DeepFashion, show that ELICIT outperforms strong baseline methods for avatar creation when only a single image is available. The code is public for research purposes at https://huangyangyi.github.io/ELICIT/.
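The abstract states that a CLIP-based visual semantic prior guides the optimization of the invisible regions. The sketch below illustrates one plausible form of such guidance: a loss that pulls CLIP embeddings of rendered avatar views toward the embedding of the reference image and/or a text description. It is a minimal sketch using the OpenAI CLIP package; the function name `clip_semantic_loss`, the ViT-B/32 backbone, and the exact loss formulation are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the ELICIT implementation): a CLIP-space semantic
# guidance loss for rendered avatar views, conditioned on a reference image
# and/or a text prompt.
import torch
import torch.nn.functional as F
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.float().eval()  # keep fp32 to avoid dtype mismatch with fp32 renders

# CLIP's image normalization constants.
_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)


def _embed_image(img: torch.Tensor) -> torch.Tensor:
    """CLIP-embed a batch of RGB images (B, 3, 224, 224) with values in [0, 1]."""
    feat = clip_model.encode_image((img - _MEAN) / _STD)
    return F.normalize(feat, dim=-1)


def clip_semantic_loss(rendered_rgb: torch.Tensor,
                       reference_rgb: torch.Tensor | None = None,
                       text_prompt: str | None = None) -> torch.Tensor:
    """Cosine-distance loss between a rendered view and an image/text reference.

    Hypothetical helper: in a single-image avatar setting, `rendered_rgb` would be
    a view rendered from the radiance field (resized to 224x224), `reference_rgb`
    the input photo, and `text_prompt` an optional clothing description.
    """
    rendered_feat = _embed_image(rendered_rgb)
    loss = rendered_rgb.new_zeros(())
    if reference_rgb is not None:
        ref_feat = _embed_image(reference_rgb)
        loss = loss + (1.0 - (rendered_feat * ref_feat).sum(dim=-1)).mean()
    if text_prompt is not None:
        tokens = clip.tokenize([text_prompt]).to(device)
        text_feat = F.normalize(clip_model.encode_text(tokens), dim=-1)
        loss = loss + (1.0 - (rendered_feat * text_feat).sum(dim=-1)).mean()
    return loss
```

In a pipeline like the one the abstract describes, such a semantic term would be combined with a reconstruction loss on the visible pixels and the SMPL-based geometry prior, so that unseen regions stay semantically consistent with the observed clothing (or a user-provided text description) while the visible region matches the input image.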