Multi-View Matching (MVM): Facilitating Multi-Person 3D Pose Estimation Learning with Action-Frozen People Video

Title

Multi-View Matching (MVM): Facilitating Multi-Person 3D Pose Estimation Learning with Action-Frozen People Video

Authors

Yeji Shen, C.-C. Jay Kuo

Abstract

To tackle the challenging problem of multi-person 3D pose estimation from a single image, we propose a multi-view matching (MVM) method in this work. The MVM method generates reliable 3D human poses from a large-scale video dataset, called the Mannequin dataset, that contains action-frozen people imitating mannequins. With a large amount of in-the-wild video data labeled with 3D supervision automatically generated by MVM, we are able to train a neural network that takes a single image as input for multi-person 3D pose estimation. The core technology of MVM lies in the effective alignment of 2D poses obtained from multiple views of a static scene, which imposes a strong geometric constraint. Our objective is to maximize the mutual consistency of 2D poses estimated in multiple frames, taking geometric constraints and appearance similarities into account simultaneously. To demonstrate the effectiveness of the 3D supervision provided by the MVM method, we conduct experiments on the 3DPW and MSCOCO datasets and show that our proposed solution offers state-of-the-art performance.
