Paper Title

A Graph Attention Spatio-temporal Convolutional Network for 3D Human Pose Estimation in Video

Authors

Junfa Liu, Juan Rojas, Zhijun Liang, Yihui Li, Yisheng Guan

Abstract

Spatio-temporal information is key to resolving occlusion and depth ambiguity in 3D pose estimation. Previous methods have focused on either temporal contexts or local-to-global architectures that embed fixed-length spatio-temporal information. To date, there has been no effective proposal that simultaneously and flexibly captures varying spatio-temporal sequences and achieves real-time 3D pose estimation. In this work, we improve the learning of kinematic constraints in the human skeleton — posture, local kinematic connections, and symmetry — by modeling local and global spatial information via attention mechanisms. To adapt to both single- and multi-frame estimation, a dilated temporal model is employed to process skeleton sequences of varying length. Just as importantly, we carefully design the interleaving of spatial semantics with temporal dependencies to achieve a synergistic effect. To this end, we propose a simple yet effective graph attention spatio-temporal convolutional network (GAST-Net) that comprises interleaved temporal convolutional and graph attention blocks. Experiments on two challenging benchmark datasets (Human3.6M and HumanEva-I) and on YouTube videos demonstrate that our approach effectively mitigates depth ambiguity and self-occlusion, generalizes to half-upper-body estimation, and achieves competitive performance on 2D-to-3D video pose estimation. Code, video, and supplementary information are available at: \href{http://www.juanrojas.net/gast/}{http://www.juanrojas.net/gast/}
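To illustrate the two building blocks the abstract names — graph attention over skeleton joints and dilated temporal convolution over frames — here is a minimal numpy sketch. It is not the authors' implementation: the adjacency matrix, tensor shapes, shared temporal kernel, and dilation schedule are illustrative assumptions. It only shows how adjacency-masked attention aggregates spatial (joint) information within each frame, and how interleaved dilated convolutions shrink a 9-frame sequence to a single estimate.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention(x, adj):
    """Per-frame attention over joints, masked by the skeleton graph.

    x:   (T, J, C) sequence of J joint features over T frames
    adj: (J, J) adjacency (nonzero where joints are connected, incl. self-loops)
    """
    # scaled dot-product similarity between every pair of joints in a frame
    scores = np.einsum('tjc,tkc->tjk', x, x) / np.sqrt(x.shape[-1])
    scores = np.where(adj[None] > 0, scores, -1e9)  # attend only along edges
    attn = softmax(scores, axis=-1)
    return np.einsum('tjk,tkc->tjc', attn, x)       # aggregate neighbor features

def dilated_temporal_conv(x, w, dilation):
    """Valid (no-padding) temporal convolution with a shared kernel.

    x: (T, J, C); w: (K,) temporal kernel applied to all joints and channels.
    Output length is T - dilation * (K - 1).
    """
    K = len(w)
    T_out = x.shape[0] - dilation * (K - 1)
    out = np.zeros((T_out,) + x.shape[1:])
    for t in range(T_out):
        for k in range(K):
            out[t] += w[k] * x[t + k * dilation]
    return out

# Interleave the two blocks on a toy 4-joint chain over 9 frames.
rng = np.random.default_rng(0)
x = rng.standard_normal((9, 4, 2))
adj = np.eye(4) + np.eye(4, k=1) + np.eye(4, k=-1)  # chain skeleton + self-loops
w = np.ones(3) / 3                                  # kernel size 3

h = dilated_temporal_conv(graph_attention(x, adj), w, dilation=1)  # (9,...) -> (7,...)
h = dilated_temporal_conv(graph_attention(h, adj), w, dilation=3)  # (7,...) -> (1,...)
print(h.shape)  # (1, 4, 2): one pose estimate covering all 9 input frames
```

Stacking blocks with growing dilation (1, then 3) gives a receptive field of 9 frames with only two temporal layers, which is why dilated models adapt cheaply to varying sequence lengths.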
