Paper Title

3D Pose Detection in Videos: Focusing on Occlusion

Authors

Justin Wang, Edward Xu, Kangrui Xue, Lukasz Kidzinski

Abstract


In this work, we build upon existing methods for occlusion-aware 3D pose detection in videos. We implement a two-stage architecture consisting of a stacked hourglass network that produces 2D pose predictions, which are then fed into a temporal convolutional network to produce 3D pose predictions. To facilitate prediction on poses with occluded joints, we introduce an intuitive generalization of the cylinder man model used to generate occlusion labels. We find that the occlusion-aware network achieves a mean per-joint position error 5 mm lower than our linear baseline model on the Human3.6M dataset. Compared to our temporal convolutional network baseline, we achieve a comparable mean per-joint position error (0.1 mm lower) at reduced computational cost.
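The two-stage pipeline described above can be sketched in a minimal, hypothetical form: a per-frame 2D keypoint predictor standing in for the stacked hourglass network, followed by a temporal operation over the 2D sequence standing in for the temporal convolutional network. Everything here (`predict_2d`, `temporal_conv_lift`, the 17-joint count, the moving-average "convolution") is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

NUM_JOINTS = 17  # Human3.6M uses a 17-joint skeleton

def predict_2d(frames):
    """Stand-in for stage 1 (stacked hourglass network): one (J, 2)
    keypoint set per frame. Here it is just a random placeholder."""
    T = frames.shape[0]
    rng = np.random.default_rng(0)
    return rng.standard_normal((T, NUM_JOINTS, 2))

def temporal_conv_lift(poses_2d, receptive_field=3):
    """Stand-in for stage 2 (temporal convolutional network): a 1D
    temporal filter over the 2D sequence, then a linear projection
    lifting (T, J, 2) keypoints to (T, J, 3) positions."""
    T, J, _ = poses_2d.shape
    x = poses_2d.reshape(T, J * 2)  # flatten joints per frame
    kernel = np.ones((receptive_field, 1)) / receptive_field
    # moving average over time as the simplest temporal convolution
    pad = receptive_field // 2
    padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    smoothed = np.stack([
        (padded[t:t + receptive_field] * kernel).sum(axis=0)
        for t in range(T)
    ])
    # random linear "lifting" layer in place of learned weights
    W = np.random.default_rng(1).standard_normal((J * 2, J * 3)) * 0.1
    return (smoothed @ W).reshape(T, J, 3)

frames = np.zeros((8, 64, 64))         # dummy 8-frame grayscale clip
poses_2d = predict_2d(frames)
poses_3d = temporal_conv_lift(poses_2d)
print(poses_2d.shape, poses_3d.shape)  # (8, 17, 2) (8, 17, 3)
```

The point of the sketch is the data flow: video frames → per-frame 2D poses → temporally smoothed features → 3D poses, matching the two-stage decomposition in the abstract.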
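The cylinder man model mentioned in the abstract treats limbs as cylinders and labels a joint occluded when a closer body part covers it from the camera's viewpoint. A toy version of that labeling rule, under assumed conventions (camera-space joints with z as depth, a fixed cylinder radius, limb depth approximated by its midpoint), might look like this; it is a sketch of the idea, not the authors' generalization:

```python
import numpy as np

def point_segment_dist(p, a, b):
    """2D distance from point p to the segment from a to b."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / (np.dot(ab, ab) + 1e-9), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def occlusion_labels(joints, limbs, radius=0.1):
    """joints: (J, 3) camera-space positions (z = depth); limbs: pairs of
    joint indices, each treated as a cylinder of the given radius.
    A joint is labeled occluded when its image-plane projection (here,
    simply its x/y coordinates) falls within a limb cylinder whose
    midpoint lies in front of it (smaller depth)."""
    occluded = np.zeros(len(joints), dtype=bool)
    for j in range(len(joints)):
        for (a, b) in limbs:
            if j in (a, b):
                continue  # a joint is never occluded by its own limb
            in_front = 0.5 * (joints[a, 2] + joints[b, 2]) < joints[j, 2]
            covered = point_segment_dist(
                joints[j, :2], joints[a, :2], joints[b, :2]) < radius
            if in_front and covered:
                occluded[j] = True
    return occluded

# Toy example: a limb (joints 0-1) at depth 1.0 passes directly in
# front of joint 2 at depth 2.0, so joint 2 is labeled occluded.
joints = np.array([[0.0, 0.0, 1.0],
                   [0.2, 0.0, 1.0],
                   [0.1, 0.0, 2.0]])
print(occlusion_labels(joints, limbs=[(0, 1)]))  # [False False  True]
```

Labels produced this way can serve as supervision or masking signals when training the network to handle occluded joints.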
