Paper Title

Recognition and 3D Localization of Pedestrian Actions from Monocular Video

Paper Authors

Jun Hayakawa, Behzad Dariush

Paper Abstract

Understanding and predicting pedestrian behavior is an important and challenging area of research for realizing safe and effective navigation strategies in automated and advanced driver assistance technologies in urban scenes. This paper focuses on monocular pedestrian action recognition and 3D localization from an egocentric view for the purpose of predicting intention and forecasting future trajectory. A challenge in addressing this problem in urban traffic scenes is attributed to the unpredictable behavior of pedestrians, whereby actions and intentions are constantly in flux and depend on the pedestrian's pose, their 3D spatial relations, and their interaction with other agents as well as with the environment. To partially address these challenges, we consider the importance of pose toward recognition and 3D localization of pedestrian actions. In particular, we propose an action recognition framework using a two-stream temporal relation network with inputs corresponding to the raw RGB image sequence of the tracked pedestrian as well as the pedestrian pose. The proposed method outperforms methods using a single-stream temporal relation network based on evaluations on the JAAD public dataset. The estimated pose and associated body key-points are also used as input to a network that estimates the 3D location of the pedestrian using a unique loss function. The evaluation of our 3D localization method on the KITTI dataset indicates an improvement in the average localization error compared to existing state-of-the-art methods. Finally, we conduct qualitative tests of action recognition and 3D localization on HRI's H3D driving dataset.
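The abstract describes two components: a two-stream temporal relation network that fuses the tracked pedestrian's cropped RGB sequence with the corresponding 2D pose sequence for action recognition, and a key-point-based network that regresses the pedestrian's 3D location. The following is a minimal sketch, assuming PyTorch, of how such a two-stream fusion and a key-point localization head could be wired together; the layer sizes, the simplified two-frame relation module, the number of key-points, and the plain regression head are illustrative assumptions, not the authors' released architecture or loss function.

```python
# Minimal sketch (assumed PyTorch); NOT the authors' implementation.
# Illustrates the two ideas named in the abstract: (a) a two-stream temporal
# relation network fusing RGB-sequence features with pose key-point sequences,
# and (b) a key-point-based head that regresses the pedestrian's 3D position.
import torch
import torch.nn as nn


class TemporalRelation(nn.Module):
    """Pools pairwise relations between consecutive frames with an MLP
    (a simplified, 2-frame version of a temporal relation module)."""

    def __init__(self, feat_dim, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )

    def forward(self, x):                       # x: (B, T, feat_dim)
        B, T, _ = x.shape
        rels = [self.mlp(torch.cat([x[:, i], x[:, i + 1]], dim=-1))
                for i in range(T - 1)]          # relate frame i to frame i+1
        return torch.stack(rels, dim=1).mean(dim=1)   # (B, hidden)


class TwoStreamPedestrianNet(nn.Module):
    def __init__(self, n_actions=4, n_keypoints=17, rgb_dim=512):
        super().__init__()
        # RGB stream: per-frame CNN features (e.g. from a ResNet backbone)
        # are assumed to be precomputed and passed in as rgb_dim vectors.
        self.rgb_relation = TemporalRelation(rgb_dim)
        # Pose stream: flattened (x, y) key-points per frame.
        self.pose_embed = nn.Linear(2 * n_keypoints, 128)
        self.pose_relation = TemporalRelation(128)
        self.action_head = nn.Linear(256 + 256, n_actions)
        # Key-point-based 3D localization head (illustrative): regresses the
        # pedestrian's (x, y, z) in the camera frame from a single pose.
        self.loc_head = nn.Sequential(
            nn.Linear(2 * n_keypoints, 128), nn.ReLU(), nn.Linear(128, 3))

    def forward(self, rgb_feats, poses):
        # rgb_feats: (B, T, rgb_dim)   poses: (B, T, n_keypoints, 2)
        B, T = poses.shape[:2]
        pose_flat = poses.view(B, T, -1)
        fused = torch.cat([
            self.rgb_relation(rgb_feats),
            self.pose_relation(self.pose_embed(pose_flat)),
        ], dim=-1)
        action_logits = self.action_head(fused)
        xyz = self.loc_head(pose_flat[:, -1])   # localize from the latest frame
        return action_logits, xyz


if __name__ == "__main__":
    net = TwoStreamPedestrianNet()
    logits, xyz = net(torch.randn(2, 8, 512), torch.randn(2, 8, 17, 2))
    print(logits.shape, xyz.shape)   # torch.Size([2, 4]) torch.Size([2, 3])
```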
