VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation

Paper Title

VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation

Authors

Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Congcong Li, Cordelia Schmid

Abstract

Behavior prediction in dynamic, multi-agent systems is an important problem in the context of self-driving cars, due to the complex representations and interactions of road components, including moving agents (e.g. pedestrians and vehicles) and road context information (e.g. lanes, traffic lights). This paper introduces VectorNet, a hierarchical graph neural network that first exploits the spatial locality of individual road components represented by vectors and then models the high-order interactions among all components. In contrast to most recent approaches, which render trajectories of moving agents and road context information as bird-eye images and encode them with convolutional neural networks (ConvNets), our approach operates on a vector representation. By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps. To further boost VectorNet's capability in learning context features, we propose a novel auxiliary task to recover the randomly masked out map entities and agent trajectories based on their context. We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset. Our method achieves on par or better performance than the competitive rendering approach on both benchmarks while saving over 70% of the model parameters with an order of magnitude reduction in FLOPs. It also outperforms the state of the art on the Argoverse dataset.
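As a rough illustration of what "operating on a vector representation" can mean, the sketch below turns a polyline (a lane centerline or an agent trajectory) into a list of directed vectors tagged with the polyline they belong to. The field layout `(x_start, y_start, x_end, y_end, polyline_id)`, the function name `vectorize_polyline`, and the sample coordinates are assumptions made for this example; the abstract does not specify the exact vector attributes, and the learned graph neural network that consumes these vectors is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Assumed layout: (x_start, y_start, x_end, y_end, polyline_id)
Vector = Tuple[float, float, float, float, int]


def vectorize_polyline(points: List[Point], polyline_id: int) -> List[Vector]:
    """Connect consecutive points of one polyline into directed vectors.

    Each vector keeps its start and end coordinates plus the id of the
    polyline it came from, so vectors can later be grouped per road
    component before interactions across components are modeled.
    """
    return [
        (x0, y0, x1, y1, polyline_id)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]


# Hypothetical inputs: a lane centerline and a short agent trajectory.
lane = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.5)]
agent = [(1.0, -1.0), (2.0, -0.8), (3.0, -0.5), (4.0, -0.2)]

lane_vectors = vectorize_polyline(lane, polyline_id=0)
agent_vectors = vectorize_polyline(agent, polyline_id=1)
```

A polyline with n points yields n - 1 vectors, and no rasterization step is involved, which is the contrast with the bird's-eye image rendering described in the abstract.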
