Paper Title

MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous Driving Using Multiple Views

Authors

Ke Chen, Ryan Oldja, Nikolai Smolyanskiy, Stan Birchfield, Alexander Popov, David Wehr, Ibrahim Eden, Joachim Pehserl

Abstract

Autonomous driving requires the inference of actionable information such as detecting and classifying objects, and determining the drivable space. To this end, we present Multi-View LidarNet (MVLidarNet), a two-stage deep neural network for multi-class object detection and drivable space segmentation using multiple views of a single LiDAR point cloud. The first stage processes the point cloud projected onto a perspective view in order to semantically segment the scene. The second stage then processes the point cloud (along with semantic labels from the first stage) projected onto a bird's eye view, to detect and classify objects. Both stages use an encoder-decoder architecture. We show that our multi-view, multi-stage, multi-class approach is able to detect and classify objects while simultaneously determining the drivable space using a single LiDAR scan as input, in challenging scenes with more than one hundred vehicles and pedestrians at a time. The system operates efficiently at 150 fps on an embedded GPU designed for a self-driving car, including a postprocessing step to maintain identities over time. We show results on both KITTI and a much larger internal dataset, thus demonstrating the method's ability to scale by an order of magnitude.
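The abstract describes a two-stage data flow: the point cloud is first projected onto a perspective (range-image) view for semantic segmentation, and the points, together with their predicted labels, are then projected onto a bird's eye view for detection. The paper excerpt above does not give projection parameters, so the following is only a minimal NumPy sketch of those two projections; the image sizes, field-of-view angles, grid extent, and channel choices are all assumptions, not the authors' actual configuration.

```python
import numpy as np

def perspective_view(points, h=64, w=1024,
                     fov_up=np.radians(3.0), fov_down=np.radians(-25.0)):
    """Project an (N, 4) point cloud (x, y, z, intensity) onto an
    (h, w, 2) range image, as in the first (segmentation) stage.
    The FOV angles are hypothetical sensor parameters."""
    x, y, z, intensity = points.T
    r = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)  # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-6), -1.0, 1.0))
    u = ((1.0 - (yaw + np.pi) / (2 * np.pi)) * w).astype(int) % w
    v = np.clip((fov_up - pitch) / (fov_up - fov_down) * h, 0, h - 1).astype(int)
    img = np.zeros((h, w, 2), dtype=np.float32)  # channels: range, intensity
    img[v, u, 0] = r
    img[v, u, 1] = intensity
    return img

def birds_eye_view(points, labels, grid=512, extent=50.0):
    """Project points (with per-point semantic labels from stage one)
    onto a (grid, grid, 3) top-down map for the detection stage."""
    x, y = points[:, 0], points[:, 1]
    keep = (np.abs(x) < extent) & (np.abs(y) < extent)
    ix = ((x[keep] + extent) / (2 * extent) * grid).astype(int).clip(0, grid - 1)
    iy = ((y[keep] + extent) / (2 * extent) * grid).astype(int).clip(0, grid - 1)
    bev = np.zeros((grid, grid, 3), dtype=np.float32)  # height, intensity, label
    bev[iy, ix, 0] = points[keep, 2]
    bev[iy, ix, 1] = points[keep, 3]
    bev[iy, ix, 2] = labels[keep]
    return bev
```

Each projection here produces a dense image-like tensor, which is what makes the encoder-decoder stages mentioned in the abstract applicable to an otherwise unordered point set.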
