A Unified Query-based Paradigm for Point Cloud Understanding

Paper title

A Unified Query-based Paradigm for Point Cloud Understanding

Authors

Zetong Yang, Li Jiang, Yanan Sun, Bernt Schiele, Jiaya Jia

Abstract

3D point cloud understanding is an important component in autonomous driving and robotics. In this paper, we present a novel Embedding-Querying paradigm (EQ-Paradigm) for 3D understanding tasks including detection, segmentation, and classification. EQ-Paradigm is a unified paradigm that enables the combination of any existing 3D backbone architecture with different task heads. Under the EQ-Paradigm, the input is first encoded in the embedding stage with an arbitrary feature extraction architecture, which is independent of tasks and heads. Then, the querying stage makes the encoded features applicable to diverse task heads. This is achieved by introducing an intermediate representation, i.e., Q-representation, in the querying stage to serve as a bridge between the embedding stage and task heads. We design a novel Q-Net as the querying stage network. Extensive experimental results on various 3D tasks, including object detection, semantic segmentation, and shape classification, show that EQ-Paradigm in tandem with Q-Net is a general and effective pipeline, which enables a flexible collaboration of backbones and heads, and further boosts the performance of state-of-the-art methods. Code and models are available at https://github.com/dvlab-research/DeepVision3D.
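
The two-stage flow described in the abstract can be illustrated with a small sketch. This is not the authors' implementation and not Q-Net: the abstract does not specify Q-Net's internals, so the querying stage here is replaced by plain inverse-distance interpolation, and every name (embed, query, segmentation_head, classification_head, run) is hypothetical. The sketch only shows the interface idea: a task-agnostic embedding stage, a querying stage that produces one feature per query position (a stand-in for the Q-representation), and interchangeable task heads that consume those features.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]
Feature = List[float]


def embed(points: Sequence[Point]) -> Tuple[List[Point], List[Feature]]:
    """Embedding stage (toy stand-in for an arbitrary backbone).

    Returns the support points and one feature per point: [x, y, z, norm].
    """
    feats = [[x, y, z, math.sqrt(x * x + y * y + z * z)] for x, y, z in points]
    return list(points), feats


def query(support: Sequence[Point], feats: Sequence[Feature],
          positions: Sequence[Point], k: int = 3) -> List[Feature]:
    """Querying stage (assumption: simple interpolation, NOT the paper's Q-Net).

    For each query position, blends the features of the k nearest support
    points with inverse-distance weights.
    """
    dim = len(feats[0])
    q_repr = []
    for q in positions:
        nearest = sorted(range(len(support)),
                         key=lambda i: math.dist(q, support[i]))[:k]
        weights = [1.0 / (math.dist(q, support[i]) + 1e-8) for i in nearest]
        total = sum(weights)
        q_repr.append([sum(w * feats[i][d] for w, i in zip(weights, nearest)) / total
                       for d in range(dim)])
    return q_repr


def segmentation_head(q_repr: List[Feature]) -> List[int]:
    """Toy per-query labels: 1 if the interpolated z channel is positive, else 0."""
    return [1 if f[2] > 0 else 0 for f in q_repr]


def classification_head(q_repr: List[Feature]) -> int:
    """Toy shape label: index of the largest-magnitude channel of the first query."""
    f = q_repr[0]
    return max(range(len(f)), key=lambda d: abs(f[d]))


def run(points: Sequence[Point], positions: Sequence[Point],
        head: Callable[[List[Feature]], object]) -> object:
    """Embedding stage, then querying stage, then whichever head is supplied."""
    support, feats = embed(points)
    return head(query(support, feats, positions))


points = [(0.0, 0.0, -1.0), (1.0, 0.0, -1.0), (0.0, 1.0, 1.0), (1.0, 1.0, 1.0)]

# Segmentation-style use: query at every input point.
seg_labels = run(points, points, segmentation_head)

# Classification-style use: query at a single position (here, the centroid).
centroid = (0.5, 0.5, 0.0)
cls_label = run(points, [centroid], classification_head)

print(seg_labels, cls_label)
```

In this sketch the only thing that changes between tasks is the set of query positions and the head, while embed stays the same. That is the decoupling the abstract attributes to the EQ-Paradigm; the real querying network is learned rather than hand-written.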

Paper title