Paper Title

VIBUS: Data-efficient 3D Scene Parsing with VIewpoint Bottleneck and Uncertainty-Spectrum Modeling

Paper Authors

Beiwen Tian, Liyi Luo, Hao Zhao, Guyue Zhou

Paper Abstract

Recently, 3D scene parsing with deep learning approaches has been a hot topic. However, current fully-supervised methods require manually annotated point-wise labels, which are extremely user-unfriendly and time-consuming to obtain. As such, training 3D scene parsing models with sparse supervision is an intriguing alternative. We term this task data-efficient 3D scene parsing and propose an effective two-stage framework named VIBUS to resolve it by exploiting the enormous number of unlabeled points. In the first stage, we perform self-supervised representation learning on unlabeled points with the proposed Viewpoint Bottleneck loss function. The loss function is derived from an information bottleneck objective imposed on scenes under different viewpoints, making the process of representation learning free of degradation and sampling. In the second stage, pseudo labels are harvested from the sparse labels based on uncertainty-spectrum modeling. By combining data-driven uncertainty measures and 3D mesh spectrum measures (derived from normal directions and geodesic distances), a robust local affinity metric is obtained. Finite gamma/beta mixture models are used to decompose category-wise distributions of these measures, leading to automatic selection of thresholds. We evaluate VIBUS on the public benchmark ScanNet and achieve state-of-the-art results on both the validation set and the online test server. Ablation studies show that both Viewpoint Bottleneck and uncertainty-spectrum modeling bring significant improvements. Code and models are publicly available at https://github.com/AIR-DISCOVER/VIBUS.
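The two stages summarized in the abstract can be pictured with short sketches. First, an information bottleneck objective "imposed on scenes under different viewpoints" that needs neither negative sampling nor anti-collapse tricks reads naturally as a redundancy-reduction loss between per-point features of the same scene seen from two viewpoints. The PyTorch sketch below is a minimal rendition under that assumption; the name `viewpoint_bottleneck_loss` and the weight `lambda_offdiag` are illustrative, and the authors' exact formulation is in the linked repository.

```python
# Minimal sketch of a redundancy-reduction, viewpoint-bottleneck-style loss.
# Assumption: z_a and z_b are (N, D) per-point features of the same scene
# observed under two different viewpoints; names here are illustrative only.
import torch

def viewpoint_bottleneck_loss(z_a: torch.Tensor,
                              z_b: torch.Tensor,
                              lambda_offdiag: float = 0.005) -> torch.Tensor:
    n, _ = z_a.shape
    # Standardize each feature dimension over the N points.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    # D x D cross-correlation matrix between the two viewpoints.
    c = (z_a.T @ z_b) / n
    # Pull the diagonal toward 1 (viewpoint invariance) and the off-diagonal
    # toward 0 (redundancy reduction); no negative sampling is required and
    # collapsed (degenerate) representations are penalized.
    on_diag = (torch.diagonal(c) - 1.0).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambda_offdiag * off_diag
```

Second, the automatic threshold selection in the pseudo-labeling stage amounts to fitting a small mixture model to a one-dimensional measure and reading off where the components cross. The sketch below fits a two-component gamma mixture with EM using weighted method-of-moments updates; it is a simplified stand-in for the finite gamma/beta mixture modeling described in the abstract, and the function name `gamma_mixture_threshold` is hypothetical.

```python
# Hedged sketch: two-component gamma mixture fit to a 1-D measure (e.g. a
# per-point uncertainty score), with the threshold taken where the
# higher-mean component starts to dominate. Simplified EM, not the paper's code.
import numpy as np
from scipy.stats import gamma

def gamma_mixture_threshold(x, n_iter=50):
    x = np.clip(np.asarray(x, dtype=float), 1e-9, None)  # gamma support is x > 0
    med = np.median(x)
    # Initialize the two components from the lower/upper halves of the data.
    comps = [gamma.fit(x[x <= med], floc=0), gamma.fit(x[x > med], floc=0)]
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample.
        dens = np.stack([w * gamma.pdf(x, *p) for w, p in zip(weights, comps)])
        resp = dens / (dens.sum(axis=0, keepdims=True) + 1e-12)
        # M-step: weighted method-of-moments gamma updates (loc fixed at 0).
        comps = []
        for r in resp:
            m = np.average(x, weights=r)
            v = np.average((x - m) ** 2, weights=r) + 1e-12
            comps.append((m * m / v, 0.0, v / m))  # (shape, loc, scale)
        weights = resp.mean(axis=1)
    # Decision threshold: first point where the larger-mean component dominates.
    grid = np.linspace(x.min(), x.max(), 10000)
    lo, hi = sorted(range(2), key=lambda i: comps[i][0] * comps[i][2])
    dominates = (weights[hi] * gamma.pdf(grid, *comps[hi])
                 > weights[lo] * gamma.pdf(grid, *comps[lo]))
    return grid[dominates.argmax()] if dominates.any() else float(x.max())
```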
