MVRackLay: Monocular Multi-View Layout Estimation for Warehouse Racks and Shelves

Paper Title

MVRackLay: Monocular Multi-View Layout Estimation for Warehouse Racks and Shelves

Authors

Pathre, Pranjali; Sahu, Anurag; Rao, Ashwin; Prabhu, Avinash; Nigam, Meher Shashwat; Karandikar, Tanvi; Pandya, Harit; Krishna, K. Madhava

Abstract

In this paper, we propose and showcase, for the first time, monocular multi-view layout estimation for warehouse racks and shelves. Unlike typical layout estimation methods, MVRackLay estimates multi-layered layouts, wherein each layer corresponds to the layout of one shelf within a rack. Given a sequence of images of a warehouse scene, a dual-headed Convolutional-LSTM architecture outputs segmented racks along with the front-view and top-view layouts of each shelf within a rack. With minimal effort, this output can be transformed into a 3D rendering of all racks, shelves and objects on the shelves, giving an accurate 3D depiction of the entire warehouse scene in terms of racks, shelves and the number of objects on each shelf. MVRackLay generalizes to a diverse set of warehouse scenes with varying numbers of objects per shelf, varying numbers of shelves, and other racks present in the background. Further, MVRackLay outperforms its single-view counterpart, RackLay, in layout accuracy, as quantified by the mean IoU and mAP metrics. We also showcase multi-view stitching of the 3D layouts, which yields a representation of the warehouse scene in a global reference frame, akin to a scene rendering from a SLAM pipeline. To the best of our knowledge, this is the first work to portray a 3D rendering of a warehouse scene in terms of its semantic components (racks, shelves and objects), all from a single monocular camera.
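Layout accuracy above is reported as mean IoU over multi-layered (per-shelf) layouts. The following is a minimal sketch of how such a metric can be computed, not the authors' evaluation code. It assumes each shelf layer is a grid of hypothetical class labels (0 = background, 1 = shelf surface, 2 = object), and it assumes intersection and union are pooled over all shelf layers of a rack before dividing. The function names `class_iou` and `mean_iou` and the toy data are illustrative only.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-cell labels for one shelf layer (an assumption, not the
# paper's encoding): 0 = background, 1 = shelf surface, 2 = object.
Grid = List[List[int]]


def class_iou(pred: Sequence[Grid], gt: Sequence[Grid], cls: int) -> float:
    """IoU for one class, pooling intersection and union over all shelf layers."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of shelf layers")
    inter = union = 0
    for p_layer, g_layer in zip(pred, gt):
        if len(p_layer) != len(g_layer):
            raise ValueError("layer grids must have identical shapes")
        for p_row, g_row in zip(p_layer, g_layer):
            if len(p_row) != len(g_row):
                raise ValueError("layer grids must have identical shapes")
            for p, g in zip(p_row, g_row):
                in_p, in_g = p == cls, g == cls
                inter += in_p and in_g  # bool counts as 0 or 1
                union += in_p or in_g
    # NaN marks a class absent from both prediction and ground truth.
    return inter / union if union else float("nan")


def mean_iou(pred: Sequence[Grid], gt: Sequence[Grid],
             classes: Tuple[int, ...] = (1, 2)) -> float:
    """Average the per-class IoUs, skipping classes that appear nowhere."""
    ious = [class_iou(pred, gt, c) for c in classes]
    valid = [v for v in ious if v == v]  # NaN != NaN, so this drops NaN
    return sum(valid) / len(valid) if valid else float("nan")


# Toy rack with two shelf layers, each a 2x3 top-view grid.
gt = [[[1, 1, 1], [2, 2, 0]],
      [[1, 1, 1], [0, 2, 2]]]
pred = [[[1, 1, 1], [2, 0, 0]],
        [[1, 1, 0], [0, 2, 2]]]

print(round(class_iou(pred, gt, 1), 4))  # 0.8333
print(round(class_iou(pred, gt, 2), 4))  # 0.75
print(round(mean_iou(pred, gt), 4))      # 0.7917
```

Pooling over layers weights every cell equally, so a shelf with many cells counts more than a sparse one. Averaging one IoU per layer instead is an equally plausible choice, and the paper may use either; background is left out of the default `classes` here only as an illustrative assumption.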