Paper title
Products-10K: A Large-scale Product Recognition Dataset
Paper authors
Paper abstract
With the rapid development of electronic commerce, the way people shop has undergone a revolutionary change. To meet customers' massive and diverse online shopping needs with quick responses, a retail AI system needs to automatically recognize products from images and videos at the stock-keeping unit (SKU) level with high accuracy. However, product recognition remains a challenging task, since many SKU-level products are fine-grained and look visually similar at a glance. Although some product benchmarks are already available, these datasets are either too small (a limited number of products) or noisily labeled (lacking human labeling). In this paper, we construct a human-labeled product image dataset named "Products-10K", which contains 10,000 fine-grained SKU-level products frequently bought by online customers on JD.com. Based on this dataset, we also introduce several useful tips and tricks for fine-grained product recognition. The Products-10K dataset is available at https://products-10k.github.io/.