Paper title
Inspector Gadget: A Data Programming-based Labeling System for Industrial Images
Paper authors
Paper abstract
As machine learning for images becomes democratized in the Software 2.0 era, one of the serious bottlenecks is securing enough labeled data for training. This problem is especially critical in manufacturing, where smart factories rely on machine learning for product quality control by analyzing industrial images. Such images are typically large and may only need to be partially analyzed, since only a small portion is problematic (e.g., when identifying defects on a surface). Because manually labeling these images is expensive, weak supervision is an attractive alternative: the idea is to generate weak labels that are not perfect but can be produced at scale. Data programming is a recent paradigm in this category that expresses human knowledge in the form of labeling functions and combines them into a generative model. Data programming has been successful in applications based on text or structured data, and it can usually be applied to images as well if one can find a way to convert them into structured data. In this work, we expand the horizon of data programming by applying it directly to images without this conversion, which is a common scenario for industrial applications. We propose Inspector Gadget, an image labeling system that combines crowdsourcing, data augmentation, and data programming to produce weak labels at scale for image classification. We perform experiments on real industrial image datasets and show that Inspector Gadget obtains better performance than other weak-labeling techniques: Snuba, GOGGLES, and self-learning baselines using convolutional neural networks (CNNs) without pre-training.
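To make the data programming idea in the abstract concrete, here is a minimal sketch, not Inspector Gadget's actual method. Several labeling functions each cast a noisy vote (or abstain) on an image, and the votes are combined into one weak label. As a simplifying assumption, a plain majority vote stands in for the generative model mentioned in the abstract. The labeling functions, their pixel thresholds, and the image representation (rows of grayscale intensities in [0, 1]) are all hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical label values for a binary defect-detection task.
ABSTAIN = -1
OK = 0
DEFECT = 1

# Assumption: an image is a grid of grayscale intensities in [0, 1].
Image = Sequence[Sequence[float]]
LabelingFunction = Callable[[Image], int]


def _pixels(img: Image) -> List[float]:
    """Flatten the image into a single list of pixel intensities."""
    return [p for row in img for p in row]


def lf_dark_spot(img: Image) -> int:
    """Vote DEFECT if any pixel is very dark (hypothetical threshold)."""
    return DEFECT if min(_pixels(img)) < 0.1 else ABSTAIN


def lf_uniform_surface(img: Image) -> int:
    """Vote OK if the intensity range is narrow (hypothetical threshold)."""
    px = _pixels(img)
    return OK if max(px) - min(px) < 0.2 else ABSTAIN


def lf_low_mean(img: Image) -> int:
    """Vote DEFECT if the image is dark on average (hypothetical threshold)."""
    px = _pixels(img)
    return DEFECT if sum(px) / len(px) < 0.4 else ABSTAIN


def weak_label(img: Image, lfs: Sequence[LabelingFunction]) -> int:
    """Combine labeling-function votes by majority; abstain on ties or no votes.

    This majority vote is a simplified stand-in for a learned generative
    model, which would instead weight each function by its estimated accuracy.
    """
    votes = [v for v in (lf(img) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes)
    top_label, top_count = counts.most_common(1)[0]
    if list(counts.values()).count(top_count) > 1:
        return ABSTAIN  # tie between labels
    return top_label


LFS = [lf_dark_spot, lf_uniform_surface, lf_low_mean]
clean = [[0.8, 0.8], [0.8, 0.85]]
scratched = [[0.8, 0.05], [0.8, 0.8]]
print(weak_label(clean, LFS), weak_label(scratched, LFS))
```

The resulting weak labels would then be used to train an end classifier. A real data programming pipeline would replace the majority vote with a model that estimates how accurate and how correlated the labeling functions are.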