Title
Combating noisy labels in object detection datasets
Authors
Abstract
The quality of training datasets for deep neural networks is a key factor contributing to the accuracy of resulting models. This effect is amplified in difficult tasks such as object detection. Dealing with errors in datasets is often limited to accepting that some fraction of examples are incorrect, estimating their confidence, and either assigning appropriate weights or ignoring uncertain ones during training. In this work, we propose a different approach. We introduce the Confident Learning for Object Detection (CLOD) algorithm for assessing the quality of each label in object detection datasets, identifying missing, spurious, mislabeled, and mislocated bounding boxes and suggesting corrections. By focusing on finding incorrect examples in the training datasets, we can eliminate them at the root. Suspicious bounding boxes can be reviewed to improve the quality of the dataset, leading to better models without further complicating their already complex architectures. The proposed method is able to point out nearly 80% of artificially disturbed bounding boxes with a false positive rate below 0.1. Cleaning the datasets by applying the most confident automatic suggestions improved mAP scores by 16% to 46%, depending on the dataset, without any modifications to the network architectures. This approach shows promising potential in rectifying state-of-the-art object detection datasets.
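The abstract does not spell out how CLOD scores each label, so the following is only a hypothetical sketch of the general idea it describes: match a trained detector's confident predictions to the dataset's annotations by IoU, then flag each annotation as likely correct, mislabeled, mislocated, or spurious, and treat confident unmatched predictions as candidate missing boxes. The function names and all thresholds (`iou_ok`, `iou_low`, `score_min`) are illustrative assumptions, not the paper's parameters.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def audit_annotations(annotations, predictions,
                      iou_ok=0.5, iou_low=0.1, score_min=0.5):
    """Hypothetical label audit (not the paper's exact algorithm).

    annotations: list of (box, label) from the dataset
    predictions: list of (box, label, score) from a trained detector
    Returns (per-annotation verdicts, indices of confident predictions
    with no matching annotation, i.e. candidate missing boxes).
    """
    verdicts = []
    matched_preds = set()
    for box, label in annotations:
        # Find the best-overlapping confident prediction for this annotation.
        best_i, best_iou = None, 0.0
        for i, (pbox, plabel, score) in enumerate(predictions):
            if score < score_min:
                continue
            ov = iou(box, pbox)
            if ov > best_iou:
                best_i, best_iou = i, ov
        if best_i is None or best_iou < iou_low:
            verdicts.append("spurious")        # nothing confident overlaps it
        else:
            matched_preds.add(best_i)
            if predictions[best_i][1] != label:
                verdicts.append("mislabeled")  # class disagreement
            elif best_iou < iou_ok:
                verdicts.append("mislocated")  # weak localization
            else:
                verdicts.append("ok")
    # Confident predictions not matched to any annotation: possibly missing labels.
    missing = [i for i, (_, _, s) in enumerate(predictions)
               if s >= score_min and i not in matched_preds]
    return verdicts, missing
```

For example, an annotation whose best confident match carries a different class is reported as `"mislabeled"`, while a confident detection with no annotation nearby surfaces as a candidate missing box; a reviewer (or an automatic correction rule) can then act on these verdicts, which mirrors the review-and-correct workflow the abstract describes.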