Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking

Paper title

Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking

Authors

Eduardo Fonseca, Shawn Hershey, Manoj Plakal, Daniel P. W. Ellis, Aren Jansen, R. Channing Moore, Xavier Serra

Abstract

The study of label noise in sound event recognition has recently gained attention with the advent of larger and noisier datasets. This work addresses the problem of missing labels, one of the big weaknesses of large audio datasets, and one of the most conspicuous issues for AudioSet. We propose a simple and model-agnostic method based on a teacher-student framework with loss masking to first identify the most critical missing label candidates, and then ignore their contribution during the learning process. We find that a simple optimisation of the training label set improves recognition performance without additional computation. We discover that most of the improvement comes from ignoring a critical tiny portion of the missing labels. We also show that the damage done by missing labels is larger as the training set gets smaller, yet it can still be observed even when training with massive amounts of audio. We believe these insights can generalize to other large-scale datasets.
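
The abstract describes two steps: a teacher model flags the most likely missing labels, and the student's loss then ignores those entries. The following is a minimal NumPy sketch of that idea, not the authors' implementation. The function names `build_loss_mask` and `masked_bce`, the `discard_fraction` parameter, and the per-class selection rule (mask the negatives that the teacher scores highest) are all assumptions made for illustration.

```python
import numpy as np


def build_loss_mask(teacher_scores, labels, discard_fraction=0.01):
    """Return a 0/1 mask that zeroes out suspected missing labels.

    Assumption: for each class, the negative entries (label 0) that the
    teacher scores highest are treated as missing-label candidates.
    """
    mask = np.ones_like(labels, dtype=np.float64)
    for c in range(labels.shape[1]):
        neg_idx = np.flatnonzero(labels[:, c] == 0)
        k = int(np.floor(discard_fraction * neg_idx.size))
        if k == 0:
            continue
        # Indices of the k negatives with the highest teacher scores.
        order = np.argsort(teacher_scores[neg_idx, c])
        mask[neg_idx[order[-k:]], c] = 0.0
    return mask


def masked_bce(student_probs, labels, mask, eps=1e-7):
    """Binary cross-entropy averaged over the unmasked entries only."""
    p = np.clip(student_probs, eps, 1.0 - eps)
    bce = -(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))
    return float((bce * mask).sum() / max(mask.sum(), 1.0))


# Toy data: 4 clips, 1 class. Clip 1 is labelled negative, but the
# teacher is confident the sound is present (a likely missing label).
labels = np.array([[1.0], [0.0], [0.0], [0.0]])
teacher_scores = np.array([[0.9], [0.8], [0.1], [0.2]])
student_probs = np.array([[0.9], [0.8], [0.1], [0.1]])

mask = build_loss_mask(teacher_scores, labels, discard_fraction=0.5)
loss_masked = masked_bce(student_probs, labels, mask)
loss_plain = masked_bce(student_probs, labels, np.ones_like(labels))
```

In this toy case the masked loss no longer penalises the student for predicting the sound in clip 1, whereas the unmasked loss does. In practice the mask would be computed once from the teacher's predictions and reused throughout student training, so the training loop itself incurs no extra cost.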
