Paper Title
Improving Perceptual Quality of Drum Transcription with the Expanded Groove MIDI Dataset
Paper Authors
Paper Abstract
We introduce the Expanded Groove MIDI dataset (E-GMD), an automatic drum transcription (ADT) dataset that contains 444 hours of audio from 43 drum kits, making it an order of magnitude larger than similar datasets, and the first with human-performed velocity annotations. We use E-GMD to optimize classifiers for use in downstream generation by predicting expressive dynamics (velocity) and show with listening tests that they produce outputs with improved perceptual quality, despite similar results on classification metrics. Via the listening tests, we argue that standard classifier metrics, such as accuracy and F-measure score, are insufficient proxies of performance in downstream tasks because they do not fully align with the perceptual quality of generated outputs.
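As a hedged illustration of why a plain onset-level F-measure can miss the expressive dynamics discussed above, the minimal Python sketch below scores a drum transcription using only onset time and drum pitch. It is not the paper's evaluation code and not mir_eval's matching. The note representation `(onset_seconds, pitch, velocity)`, the function names `onset_f_measure` and `velocity_mae`, the greedy one-to-one matching (a simplification of optimal bipartite matching), and the ±50 ms tolerance are all illustrative assumptions. Two estimates that differ only in velocity receive the same score.

```python
from typing import List, Tuple

# Hypothetical note representation: (onset time in seconds, drum pitch, MIDI velocity).
Note = Tuple[float, int, int]


def onset_f_measure(ref: List[Note], est: List[Note], tol: float = 0.05) -> float:
    """Onset-only F-measure with simplified greedy one-to-one matching.

    An estimated note matches an unmatched reference note of the same pitch
    whose onset lies within +/- tol seconds. Velocity is ignored entirely.
    """
    if not ref or not est:
        return 0.0
    matched_ref = set()
    true_positives = 0
    for e_onset, e_pitch, _ in est:
        for i, (r_onset, r_pitch, _) in enumerate(ref):
            if i not in matched_ref and r_pitch == e_pitch and abs(r_onset - e_onset) <= tol:
                matched_ref.add(i)
                true_positives += 1
                break
    precision = true_positives / len(est)
    recall = true_positives / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def velocity_mae(ref: List[Note], est: List[Note]) -> float:
    """Mean absolute velocity error, assuming ref and est are index-aligned (hypothetical)."""
    return sum(abs(r[2] - e[2]) for r, e in zip(ref, est)) / len(ref)


# Reference performance: kick (36), soft snare (38), closed hi-hat (42).
ref = [(0.0, 36, 100), (0.5, 38, 40), (1.0, 42, 70)]
# Estimate with correct timing but a constant velocity for every hit.
est_flat = [(0.01, 36, 80), (0.49, 38, 80), (1.02, 42, 80)]
# Estimate with the same timing and the reference dynamics.
est_dyn = [(0.01, 36, 100), (0.49, 38, 40), (1.02, 42, 70)]

print(onset_f_measure(ref, est_flat))  # 1.0
print(onset_f_measure(ref, est_dyn))   # 1.0
print(velocity_mae(ref, est_flat), velocity_mae(ref, est_dyn))
```

Both estimates score a perfect onset F-measure, yet only the second reproduces the soft snare and accented kick. A velocity-aware metric or a listening test is needed to tell them apart, which is the kind of gap the abstract's argument concerns.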