Paper Title
Few-Shot Drum Transcription in Polyphonic Music
Paper Authors
Paper Abstract
Data-driven approaches to automatic drum transcription (ADT) are often limited to a predefined, small vocabulary of percussion instrument classes. Such models cannot recognize out-of-vocabulary classes nor are they able to adapt to finer-grained vocabularies. In this work, we address open vocabulary ADT by introducing few-shot learning to the task. We train a Prototypical Network on a synthetic dataset and evaluate the model on multiple real-world ADT datasets with polyphonic accompaniment. We show that, given just a handful of selected examples at inference time, we can match and in some cases outperform a state-of-the-art supervised ADT approach under a fixed vocabulary setting. At the same time, we show that our model can successfully generalize to finer-grained or extended vocabularies unseen during training, a scenario where supervised approaches cannot operate at all. We provide a detailed analysis of our experimental results, including a breakdown of performance by sound class and by polyphony.
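The few-shot idea in the abstract can be illustrated with a minimal sketch of prototype-based classification in general: each class is represented by the mean of a handful of example embeddings, and a query is assigned to the nearest prototype. This is not the paper's implementation. The abstract does not specify the embedding network, the distance function, or the decision rule, so the function names (`prototypes`, `classify`), the Euclidean distance, and the toy 2-D vectors below are all assumptions made for illustration.

```python
from math import dist


def prototypes(support):
    """Average the embeddings of each class's support examples.

    `support` maps a class label to a list of equal-length embedding vectors.
    """
    protos = {}
    for label, vectors in support.items():
        n = len(vectors)
        protos[label] = [sum(column) / n for column in zip(*vectors)]
    return protos


def classify(query, protos):
    """Return the label whose prototype is closest to `query` (Euclidean distance)."""
    return min(protos, key=lambda label: dist(query, protos[label]))


# Toy 2-D "embeddings" (hypothetical values). A real system would obtain
# embeddings from a trained network applied to audio frames.
support = {
    "kick": [[0.0, 0.1], [0.2, -0.1]],
    "snare": [[1.0, 1.1], [0.8, 0.9]],
}
protos = prototypes(support)
label = classify([0.9, 1.0], protos)
```

Because the classes are defined only by the examples supplied at inference time, adding a new percussion class in this sketch amounts to adding another entry to `support`, with no retraining of the embedding model.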