Paper Title

Evaluation of CNN-based Automatic Music Tagging Models

Authors

Minz Won, Andres Ferraro, Dmitry Bogdanov, Xavier Serra

Abstract


Recent advances in deep learning have accelerated the development of content-based automatic music tagging systems. Music information retrieval (MIR) researchers have proposed various architecture designs, mainly based on convolutional neural networks (CNNs), that achieve state-of-the-art results in this multi-label binary classification task. However, due to differences in the experimental setups used by researchers, such as different dataset splits and software versions for evaluation, it is difficult to compare the proposed architectures directly with each other. To facilitate further research, in this paper we conduct a consistent evaluation of different music tagging models on three datasets (MagnaTagATune, Million Song Dataset, and MTG-Jamendo) and provide reference results using common evaluation metrics (ROC-AUC and PR-AUC). Furthermore, all the models are evaluated with perturbed inputs to investigate their generalization capabilities with respect to time stretching, pitch shifting, dynamic range compression, and the addition of white noise. For reproducibility, we provide PyTorch implementations along with the pre-trained models.
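The evaluation setup described in the abstract can be sketched as follows: macro-averaged ROC-AUC and PR-AUC over tags, plus one example perturbation (additive white noise at a target SNR). This is a minimal illustration, not the paper's actual code; the tag matrices, the `add_white_noise` helper, and the SNR value are assumptions made for the example.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# --- Metrics: macro-average across tags (hypothetical 3 clips x 4 tags) ---
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 1, 0],
                   [1, 1, 0, 1]])
y_score = np.array([[0.9, 0.2, 0.8, 0.1],
                    [0.3, 0.7, 0.6, 0.2],
                    [0.8, 0.9, 0.3, 0.7]])

roc_auc = roc_auc_score(y_true, y_score, average="macro")
# average_precision_score is a common stand-in for PR-AUC
pr_auc = average_precision_score(y_true, y_score, average="macro")

# --- Perturbation: Gaussian white noise at a chosen SNR (illustrative) ---
def add_white_noise(x, snr_db, rng=None):
    """Add white noise so the signal-to-noise ratio is roughly snr_db."""
    rng = rng if rng is not None else np.random.default_rng()
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

t = np.linspace(0, 1, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 440 * t)   # 1 s of a 440 Hz tone at 16 kHz
noisy = add_white_noise(clean, snr_db=20, rng=np.random.default_rng(0))
```

In practice, a model is scored on the clean clips to get reference ROC-AUC/PR-AUC, then rescored on perturbed versions (noise, time stretch, pitch shift, compression) to measure how much each metric degrades.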
