Barwise Compression Schemes for Audio-Based Music Structure Analysis

Paper Title

Barwise Compression Schemes for Audio-Based Music Structure Analysis

Authors

Axel Marmoret, Jérémy E. Cohen, Frédéric Bimbot

Abstract

Music Structure Analysis (MSA) consists of segmenting a music piece into several distinct sections. We approach MSA within a compression framework, under the hypothesis that the structure is more easily revealed by a simplified representation of the original content of the song. More specifically, under the hypothesis that MSA is correlated with similarities occurring at the bar scale, this article introduces the use of linear and non-linear compression schemes on barwise audio signals. Compressed representations capture the most salient components of the different bars in the song and are then used to infer the song structure with a dynamic programming algorithm. This work explores both low-rank approximation models, such as Principal Component Analysis and Nonnegative Matrix Factorization, and "piece-specific" Auto-Encoding Neural Networks, with the objective of learning latent representations specific to a given song. Such approaches rely on neither supervision nor annotations, which are well known to be tedious to collect and possibly ambiguous in MSA description. In our experiments, several unsupervised compression schemes achieve a level of performance comparable to that of state-of-the-art supervised methods (for a 3 s tolerance) on the RWC-Pop dataset, showcasing the importance of barwise compression processing for MSA.
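The abstract outlines a three-step pipeline: build a barwise representation, compress it, then segment with dynamic programming. The NumPy sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes nonnegative frame-level features (for example a chromagram) and precomputed bar boundaries in frames. It uses plain multiplicative-update NMF as the compression step (PCA via SVD or a small autoencoder could be swapped in), cosine self-similarity between compressed bars, and a simplified segment score (within-segment similarity sum divided by segment length, minus a per-segment penalty) that is not necessarily the kernel used in the paper. All function and parameter names (`barwise_matrix`, `subdivisions`, `penalty`, `max_len`) are hypothetical.

```python
import numpy as np


def barwise_matrix(features, bar_frames, subdivisions=16):
    """Stack bars as rows: each bar is resampled to `subdivisions` frames and flattened.

    features: (n_frames, n_features) nonnegative array (assumed, e.g. a chromagram).
    bar_frames: increasing frame indices of bar boundaries (assumed precomputed).
    """
    rows = []
    for start, end in zip(bar_frames[:-1], bar_frames[1:]):
        idx = np.linspace(start, end, subdivisions, endpoint=False).astype(int)
        rows.append(features[idx].ravel())
    return np.vstack(rows)  # shape (n_bars, subdivisions * n_features)


def nmf(X, rank, n_iter=500, seed=0, eps=1e-10):
    """Frobenius-norm NMF with Lee-Seung multiplicative updates: X ~ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H  # row i of W is the compressed representation of bar i


def cosine_autosimilarity(W, eps=1e-10):
    """Bar-to-bar cosine similarity of the compressed representations."""
    Wn = W / np.maximum(np.linalg.norm(W, axis=1, keepdims=True), eps)
    return Wn @ Wn.T


def dp_segment(S, penalty=1.0, max_len=32):
    """Split bars into segments maximizing sum(block similarity / length - penalty)."""
    n = S.shape[0]
    C = np.zeros((n + 1, n + 1))
    C[1:, 1:] = S.cumsum(axis=0).cumsum(axis=1)  # C[a, b] = S[:a, :b].sum()

    def score(i, j):  # score of the segment made of bars i..j-1
        block = C[j, j] - C[i, j] - C[j, i] + C[i, i]
        return block / (j - i) - penalty

    best = np.full(n + 1, -np.inf)
    best[0] = 0.0
    prev = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            cand = best[i] + score(i, j)
            if cand > best[j]:
                best[j], prev[j] = cand, i
    bounds = [n]
    while bounds[-1] > 0:  # backtrack from the last bar
        bounds.append(int(prev[bounds[-1]]))
    return bounds[::-1]  # segment boundaries, in bars


def segment_song(features, bar_frames, rank=4, subdivisions=16, penalty=1.0):
    """Hypothetical end-to-end sketch: barwise matrix -> NMF -> similarity -> DP."""
    X = barwise_matrix(features, bar_frames, subdivisions)
    W, _ = nmf(X, rank)
    return dp_segment(cosine_autosimilarity(W), penalty=penalty)
```

In this sketch, raising `penalty` favors fewer, longer segments, while `max_len` only bounds the search so the double loop stays cheap.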