Paper Title
Disentangled Multidimensional Metric Learning for Music Similarity
Paper Authors
Paper Abstract
Music similarity search is useful for a variety of creative tasks, such as replacing one music recording with another that has a similar "feel", a common task in video editing. For this task, it is typically necessary to define a similarity metric to compare one recording to another. Music similarity, however, is hard to define and depends on multiple simultaneous notions of similarity (e.g., genre, mood, instrument, tempo). While prior work ignores this issue, we embrace it and introduce the concept of multidimensional similarity, unifying both global and specialized similarity metrics into a single, semantically disentangled multidimensional similarity metric. To do so, we adapt a variant of deep metric learning called conditional similarity networks to the audio domain and extend it using track-based information to control the specificity of our model. We evaluate our method and show that our single multidimensional model outperforms both specialized similarity spaces and alternative baselines. We also run a user study and show that our approach is favored by human annotators as well.
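The core mechanism of conditional similarity networks, as referenced in the abstract, can be illustrated with a small sketch: a learned nonnegative mask per similarity condition (genre, mood, etc.) selects a subspace of a shared embedding, and the triplet margin loss is computed with masked distances. The function and variable names below are illustrative, not the paper's actual implementation; assume a margin of 0.2 and toy 4-dimensional embeddings.

```python
import numpy as np

def masked_triplet_loss(anchor, positive, negative, mask, margin=0.2):
    """Triplet margin loss in a condition-specific subspace.

    `mask` is a nonnegative per-dimension weight vector; in a conditional
    similarity network one such mask is learned per similarity condition,
    so all conditions share one embedding but use different subspaces.
    """
    d_ap = np.sum(mask * (anchor - positive) ** 2)  # anchor-positive distance
    d_an = np.sum(mask * (anchor - negative) ** 2)  # anchor-negative distance
    return max(0.0, d_ap - d_an + margin)           # hinge on the margin

# Toy 4-d embeddings; the hypothetical "genre" condition uses only dims 0-1.
rng = np.random.default_rng(0)
anchor = rng.normal(size=4)
positive = anchor + 0.01 * rng.normal(size=4)  # near the anchor
negative = anchor + 1.0 * rng.normal(size=4)   # far from the anchor
genre_mask = np.array([1.0, 1.0, 0.0, 0.0])

loss = masked_triplet_loss(anchor, positive, negative, genre_mask)
```

Because the mask zeroes out dimensions 2 and 3, only the first two dimensions influence this condition's loss; other conditions would train different subspaces of the same embedding.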