Paper Title
ADFF: Attention Based Deep Feature Fusion Approach for Music Emotion Recognition
Paper Authors
Paper Abstract
Music emotion recognition (MER), a sub-task of music information retrieval (MIR), has developed rapidly in recent years. However, learning affect-salient features remains a challenge. In this paper, we propose an end-to-end attention-based deep feature fusion (ADFF) approach for MER. Taking only the log Mel-spectrogram as input, the method uses an adapted VGGNet as the spatial feature learning module (SFLM) to obtain spatial features at different levels. These features are then fed into a squeeze-and-excitation (SE) attention-based temporal feature learning module (TFLM) to produce multi-level emotion-related spatial-temporal features (ESTFs), which discriminate emotions well in the final emotion space. In addition, a novel data-processing scheme is devised that splits the single-channel input into multiple channels, improving computational efficiency while preserving the quality of MER. Experiments show that, compared with the state-of-the-art model, the proposed method achieves relative improvements of 10.43% and 4.82% in R2 score for valence and arousal, respectively, and performs better on datasets of different scales and in multi-task learning.
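To make the described pipeline concrete, below is a minimal PyTorch sketch of an ADFF-style model: a small VGG-like convolutional stack standing in for the SFLM, a squeeze-and-excitation block standing in for the SE attention in the TFLM, and a two-output regression head for valence and arousal. All module names, layer sizes, and the reduction ratio are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of an ADFF-style model; layer sizes and module names are assumptions.
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-excitation attention that re-weights feature channels."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (batch, channels, freq, time)
        w = x.mean(dim=(2, 3))                  # squeeze: global average pooling
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                            # excitation: channel re-weighting


class ADFFSketch(nn.Module):
    """Spatial features from a VGG-like CNN, SE attention, then a
    regression head predicting (valence, arousal)."""
    def __init__(self):
        super().__init__()
        # SFLM stand-in: stacked conv blocks in place of the adapted VGGNet
        self.sflm = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # TFLM stand-in: SE attention followed by a fully connected head
        self.se = SEBlock(128)
        self.head = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(128), nn.ReLU(),
            nn.Linear(128, 2),                  # outputs: (valence, arousal)
        )

    def forward(self, logmel):                  # logmel: (batch, 1, n_mels, frames)
        feats = self.sflm(logmel)
        feats = self.se(feats)
        return self.head(feats)


if __name__ == "__main__":
    model = ADFFSketch()
    x = torch.randn(4, 1, 128, 256)             # a batch of log Mel-spectrograms
    print(model(x).shape)                        # torch.Size([4, 2])
```

The multi-channel data processing mentioned in the abstract could be emulated at the input stage, e.g. by slicing one long spectrogram along the time axis into several shorter segments processed in parallel; the exact slicing strategy is not specified here and would follow the paper's description.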