Paper Title
ADFF: Attention Based Deep Feature Fusion Approach for Music Emotion Recognition
Paper Authors
Paper Abstract
Music emotion recognition (MER), a sub-task of music information retrieval (MIR), has developed rapidly in recent years. However, learning affect-salient features remains a challenge. In this paper, we propose an end-to-end attention-based deep feature fusion (ADFF) approach for MER. Taking only the log Mel-spectrogram as input, the method uses an adapted VGGNet as the spatial feature learning module (SFLM) to obtain spatial features at different levels. These features are then fed into a squeeze-and-excitation (SE) attention-based temporal feature learning module (TFLM) to produce multi-level emotion-related spatial-temporal features (ESTFs), which discriminate emotions well in the final emotion space. In addition, a novel data-processing scheme is devised that splits the single-channel input into multiple channels, improving computational efficiency while preserving the quality of MER. Experiments show that, compared with the state-of-the-art model, the proposed method achieves relative improvements of 10.43% and 4.82% in R2 score for valence and arousal, respectively, and performs better on datasets of different scales and in multi-task learning.
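To make the described pipeline concrete, below is a minimal PyTorch sketch of an ADFF-style model: a small VGG-like convolutional stack standing in for the SFLM, a squeeze-and-excitation block standing in for the SE attention in the TFLM, and a two-output regression head for valence and arousal. All module names, layer sizes, and the reduction ratio are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of an ADFF-style model; layer sizes and module names are assumptions.
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-excitation attention that re-weights feature channels."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (batch, channels, freq, time)
        w = x.mean(dim=(2, 3))                  # squeeze: global average pooling
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                            # excitation: channel re-weighting


class ADFFSketch(nn.Module):
    """Spatial features from a VGG-like CNN, SE attention, then a
    regression head predicting (valence, arousal)."""
    def __init__(self):
        super().__init__()
        # SFLM stand-in: stacked conv blocks in place of the adapted VGGNet
        self.sflm = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # TFLM stand-in: SE attention followed by a fully connected head
        self.se = SEBlock(128)
        self.head = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(128), nn.ReLU(),
            nn.Linear(128, 2),                  # outputs: (valence, arousal)
        )

    def forward(self, logmel):                  # logmel: (batch, 1, n_mels, frames)
        feats = self.sflm(logmel)
        feats = self.se(feats)
        return self.head(feats)


if __name__ == "__main__":
    model = ADFFSketch()
    x = torch.randn(4, 1, 128, 256)             # a batch of log Mel-spectrograms
    print(model(x).shape)                        # torch.Size([4, 2])
```

The multi-channel data processing mentioned in the abstract could be emulated at the input stage, e.g. by slicing one long spectrogram along the time axis into several shorter segments processed in parallel; the exact slicing strategy is not specified here and would follow the paper's description.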