Paper title
A Siamese Neural Network with Modified Distance Loss for Transfer Learning in Speech Emotion Recognition
Authors
Abstract
Automatic emotion recognition plays a significant role in human-computer interaction and in the design of Internet of Things (IoT) technologies. Yet a common problem in emotion recognition systems is the scarcity of reliable labels. By modeling pairwise differences between samples of interest, a Siamese network can help mitigate this challenge, since it requires fewer samples than traditional deep learning methods. In this paper, we propose a distance loss that can be applied during Siamese network fine-tuning and that optimizes the model based on the relevant distance between same-class and different-class pairs. Our system uses samples from the source data to pre-train the weights of the proposed Siamese neural network, which are then fine-tuned on the target data. We present an emotion recognition task that uses speech, since it is one of the most ubiquitous and frequently used bio-behavioral signals. Our target data comes from the RAVDESS dataset, while CREMA-D and eNTERFACE'05 are each used in turn as source data. Our results indicate that the proposed distance loss greatly benefits the fine-tuning of the Siamese network. In addition, the choice of source data affects Siamese network performance more than the number of frozen layers does. These findings suggest the great potential of applying Siamese networks and modeling pairwise differences in transfer learning for automatic emotion recognition.
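The abstract does not state the form of the proposed distance loss, so the following is only a minimal sketch of the general idea of comparing same-class and different-class pair distances. It assumes a hinge-style formulation that is not taken from the paper. The names `euclidean`, `pair_distance_loss`, and the `margin` parameter are hypothetical and chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_distance_loss(pairs, margin=1.0):
    """Illustrative pairwise loss (an assumption, not the paper's formula).

    pairs: list of (embedding_a, embedding_b, same_class) tuples, where the
    embeddings stand in for the outputs of the two Siamese branches.
    The loss is zero once different-class pairs are, on average, at least
    `margin` farther apart than same-class pairs.
    """
    same = [euclidean(a, b) for a, b, is_same in pairs if is_same]
    diff = [euclidean(a, b) for a, b, is_same in pairs if not is_same]
    if not same or not diff:
        raise ValueError("need at least one same-class and one different-class pair")
    mean_same = sum(same) / len(same)
    mean_diff = sum(diff) / len(diff)
    return max(0.0, mean_same - mean_diff + margin)


# Hypothetical 2-D embeddings: one same-class pair, one different-class pair.
well_separated = [([0.0, 0.0], [0.0, 0.0], True), ([0.0, 0.0], [3.0, 4.0], False)]
badly_separated = [([0.0, 0.0], [3.0, 4.0], True), ([0.0, 0.0], [0.0, 0.0], False)]
print(pair_distance_loss(well_separated))   # no penalty
print(pair_distance_loss(badly_separated))  # positive penalty
```

In a transfer learning setup like the one described, a loss of this kind would be evaluated on target-data pairs during fine-tuning, after the weights have been pre-trained on the source data. The choice of margin and of mean aggregation here is arbitrary.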