Title
End-to-End Lyrics Recognition with Self-supervised Learning
Authors
Abstract
Lyrics recognition is an important task in music processing. Although traditional approaches such as the hybrid HMM-TDNN model achieve good performance, studies applying end-to-end models and self-supervised learning (SSL) remain limited. In this paper, we first establish an end-to-end baseline for lyrics recognition and then explore the performance of SSL models on this task. We evaluate a variety of upstream SSL models trained with different objectives (masked reconstruction, masked prediction, autoregressive reconstruction, and contrastive learning). Evaluated on the DAMP music dataset, our end-to-end self-supervised models outperform the previous state-of-the-art (SOTA) system by 5.23% on the dev set and 2.4% on the test set, even without a language model trained on a large corpus. Moreover, we investigate the effect of background music on the performance of self-supervised learning models and conclude that SSL models cannot extract features effectively in the presence of background music. Finally, since these models were not trained on music datasets, we study the out-of-domain generalization ability of the SSL features.
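To make the masked-reconstruction objective mentioned above concrete, here is a minimal, illustrative sketch of the idea: random frames of a feature sequence are hidden, and a predictor is scored by how well it reconstructs them. The function name and the trivial mean-of-visible-frames "predictor" are hypothetical, not the paper's actual training code or any specific SSL model.

```python
import random

def masked_reconstruction_loss(frames, mask_prob=0.15, seed=0):
    """Toy masked-reconstruction objective (illustrative only).

    `frames` is a list of feature vectors (lists of floats). Random
    frames are masked; a trivial predictor guesses each masked frame
    as the per-dimension mean of the visible frames, and the mean
    squared error over masked frames is returned. A real SSL model
    (e.g. one of the upstream models evaluated in the paper) would
    replace this predictor with a learned network.
    """
    rng = random.Random(seed)
    mask = [rng.random() < mask_prob for _ in frames]
    if not any(mask):
        mask[0] = True  # always mask at least one frame
    if all(mask):
        mask[-1] = False  # keep at least one frame visible

    visible = [f for f, m in zip(frames, mask) if not m]
    masked = [f for f, m in zip(frames, mask) if m]
    dim = len(frames[0])

    # Trivial "predictor": per-dimension mean of the visible frames.
    pred = [sum(f[d] for f in visible) / len(visible) for d in range(dim)]

    # Mean squared reconstruction error over the masked frames only.
    return sum((f[d] - pred[d]) ** 2
               for f in masked for d in range(dim)) / (len(masked) * dim)
```

With constant input the masked frames equal the visible mean, so the loss is zero; any variation across frames yields a positive loss, which is what a learned predictor would be trained to reduce.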