Paper Title

A Comparative Study of Self-supervised Speech Representation Based Voice Conversion

Authors

Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Tomoki Toda

Abstract

We present a large-scale comparative study of self-supervised speech representation (S3R)-based voice conversion (VC). In the context of recognition-synthesis VC, S3Rs are attractive owing to their potential to replace expensive supervised representations such as phonetic posteriorgrams (PPGs), which are commonly adopted by state-of-the-art VC systems. Using S3PRL-VC, an open-source VC toolkit we previously developed, we provide a series of in-depth objective and subjective analyses on the Voice Conversion Challenge 2020 (VCC2020) dataset under three VC settings: intra-lingual any-to-one (A2O), cross-lingual A2O, and any-to-any (A2A) VC. We investigate S3R-based VC in various aspects, including model type, multilinguality, and supervision. We also study the effect of a post-discretization process with k-means clustering and show that it yields improvements in the A2A setting. Finally, a comparison with state-of-the-art VC systems demonstrates the competitiveness of S3R-based VC and sheds light on possible directions for improvement.
