On Bottleneck Features for Text-Dependent Speaker Verification Using X-vectors

Paper title

On Bottleneck Features for Text-Dependent Speaker Verification Using X-vectors

Authors

Sarkar, Achintya Kumar, Tan, Zheng-Hua

Abstract

Applying x-vectors for speaker verification has recently attracted great interest, with the focus being on text-independent speaker verification. In this paper, we study x-vectors for text-dependent speaker verification (TD-SV), which remains unexplored. We further investigate the impact of the different bottleneck (BN) features on the performance of x-vectors, including the recently-introduced time-contrastive-learning (TCL) BN features and phone-discriminant BN features. TCL is a weakly supervised learning approach that constructs training data by uniformly partitioning each utterance into a predefined number of segments and then assigning each segment a class label depending on their position in the utterance. We also compare TD-SV performance for different modeling techniques, including the Gaussian mixture models-universal background model (GMM-UBM), i-vector, and x-vector. Experiments are conducted on the RedDots 2016 challenge database. It is found that the type of features has a marginal impact on the performance of x-vectors with the TCL BN feature achieving the lowest equal error rate, while the impact of features is significant for i-vector and GMM-UBM. The fusion of x-vector and i-vector systems gives a large gain in performance. The GMM-UBM technique shows its advantage for TD-SV using short utterances.
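
The TCL labelling step described in the abstract (uniformly partitioning an utterance into a predefined number of segments and labelling each segment by its position) can be sketched as follows. This is only an illustration under stated assumptions: the function name `tcl_segment_labels`, the frame-level granularity, and the example segment count are hypothetical and are not taken from the paper.

```python
from typing import List


def tcl_segment_labels(num_frames: int, num_segments: int) -> List[int]:
    """Return one class label per frame for a single utterance.

    The utterance is split into `num_segments` (near-)equal contiguous
    segments, and every frame receives the index of the segment it falls
    in, so the label depends only on position within the utterance.
    Both parameters are illustrative assumptions.
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    # Integer arithmetic maps frame t to a segment index in [0, num_segments - 1].
    return [t * num_segments // num_frames for t in range(num_frames)]


if __name__ == "__main__":
    # A 10-frame utterance split into 4 position-based classes.
    print(tcl_segment_labels(10, 4))
```

Because the labels come from position alone, no phonetic transcription is needed, which is consistent with the abstract's description of TCL as weakly supervised.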
