Paper Title
Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate
Paper Authors
Paper Abstract
Recent findings show that pre-trained wav2vec 2.0 models are reliable feature extractors for various speaker characteristics classification tasks. We show that latent representations extracted at different layers of a pre-trained wav2vec 2.0 system can be used as features for binary classification to distinguish between children with Cleft Lip and Palate (CLP) and a healthy control group. The results indicate that the distinction between CLP and healthy voices, especially with latent representations from the lower and middle encoder layers, reaches an accuracy of 100%. We test the classifier to find influencing factors for classification using unseen out-of-domain healthy and pathologic corpora with varying characteristics: age, spoken content, and acoustic conditions. Cross-pathology and cross-healthy tests reveal that the trained classifiers are unreliable if there is a mismatch between training and out-of-domain test data in, e.g., age, spoken content, or acoustic conditions.
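The layer-wise evaluation described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the abstract does not say how frame-level representations are pooled or which classifier is used, so mean pooling over time and a nearest-centroid classifier are stand-ins, and all function names are hypothetical. The per-layer hidden states are assumed to have been extracted beforehand, for example from a pre-trained wav2vec 2.0 model run with hidden-state output enabled.

```python
from statistics import fmean

# Assumption: each utterance is given as per_layer_frames, where
# per_layer_frames[k] is a (frames x dim) list of vectors from layer k.
# Pooling method and classifier are illustrative choices, not from the paper.

def mean_pool(vectors):
    """Average a list of equal-length vectors column by column."""
    return [fmean(column) for column in zip(*vectors)]

def fit_centroids(features, labels):
    """'Train' a nearest-centroid classifier: one mean vector per class."""
    by_label = {}
    for vec, lab in zip(features, labels):
        by_label.setdefault(lab, []).append(vec)
    return {lab: mean_pool(vecs) for lab, vecs in by_label.items()}

def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], vec))

def accuracy(centroids, features, labels):
    """Fraction of feature vectors assigned to their true label."""
    hits = sum(predict(centroids, v) == lab for v, lab in zip(features, labels))
    return hits / len(labels)

def layerwise_accuracy(train, test):
    """Fit on train and score on test separately for every layer.

    train, test: lists of (per_layer_frames, label) pairs.
    Returns one accuracy value per layer.
    """
    n_layers = len(train[0][0])
    train_labels = [lab for _, lab in train]
    test_labels = [lab for _, lab in test]
    scores = []
    for k in range(n_layers):
        train_feats = [mean_pool(utt[k]) for utt, _ in train]
        test_feats = [mean_pool(utt[k]) for utt, _ in test]
        model = fit_centroids(train_feats, train_labels)
        scores.append(accuracy(model, test_feats, test_labels))
    return scores
```

Calling `layerwise_accuracy` once with an in-domain test set and once with an out-of-domain corpus (different age group, spoken content, or recording conditions) would give the kind of per-layer comparison the abstract refers to.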