Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate

Paper title

Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate

Authors

Baumann, Ilja; Wagner, Dominik; Braun, Franziska; Bayerl, Sebastian P.; Nöth, Elmar; Riedhammer, Korbinian; Bocklet, Tobias

Abstract

Recent findings show that pre-trained wav2vec 2.0 models are reliable feature extractors for various speaker characteristics classification tasks. We show that latent representations extracted at different layers of a pre-trained wav2vec 2.0 system can be used as features for binary classification to distinguish between children with Cleft Lip and Palate (CLP) and a healthy control group. The results indicate that the distinction between CLP and healthy voices, especially with latent representations from the lower and middle encoder layers, reaches an accuracy of 100%. We test the classifier to find influencing factors for classification using unseen out-of-domain healthy and pathologic corpora with varying characteristics: age, spoken content, and acoustic conditions. Cross-pathology and cross-healthy tests reveal that the trained classifiers are unreliable if there is a mismatch between training and out-of-domain test data in, e.g., age, spoken content, or acoustic conditions.
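
The pipeline outlined in the abstract (frame-level representations from one encoder layer, pooled per utterance, fed to a binary classifier, then tested on mismatched data) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the data are synthetic Gaussian vectors rather than wav2vec 2.0 outputs, mean pooling and a nearest-centroid classifier stand in for whatever pooling and classifier the authors used, and the `shift` parameter is a hypothetical stand-in for a domain mismatch such as age or acoustic conditions.

```python
import random
from statistics import fmean


def mean_pool(frames):
    """Average frame-level vectors (T x D) into one utterance-level vector (D)."""
    return [fmean(column) for column in zip(*frames)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(features, labels):
    """Nearest-centroid classifier (an assumption): one mean vector per class."""
    return {
        lab: mean_pool([f for f, l in zip(features, labels) if l == lab])
        for lab in set(labels)
    }


def predict(model, feature):
    """Return the label whose centroid is closest to the feature vector."""
    return min(model, key=lambda lab: sq_dist(model[lab], feature))


def accuracy(model, features, labels):
    hits = sum(predict(model, f) == l for f, l in zip(features, labels))
    return hits / len(labels)


def fake_utterance(rng, label, shift=0.0, frames=20, dim=4):
    """Synthetic stand-in for one layer's frame-level output (not real speech)."""
    center = (1.0 if label == "clp" else -1.0) + shift
    return [[rng.gauss(center, 0.3) for _ in range(dim)] for _ in range(frames)]


def make_set(rng, n_per_class, shift=0.0):
    """Build pooled utterance features; `shift` simulates a domain mismatch."""
    labels = ["clp", "control"] * n_per_class
    feats = [mean_pool(fake_utterance(rng, lab, shift)) for lab in labels]
    return feats, labels


rng = random.Random(0)
train_x, train_y = make_set(rng, 20)
in_x, in_y = make_set(rng, 10)               # same distribution as training
out_x, out_y = make_set(rng, 10, shift=3.0)  # simulated out-of-domain data

model = fit(train_x, train_y)
in_acc = accuracy(model, in_x, in_y)
out_acc = accuracy(model, out_x, out_y)
print(f"in-domain accuracy: {in_acc:.2f}, out-of-domain accuracy: {out_acc:.2f}")
```

In this toy setup the classifier separates the matched test set cleanly, while the shifted set pushes every control utterance across the decision boundary. That is only a caricature of the mismatch effect the abstract reports, not a reproduction of its results.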
