Deep Learning-Based Automated Classification of Chinese Speech Sound Disorders

Paper Title

Deep Learning-Based Automated Classification of Chinese Speech Sound Disorders

Authors

Yao-Ming Kuo, Shanq-Jang Ruan, Yu-Chin Chen, Ya-Wen Tu

Abstract

This article describes a computer-based system for analyzing acoustic data to assist in the diagnosis and classification of children's speech sound disorders (SSDs). The analysis concentrates on identifying and categorizing four distinct types of Chinese SSDs. The study collected and generated a speech corpus containing 2540 samples of stopping, backing, final consonant deletion process (FCDP), and affrication from 90 children aged 3 to 6 years with normal or pathological articulatory features. Each recording was accompanied by a detailed diagnostic annotation by two speech-language pathologists (SLPs). The speech samples were classified using three well-established neural network models for image classification. The feature maps were created from three sets of Mel-frequency cepstral coefficient (MFCC) parameters extracted from the speech sounds and aggregated into a three-dimensional data structure as model input. Six data augmentation techniques were employed to enlarge the available dataset while avoiding overfitting. The experiments examine the usability of four different categories of Chinese phrases and characters. Experiments with different data subsets demonstrate the system's ability to accurately detect the analyzed pronunciation disorders. The best multi-class classification using a single Chinese phrase achieves an accuracy of 74.4%.
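The aggregation of three MFCC sets into a three-dimensional model input can be illustrated with a minimal sketch. The abstract does not say how the three sets differ or how they are aligned, so everything here is an assumption: the helper name `stack_feature_maps`, the zero-padding and cropping to a fixed frame count, the 40-coefficient size, and the random placeholder matrices that stand in for real MFCC output.

```python
import numpy as np


def stack_feature_maps(mfcc_sets, n_frames):
    """Stack 2-D MFCC matrices into one (n_coeffs, n_frames, n_sets) array.

    Hypothetical helper: each input matrix has shape (n_coeffs, frames_i).
    Frames are zero-padded or cropped to n_frames so that all matrices
    share one shape before they are stacked along a new last axis.
    """
    n_coeffs = mfcc_sets[0].shape[0]
    channels = []
    for m in mfcc_sets:
        if m.shape[0] != n_coeffs:
            raise ValueError("all MFCC sets must have the same number of coefficients")
        fixed = np.zeros((n_coeffs, n_frames), dtype=np.float32)
        keep = min(n_frames, m.shape[1])
        fixed[:, :keep] = m[:, :keep]  # crop long inputs, leave zeros as padding
        channels.append(fixed)
    return np.stack(channels, axis=-1)


# Placeholder random matrices standing in for three real MFCC sets (assumed sizes).
rng = np.random.default_rng(0)
sets = [rng.standard_normal((40, t)) for t in (90, 120, 150)]
feature_map = stack_feature_maps(sets, n_frames=128)
print(feature_map.shape)  # (40, 128, 3)
```

The resulting array has the same layout as a three-channel image, which is one plausible way to feed such features to image classification networks like those the abstract mentions.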

Paper Title