Paper Title
Quantification of Transducer Misalignment in Ultrasound Tongue Imaging
Authors
Abstract
In speech production research, different imaging modalities have been employed to obtain accurate information about the movement and shaping of the vocal tract. Ultrasound is an affordable and non-invasive imaging modality with relatively high temporal and spatial resolution for studying the dynamic behavior of the tongue during speech production. However, a long-standing problem in ultrasound tongue imaging is transducer misalignment during longer data recording sessions. In this paper, we propose a simple yet effective misalignment quantification approach. The analysis employs the Mean Square Error (MSE) distance and two similarity metrics, the Structural Similarity Index (SSIM) and Complex Wavelet SSIM, to identify the relative displacement between the chin and the transducer. We visualize these measures as a function of the timestamp of the utterances. Extensive experiments are conducted on a Hungarian and a Scottish English child dataset. The results suggest that large MSE values and small SSIM and Complex Wavelet SSIM values indicate corruption or other issues during data recording, which can be caused either by transducer misalignment or by a lack of gel.
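The idea of tracking MSE and SSIM over utterance timestamps can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: each utterance is summarized by one mean frame stored as a flat pixel sequence, every utterance is compared against a single reference frame, SSIM is computed once over the whole frame instead of being averaged over local windows, and the thresholds `mse_max` and `ssim_min` are hypothetical. Complex Wavelet SSIM is omitted. All function names are illustrative.

```python
from statistics import fmean


def mse(frame_a, frame_b):
    """Mean square error between two frames given as flat pixel sequences."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    return fmean((a - b) ** 2 for a, b in zip(frame_a, frame_b))


def global_ssim(frame_a, frame_b, data_range=255.0):
    """SSIM computed once over the whole frame.

    Simplification (assumption): the standard index averages this quantity
    over local windows; a single global window keeps the sketch short.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    c1 = (0.01 * data_range) ** 2  # commonly used stabilising constants
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = fmean(frame_a), fmean(frame_b)
    var_a = fmean((a - mu_a) ** 2 for a in frame_a)
    var_b = fmean((b - mu_b) ** 2 for b in frame_b)
    cov = fmean((a - mu_a) * (b - mu_b) for a, b in zip(frame_a, frame_b))
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )


def flag_utterances(reference, utterances, mse_max, ssim_min):
    """Return (timestamp, mse, ssim, suspicious) per utterance, in time order.

    `utterances` is a list of (timestamp, mean_frame) pairs. The thresholds
    are hypothetical and would need tuning for each recording setup.
    """
    report = []
    for timestamp, frame in sorted(utterances, key=lambda item: item[0]):
        error = mse(reference, frame)
        similarity = global_ssim(reference, frame)
        suspicious = error > mse_max or similarity < ssim_min
        report.append((timestamp, error, similarity, suspicious))
    return report


# Toy 6-pixel "frames": the second utterance is shifted by one pixel,
# standing in for a displaced transducer.
reference = [0, 50, 100, 150, 200, 250]
session = [
    (0.0, [0, 50, 100, 150, 200, 250]),
    (60.0, [250, 0, 50, 100, 150, 200]),
]
for row in flag_utterances(reference, session, mse_max=1000.0, ssim_min=0.9):
    print(row)
```

Plotting the second and third fields of each row against the timestamp gives the kind of per-session curve the abstract describes, where a sustained rise in MSE together with a drop in SSIM marks recordings worth inspecting.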