Paper Title
Individualizing Head-Related Transfer Functions for Binaural Acoustic Applications
Authors
Abstract
A Head-Related Transfer Function (HRTF) characterizes how a human ear receives sound from a point in space, and depends on the shapes of one's head, pinna, and torso. Accurate estimation of HRTFs for human subjects is crucial for enabling binaural acoustic applications such as sound localization and 3D sound spatialization. Unfortunately, conventional approaches to HRTF estimation rely on specialized devices or lengthy measurement processes. This work proposes a novel lightweight method for HRTF individualization that can be implemented with commercial off-the-shelf (COTS) components and performed by average users in home settings. The proposed method has two key components: a generative neural network model that can be individualized to predict the HRTFs of new subjects from sparse measurements, and a lightweight measurement procedure that collects HRTF data from spatial locations. Extensive experiments using a public dataset and in-house measurement data from 10 subjects of different ages and genders show that the individualized models significantly outperform a baseline model in the accuracy of the predicted HRTFs. To further demonstrate the advantages of individualized HRTFs, we implement two prototype applications for binaural localization and acoustic spatialization. We find that the performance of a localization model improves by 15 degrees after it is trained with individualized HRTFs. Furthermore, in hearing tests, the success rate of correctly identifying the azimuth direction of incoming sounds increases by 183% after individualization.
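To make the role of an HRTF in acoustic spatialization concrete, the sketch below shows the standard textbook rendering step: a mono signal is convolved with the left-ear and right-ear head-related impulse responses (HRIRs, the time-domain counterparts of the HRTF pair) measured for one direction, producing a two-channel binaural signal. This is a minimal illustration under stated assumptions, not the authors' prototype: the function names `convolve` and `spatialize` are hypothetical, the HRIR values are made-up toy numbers, and a real renderer would use measured or model-predicted HRIRs and an FFT-based convolution.

```python
from typing import List, Sequence, Tuple


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Full discrete linear convolution: y[n] = sum_k x[k] * h[n - k]."""
    if not x or not h:
        return []
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def spatialize(
    mono: Sequence[float],
    hrir_left: Sequence[float],
    hrir_right: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Render a mono signal at the direction for which the HRIR pair applies.

    Hypothetical helper: each ear's signal is the source filtered by that
    ear's impulse response for the chosen direction.
    """
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


if __name__ == "__main__":
    # Toy HRIRs (made-up values) for a source on the listener's left:
    # the left ear hears it earlier and louder than the right ear.
    hrir_l = [1.0, 0.3]
    hrir_r = [0.0, 0.0, 0.5, 0.15]
    left, right = spatialize([1.0, 0.0, -1.0], hrir_l, hrir_r)
    print(left)
    print(right)
```

The interaural delay and level difference encoded in the two toy HRIRs are what let a listener perceive direction, which is why HRTFs that do not match an individual's head, pinna, and torso degrade localization.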