Paper title
Fast frequency discrimination and phoneme recognition using a biomimetic membrane coupled to a neural network
Authors
Abstract
In the human ear, the basilar membrane plays a central role in sound recognition. When excited by sound, this membrane responds with a frequency-dependent displacement pattern that is detected and identified by the auditory hair cells combined with the human neural system. Inspired by this structure, we designed and fabricated an artificial membrane that produces a spatial displacement pattern in response to an audible signal, which we used to train a convolutional neural network (CNN). When trained with single-frequency tones, this system can unambiguously distinguish tones closely spaced in frequency. When instead trained to recognize spoken vowels, this system outperforms existing methods for phoneme recognition, including the discrete Fourier transform (DFT), zoom FFT, and chirp z-transform, especially when tested in short time windows. This sound recognition scheme therefore promises significant benefits in fast and accurate sound identification compared to existing methods.
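The emphasis on short time windows can be illustrated with a well-known property of the DFT: for a window of duration T, adjacent frequency bins are spaced 1/T apart, so two tones closer together than that tend to peak in the same bin. The following is a minimal sketch of that limitation only, not the authors' method or data. The sample rate, window length, tone frequencies, and the helper names `dft_bin_spacing` and `peak_frequency` are all assumptions chosen for illustration.

```python
import numpy as np


def dft_bin_spacing(sample_rate_hz: float, window_s: float) -> float:
    """Spacing in Hz between adjacent DFT bins for a window of this duration."""
    n_samples = int(round(sample_rate_hz * window_s))
    return sample_rate_hz / n_samples


def peak_frequency(signal: np.ndarray, sample_rate_hz: float) -> float:
    """Frequency in Hz of the largest-magnitude bin of the real-input DFT."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    return float(freqs[np.argmax(spectrum)])


# Illustrative values only (assumptions, not taken from the paper).
SAMPLE_RATE_HZ = 8000.0
WINDOW_S = 0.01        # a 10 ms analysis window
TONE_A_HZ = 1000.0
TONE_B_HZ = 1020.0     # 20 Hz away from tone A

n_samples = int(round(SAMPLE_RATE_HZ * WINDOW_S))
t = np.arange(n_samples) / SAMPLE_RATE_HZ
tone_a = np.sin(2 * np.pi * TONE_A_HZ * t)
tone_b = np.sin(2 * np.pi * TONE_B_HZ * t)

spacing = dft_bin_spacing(SAMPLE_RATE_HZ, WINDOW_S)
peak_a = peak_frequency(tone_a, SAMPLE_RATE_HZ)
peak_b = peak_frequency(tone_b, SAMPLE_RATE_HZ)

print(f"DFT bin spacing: {spacing:.1f} Hz")
print(f"Peak bin for tone A: {peak_a:.1f} Hz")
print(f"Peak bin for tone B: {peak_b:.1f} Hz")
```

With these assumed values the bin spacing is 100 Hz, so the 1000 Hz and 1020 Hz tones peak in the same bin and a plain peak-picking DFT cannot tell them apart. Lengthening the window narrows the spacing, at the cost of recognition speed.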