Paper Title
End-to-End Auditory Object Recognition via Inception Nucleus
Paper Authors
Paper Abstract
Machine learning approaches to auditory object recognition have traditionally relied on engineered features, such as those derived from the spectrum or cepstrum. More recently, end-to-end classification systems for image and auditory recognition have been developed that learn features jointly with the classifier, resulting in improved classification accuracy. In this paper, we propose a novel end-to-end deep neural network that maps raw waveform inputs to sound class labels. Our network includes an "inception nucleus" that optimizes the size of the convolutional filters on the fly, which dramatically reduces engineering effort. Classification results compared favorably against current state-of-the-art approaches, besting them by 10.4 percentage points on the UrbanSound8K dataset. Analyses of the learned representations revealed that filters in the earlier hidden layers learned wavelet-like transforms to extract features that were informative for classification.
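The abstract does not spell out how the inception nucleus is built. One plausible reading, borrowed from Inception-style image networks, is a layer that runs several 1-D convolutions of different widths over the same raw waveform in parallel and stacks their outputs, so that training can weight whichever filter sizes turn out to be useful. The sketch below illustrates only that forward pass in plain Python. The filter widths (4, 8, 16, 32), the number of filters per width, the random weights, and all function names are assumptions made for illustration, not values or identifiers taken from the paper.

```python
import math
import random


def conv1d_same(signal, kernel):
    """1-D cross-correlation, zero-padded so the output is as long as the input."""
    left = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - left
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def relu(xs):
    """Element-wise rectifier."""
    return [x if x > 0.0 else 0.0 for x in xs]


def inception_block(signal, branches):
    """Apply every branch's filters to the same input and stack the feature maps.

    `branches` maps a filter width to a list of kernels of that width
    (a hypothetical layout chosen for this sketch).
    """
    channels = []
    for width in sorted(branches):
        for kernel in branches[width]:
            assert len(kernel) == width
            channels.append(relu(conv1d_same(signal, kernel)))
    return channels


def random_branches(widths, filters_per_width, seed=0):
    """Untrained stand-in weights; a real network would learn these."""
    rng = random.Random(seed)
    return {
        w: [[rng.gauss(0.0, 1.0 / math.sqrt(w)) for _ in range(w)]
            for _ in range(filters_per_width)]
        for w in widths
    }


# Toy "raw waveform": a 440 Hz sine sampled at 8 kHz (assumed values).
sample_rate = 8000
waveform = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(256)]

branches = random_branches(widths=(4, 8, 16, 32), filters_per_width=2)
features = inception_block(waveform, branches)
print(len(features), len(features[0]))  # 8 feature maps, each 256 samples long
```

In a trained network the kernels would be learned by backpropagation, and later layers would pool and classify the stacked feature maps. The point of the parallel branches is that no single filter width has to be hand-picked in advance.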