Using Multi-Resolution Feature Maps with Convolutional Neural Networks for Anti-Spoofing in ASV

Paper Title

Using Multi-Resolution Feature Maps with Convolutional Neural Networks for Anti-Spoofing in ASV

Authors

Wang, Qiongqiong; Lee, Kong Aik; Koshinaka, Takafumi

Abstract

This paper presents a simple but effective method that uses multi-resolution feature maps with convolutional neural networks (CNNs) for anti-spoofing in automatic speaker verification (ASV). The central idea is to alleviate the problem that the feature maps commonly used in anti-spoofing networks are insufficient for building discriminative representations of audio segments, as they are often extracted by a single-length sliding window. The resulting trade-off between time and frequency resolution restricts the information available in a single spectrogram. The proposed method improves both frequency resolution and time resolution by stacking multiple spectrograms that are extracted using different window lengths. These are fed into a convolutional neural network in the form of multiple channels, making it possible to extract more information from input signals while only marginally increasing computational costs. The efficiency of the proposed method has been confirmed on the ASVspoof 2019 database. We show that the use of the proposed multi-resolution inputs consistently outperforms that of score fusion across different CNN architectures. Moreover, computational cost remains small.
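The stacking step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the window lengths (256, 512, 1024 samples), the hop of 160 samples, the FFT size of 2048, and the 16 kHz test tone are all assumptions chosen for illustration. The sketch also assumes that a shared hop and FFT size are used so that every spectrogram has the same shape and can be stacked along a channel axis.

```python
import numpy as np


def log_spectrogram(signal, win_length, hop_length=160, n_fft=2048):
    """Log-magnitude spectrogram using a Hann window of `win_length` samples.

    The hop and FFT size are shared across window lengths (an assumption of
    this sketch) so that all outputs have shape (n_fft // 2 + 1, n_frames).
    """
    if n_fft < win_length:
        raise ValueError("n_fft must be at least win_length")
    window = np.hanning(win_length)
    # The frame count depends only on the hop, not on the window length.
    n_frames = 1 + len(signal) // hop_length
    # Pad so that frame t is centred on sample t * hop_length.
    left = win_length // 2
    padded = np.pad(signal, (left, win_length - left))
    # Build a (n_frames, win_length) index grid and slice out all frames.
    starts = hop_length * np.arange(n_frames)[:, None]
    frames = padded[starts + np.arange(win_length)[None, :]] * window
    # Zero-padding each frame to n_fft gives the same number of frequency bins.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return np.log(magnitude + 1e-8).T  # (frequency, time)


def multi_resolution_input(signal, win_lengths=(256, 512, 1024)):
    """Stack one spectrogram per window length as CNN input channels."""
    channels = [log_spectrogram(signal, w) for w in win_lengths]
    return np.stack(channels, axis=0)  # (channel, frequency, time)


# Usage: one second of a 440 Hz tone at an assumed 16 kHz sampling rate.
t = np.arange(16000) / 16000.0
features = multi_resolution_input(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)  # (3, 1025, 101)
```

Short windows give the sharper time localisation and long windows the sharper frequency localisation, so each channel carries a different view of the same segment. Because only the number of input channels changes, the first convolutional layer of the network is the only one whose parameter count grows, which is consistent with the abstract's claim that the added computational cost is marginal.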
