Paper title
Transformer-based encoder-encoder architecture for Spoken Term Detection
Paper authors
Paper abstract
This paper presents a method for spoken term detection (STD) based on the Transformer architecture. We propose an encoder-encoder architecture employing two BERT-like encoders with additional modifications, including convolutional and upsampling layers, attention masking, and shared parameters. The encoders project a recognized hypothesis and a searched term into a shared embedding space, where the score of a putative hit is computed using a calibrated dot product. In the experiments, we used the Wav2Vec 2.0 speech recognizer, and the proposed system outperformed a baseline method based on deep LSTMs on the English and Czech STD datasets derived from the USC Shoah Foundation Visual History Archive (MALACH).
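The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes the two encoders have already produced embeddings: one vector per position of the recognized hypothesis and a single vector for the searched term. The abstract does not give the calibration formula, so the form used here, a sigmoid applied to a scaled and shifted dot product with hypothetical parameters `alpha` and `beta`, is an assumption, as are all function names and the hit threshold.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def calibrated_scores(
    hypothesis_embeddings: Sequence[Sequence[float]],
    term_embedding: Sequence[float],
    alpha: float = 1.0,
    beta: float = 0.0,
) -> List[float]:
    """Score every hypothesis position against the searched term.

    Assumed calibration (not taken from the paper):
        score_t = sigmoid(alpha * <h_t, q> + beta)
    where h_t is the hypothesis embedding at position t and q is the
    term embedding. alpha and beta are hypothetical calibration
    parameters that would be learned or tuned in a real system.
    """
    return [
        sigmoid(alpha * dot(h, term_embedding) + beta)
        for h in hypothesis_embeddings
    ]


def putative_hits(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return positions whose score reaches the (assumed) threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy two-dimensional embeddings standing in for encoder outputs.
hypothesis = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
term = [2.0, 0.0]
scores = calibrated_scores(hypothesis, term)
hits = putative_hits(scores, threshold=0.6)
```

In this toy example only the first position points in the same direction as the term embedding, so it is the only one reported as a putative hit. Because both inputs live in one shared space, the hypothesis embeddings can be computed once and reused for many searched terms.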