Paper Title

Bidirectional Representations for Low Resource Spoken Language Understanding

Paper Authors

Meeus, Quentin, Moens, Marie-Francine, Van hamme, Hugo

Paper Abstract

Most spoken language understanding systems use a pipeline approach composed of an automatic speech recognition interface and a natural language understanding module. This approach forces hard decisions when converting continuous inputs into discrete language symbols. Instead, we propose a representation model to encode speech in rich bidirectional encodings that can be used for downstream tasks such as intent prediction. The approach uses a masked language modelling objective to learn the representations, and thus benefits from both the left and right contexts. We show that the performance of the resulting encodings before fine-tuning is better than that of comparable models on multiple datasets, and that fine-tuning the top layers of the representation model improves the current state of the art on the Fluent Speech Commands dataset, also in a low-data regime, when a limited amount of labelled data is used for training. Furthermore, we propose class attention as a spoken language understanding module, efficient both in terms of speed and number of parameters. Class attention can be used to visually explain the predictions of our model, which goes a long way in understanding how the model makes predictions. We perform experiments in English and in Dutch.
