Paper Title
Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces
Paper Authors
Paper Abstract
We consider the problem of spoken language understanding (SLU): extracting natural language intents and associated slot arguments or named entities from speech that is primarily directed at voice assistants. Such a system subsumes both automatic speech recognition (ASR) and natural language understanding (NLU). An end-to-end joint SLU model can be built to a required specification, opening up the opportunity to deploy in hardware-constrained scenarios such as on-device, enabling voice assistants to work offline in a privacy-preserving manner while also reducing server costs. We first present models that extract utterance intent directly from speech without intermediate text output. We then present a compositional model, which generates the transcript using a Listen, Attend and Spell (LAS) ASR system and then extracts the interpretation using a neural NLU model. Finally, we contrast these methods with a jointly trained end-to-end SLU model, consisting of ASR and NLU subsystems connected by a neural-network-based interface rather than text, which produces transcripts as well as the NLU interpretation. We show that the jointly trained model improves ASR by incorporating semantic information from NLU, and improves NLU by exposing it to ASR confusion encoded in the hidden layer.
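To make the architectural contrast concrete, the sketch below shows the shape of such an all-neural interface: a shared encoder feeds an ASR head (which would emit transcripts) and an NLU head that consumes the encoder's hidden states directly rather than text. This is a minimal illustrative sketch with toy feed-forward layers and made-up dimensions, not the authors' actual LAS-based implementation; all layer sizes and names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative assumptions, not taken from the paper).
T, D_AUDIO, D_HID = 20, 40, 64   # frames, acoustic feature dim, hidden size
V_TOKENS, N_INTENTS = 100, 8     # ASR vocabulary size, NLU intent classes

def dense(x, w, b):
    """Affine layer followed by a tanh nonlinearity."""
    return np.tanh(x @ w + b)

# Shared acoustic encoder (stand-in for the LAS listener).
W_enc = rng.normal(0, 0.1, (D_AUDIO, D_HID)); b_enc = np.zeros(D_HID)

# ASR head: per-frame token logits (a real system would decode with attention).
W_asr = rng.normal(0, 0.1, (D_HID, V_TOKENS)); b_asr = np.zeros(V_TOKENS)

# NLU head: consumes the *hidden* encoder states, not a text transcript --
# this is the neural interface the abstract contrasts with a text interface.
W_nlu = rng.normal(0, 0.1, (D_HID, N_INTENTS)); b_nlu = np.zeros(N_INTENTS)

audio = rng.normal(0, 1, (T, D_AUDIO))               # one utterance
hidden = dense(audio, W_enc, b_enc)                  # (T, D_HID) neural interface
asr_logits = hidden @ W_asr + b_asr                  # (T, V_TOKENS) transcript path
intent_logits = hidden.mean(axis=0) @ W_nlu + b_nlu  # pooled states -> intent

print(hidden.shape, asr_logits.shape, intent_logits.shape)
```

Because the NLU head reads the continuous hidden states rather than a decoded 1-best transcript, ASR uncertainty remains visible to it, which is the mechanism the abstract credits for the joint model's NLU gains.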