Paper Title
Using multiple ASR hypotheses to boost i18n NLU performance
Paper Authors
Paper Abstract
Current voice assistants typically use the best hypothesis yielded by their Automatic Speech Recognition (ASR) module as input to their Natural Language Understanding (NLU) module, thereby losing helpful information that might be stored in lower-ranked ASR hypotheses. We explore the change in performance of NLU-associated tasks when utilizing five-best ASR hypotheses, compared to the status quo, for two language datasets, German and Portuguese. To harvest information from the ASR five-best, we leverage extractive summarization and joint extractive-abstractive summarization models for Domain Classification (DC) experiments, while using a sequence-to-sequence model with a pointer-generator network for Intent Classification (IC) and Named Entity Recognition (NER) multi-task experiments. For the DC full test set, we observe significant improvements of up to 7.2% and 15.5% in micro-averaged F1 scores, for German and Portuguese, respectively. In cases where the best ASR hypothesis was not an exact match to the transcribed utterance (mismatched test set), we see improvements of up to 6.7% and 8.8% in micro-averaged F1 scores, for German and Portuguese, respectively. For IC and NER multi-task experiments, when evaluating on the mismatched test set, we see improvements across all domains in German and in 17 out of 19 domains in Portuguese (improvements based on change in SeMER scores). Our results suggest that the use of multiple ASR hypotheses, as opposed to one, can lead to significant performance improvements in the DC task for these non-English datasets. In addition, it could lead to significant improvement in the performance of IC and NER tasks in cases where the ASR model makes mistakes.
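To make the core idea concrete, below is a minimal illustrative sketch of how the five-best ASR hypotheses could be joined into a single input for a downstream DC model, rather than using only the 1-best. The paper does not publish code; the separator token, function name, and example utterances here are assumptions for illustration only.

```python
# Hypothetical sketch: combining an ASR n-best list into one DC input.
# SEP_TOKEN and build_dc_input are illustrative assumptions, not the
# paper's actual implementation.

SEP_TOKEN = " [SEP] "  # assumed separator between ranked hypotheses


def build_dc_input(asr_nbest, n=5):
    """Concatenate the top-n ASR hypotheses (rank order preserved) into
    one string a Domain Classification model can consume, so that
    information in lower-ranked hypotheses is not discarded."""
    return SEP_TOKEN.join(asr_nbest[:n])


# Example: a (made-up) German utterance whose 1-best transcription is
# wrong but whose 2-best matches the human transcription.
nbest = [
    "spiele musik von den toten hose",    # 1-best (contains an ASR error)
    "spiele musik von den toten hosen",   # 2-best (matches transcription)
    "spiele musik von dem toten hosen",
    "spiel musik von den toten hosen",
    "spiele musik von den roten hosen",
]
print(build_dc_input(nbest))
```

In the mismatched-test-set scenario the abstract describes, the correct wording survives in a lower-ranked hypothesis, which is exactly the information a summarization-style encoder over this combined input can exploit.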