Paper Title
Incremental Learning for End-to-End Automatic Speech Recognition
Paper Authors
Paper Abstract
In this paper, we propose an incremental learning method for end-to-end Automatic Speech Recognition (ASR) which enables an ASR system to perform well on new tasks while maintaining the performance on its originally learned ones. To mitigate catastrophic forgetting during incremental learning, we design a novel explainability-based knowledge distillation for ASR models, which is combined with a response-based knowledge distillation to maintain the original model's predictions and the "reason" for the predictions. Our method works without access to the training data of original tasks, which addresses the cases where the previous data is no longer available or joint training is costly. Results on a multi-stage sequential training task show that our method outperforms existing ones in mitigating forgetting. Furthermore, in two practical scenarios, compared to the target-reference joint training method, the performance drop of our method is 0.02% Character Error Rate (CER), which is 97% smaller than the drops of the baseline methods.
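As a rough illustration of the training objective the abstract describes — not the paper's actual implementation — the sketch below combines a new-task loss with a response-based distillation term (KL divergence between softened teacher and student outputs) and a schematic explainability-based term. Gradient-times-input saliency matching is used here only as a stand-in, since the abstract does not specify the paper's attribution method; `student`, `teacher`, `attribution_map`, the frame-level cross-entropy task loss, and the loss weights are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Minimal sketch, assuming `student` and `teacher` are torch.nn.Modules that
# map acoustic features (batch, time, feat) to logits (batch, time, vocab).
# The teacher is a frozen copy of the model before incremental training.

def attribution_map(model, feats):
    """Gradient-times-input saliency over the input features; a generic
    stand-in for the explainability signal being distilled."""
    feats = feats.detach().clone().requires_grad_(True)
    logits = model(feats)
    # Scalar proxy for "the prediction": sum of the top class scores.
    score = logits.max(dim=-1).values.sum()
    (grad,) = torch.autograd.grad(score, feats, create_graph=True)
    return (grad * feats).abs()

def incremental_loss(student, teacher, feats, targets,
                     lam_rkd=1.0, lam_ekd=1.0, T=2.0):
    logits_s = student(feats)
    with torch.no_grad():
        logits_t = teacher(feats)

    # New-task loss (frame-level cross-entropy for simplicity; CTC or
    # attention-based losses are the usual choices in end-to-end ASR).
    loss_task = F.cross_entropy(
        logits_s.reshape(-1, logits_s.size(-1)), targets.reshape(-1)
    )

    # Response-based KD: keep the student's softened output distribution
    # close to the frozen teacher's, preserving the original predictions.
    loss_rkd = F.kl_div(
        F.log_softmax(logits_s / T, dim=-1),
        F.softmax(logits_t / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # Explainability-based KD: also match the two models' attributions,
    # so the student keeps the teacher's "reason" for its predictions.
    attr_s = attribution_map(student, feats)
    attr_t = attribution_map(teacher, feats).detach()
    loss_ekd = F.mse_loss(attr_s, attr_t)

    return loss_task + lam_rkd * loss_rkd + lam_ekd * loss_ekd
```

Note that only new-task data enters this objective: the frozen teacher supplies the old-task knowledge through the two distillation terms, which is what allows training to proceed without access to the original training data.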