Paper title
Multi-Module G2P Converter for Persian Focusing on Relations between Words
Paper authors
Abstract
In this paper, we investigate end-to-end and multi-module frameworks for grapheme-to-phoneme (G2P) conversion in Persian. The results show that our proposed multi-module G2P system outperforms our end-to-end systems in both accuracy and speed. The system consists of a pronunciation dictionary used as a look-up table, along with separate models, built with GRU and Transformer architectures, that handle homographs, out-of-vocabulary (OOV) words, and ezafe in Persian. The system operates at the sequence level rather than the word level, which allows it to capture the unwritten relations between words (cross-word information) needed for homograph disambiguation and ezafe recognition without any pre-processing. In our evaluation, the system achieved a word-level accuracy of 94.48%, outperforming previous G2P systems for Persian.
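
The abstract names the modules but not how they are wired together, so the following is only a minimal sketch of one plausible routing, not the authors' implementation. Every name in it (`convert`, `pick_homograph`, `pronounce_oov`, `needs_ezafe`, the `-e` ezafe marker, and the toy lexicon) is a hypothetical placeholder. The three callables stand in for the trained GRU/Transformer models, and each receives the whole word sequence so that it can use cross-word context.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: each word maps to one or more candidate pronunciations.
Lexicon = Dict[str, List[str]]


def convert(
    words: Sequence[str],
    lexicon: Lexicon,
    pick_homograph: Callable[[Sequence[str], int, List[str]], str],
    pronounce_oov: Callable[[Sequence[str], int], str],
    needs_ezafe: Callable[[Sequence[str], int], bool],
    ezafe_marker: str = "-e",  # assumed notation for the unwritten ezafe vowel
) -> List[str]:
    """Route each word to the dictionary or to a model, then mark ezafe."""
    output: List[str] = []
    for i, word in enumerate(words):
        candidates = lexicon.get(word, [])
        if not candidates:
            # Out-of-vocabulary word: defer to the OOV model.
            phonemes = pronounce_oov(words, i)
        elif len(candidates) == 1:
            # Unambiguous dictionary entry: plain table look-up.
            phonemes = candidates[0]
        else:
            # Homograph: let the model choose using the full sequence.
            phonemes = pick_homograph(words, i, candidates)
        if needs_ezafe(words, i):
            phonemes += ezafe_marker
        output.append(phonemes)
    return output


# Toy stand-ins for the trained models (assumptions, for illustration only).
def toy_pick_homograph(words, i, candidates):
    return candidates[-1] if i > 0 else candidates[0]


def toy_pronounce_oov(words, i):
    return words[i].upper()


def toy_needs_ezafe(words, i):
    return i == 0


toy_lexicon = {"w_plain": ["P1"], "w_homo": ["H1", "H2"]}
result = convert(
    ["w_plain", "w_homo", "w_new"],
    toy_lexicon,
    toy_pick_homograph,
    toy_pronounce_oov,
    toy_needs_ezafe,
)
print(result)
```

Passing the full sequence and an index to each module, rather than an isolated word, is what makes the sketch sequence-level: a real homograph or ezafe model would condition on the neighbouring words.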