Paper Title

Marathi To English Neural Machine Translation With Near Perfect Corpus And Transformers

Author

Jadhav, Swapnil Ashok

Abstract

There have been very few attempts to benchmark the performance of state-of-the-art algorithms for the neural machine translation task on Indian languages. Google, Bing, Facebook and Yandex are among the very few companies that have built translation systems for a few of the Indian languages. Among them, translation results from Google are supposed to be better, based on general inspection. Bing-Translator does not even support Marathi, which has around 95 million speakers and ranks 15th in the world in terms of combined primary and secondary speakers. In this exercise, we trained and compared a variety of neural Marathi-to-English translators, built with the BERT tokenizer by Hugging Face and various Transformer-based architectures on Facebook's Fairseq platform, using a limited but almost correct parallel corpus, and achieved better BLEU scores than Google on the Tatoeba and Wikimedia open datasets.
