Paper Title
A Multilingual Neural Machine Translation Model for Biomedical Data
Authors
Abstract
We release a multilingual neural machine translation model that can be used to translate text in the biomedical domain. The model translates from five languages (French, German, Italian, Korean, and Spanish) into English. It is trained on large amounts of generic and biomedical data, using domain tags. Our benchmarks show that it performs near the state of the art on both news (generic domain) and biomedical test sets, and that it outperforms the existing publicly released models. We believe that this release will support large-scale multilingual analysis of the digital content of the COVID-19 crisis and of its effects on society, the economy, and healthcare policies. We also release a Korean-English test set of biomedical text. It consists of 758 sentences from official guidelines and recent papers, all about COVID-19.
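The abstract mentions training with domain tags but does not say how they are applied. One common way to realize this idea is to prepend a reserved token naming the domain to each source sentence, so that a single model can be steered toward generic or biomedical output. The sketch below illustrates that general technique only; the tag strings, the `DOMAIN_TAGS` mapping, and the `add_domain_tag` helper are hypothetical and are not taken from the released model.

```python
# Minimal sketch of domain tagging for NMT training data.
# ASSUMPTION: the tag strings below are illustrative placeholders,
# not the tokens used by the released model.
DOMAIN_TAGS = {
    "generic": "<generic>",
    "biomedical": "<medical>",
}


def add_domain_tag(sentence: str, domain: str) -> str:
    """Prepend the tag token for `domain` to a source sentence."""
    if domain not in DOMAIN_TAGS:
        raise ValueError(f"unknown domain: {domain!r}")
    return f"{DOMAIN_TAGS[domain]} {sentence.strip()}"


# Toy source sentences paired with their domain label.
examples = [
    ("Le parlement a voté la loi hier.", "generic"),
    ("Le patient présente une fièvre persistante.", "biomedical"),
]

tagged = [add_domain_tag(text, domain) for text, domain in examples]

for line in tagged:
    print(line)
```

At inference time the same kind of tag would be prepended to the input to select the desired domain. In practice the tag tokens would also need to be reserved in the subword vocabulary so that they are not split into pieces.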