Paper title
Investigating Machine Learning Methods for Language and Dialect Identification of Cuneiform Texts
Paper authors
Paper abstract
Identifying languages written in cuneiform symbols is a difficult task because of scarce resources and the problem of tokenization. The Cuneiform Language Identification task at VarDial 2019 addresses the problem of identifying seven languages and dialects written in cuneiform: Sumerian and six dialects of Akkadian (Old Babylonian, Middle Babylonian Peripheral, Standard Babylonian, Neo-Babylonian, Late Babylonian, and Neo-Assyrian). This paper describes the approaches the SharifCL team took to this problem at VarDial 2019. The best result was obtained by an ensemble of Support Vector Machines and a naive Bayes classifier, both working on character-level features, with a macro-averaged F1-score of 72.10%.
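The abstract names two ideas that a short sketch can make concrete: character-level features and the macro-averaged F1-score. The Python below is a minimal illustration under stated assumptions, not the SharifCL team's implementation. The helper names `char_ngrams` and `macro_f1`, the n-gram range of 1 to 3, and the Latin-letter placeholder strings standing in for cuneiform signs are all hypothetical choices made for this example.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count every character n-gram of length n_min..n_max in text.

    Working on characters avoids tokenization, which the abstract
    identifies as a problem for cuneiform. The range 1..3 is an
    assumption of this sketch, not a setting taken from the paper.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Placeholder data: Latin letters stand in for cuneiform signs.
features = char_ngrams("abab", 1, 2)
gold = ["Sumerian", "Sumerian", "Old Babylonian", "Neo-Assyrian"]
predicted = ["Sumerian", "Old Babylonian", "Old Babylonian", "Neo-Assyrian"]
score = macro_f1(gold, predicted)
```

Because macro averaging gives each class the same weight regardless of how many examples it has, errors on a small class lower the score as much as errors on a large one. Feature counts like those returned by `char_ngrams` are the kind of input that classifiers such as SVMs or naive Bayes could consume.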