Paper title
BUT-FIT at SemEval-2020 Task 4: Multilingual commonsense
Paper authors
Paper abstract
This paper describes the work of the BUT-FIT team at SemEval-2020 Task 4 - Commonsense Validation and Explanation. We participated in all three subtasks. In subtasks A and B, our submissions are based on pretrained language representation models (namely ALBERT) and data augmentation. We experimented with solving the task for another language, Czech, by means of multilingual models and a machine-translated dataset, or translated model inputs. We show that with a strong machine translation system, our system can be used in another language with a small accuracy loss. In subtask C, our submission, which is based on a pretrained sequence-to-sequence model (BART), ranked 1st in the BLEU score ranking; however, we show that the correlation between BLEU and human evaluation, in which our submission ended up 4th, is low. We analyse the metrics used in the evaluation and propose an additional score based on the model from subtask B, which correlates well with our manual ranking, as well as a reranking method based on the same principle. We performed an error and dataset analysis for all subtasks and present our findings.