Paper title

Atypical lexical abbreviations identification in Russian medical texts

Author

Berdichevskaia, Anna

Abstract

Abbreviation is a word-formation method that constructs a shortened term from the first letters of the initial phrase. Implicit abbreviations frequently cause comprehension difficulties for unprepared readers. In this paper, we propose an efficient ML-based algorithm that identifies abbreviations in Russian texts. The method achieves a ROC AUC score of 0.926 and an F1 score of 0.706, which are competitive with the baselines. Along with the pipeline, we also establish what is, to our knowledge, the first Russian dataset relevant to this task.
