Automatic Extraction of Rules Governing Morphological Agreement

Paper Title

Automatic Extraction of Rules Governing Morphological Agreement

Authors

Aditi Chaudhary, Antonios Anastasopoulos, Adithya Pratapa, David R. Mortensen, Zaid Sheikh, Yulia Tsvetkov, Graham Neubig

Abstract

Creating a descriptive grammar of a language is an indispensable step for language documentation and preservation. However, at the same time it is a tedious, time-consuming task. In this paper, we take steps towards automating this process by devising an automated framework for extracting a first-pass grammatical specification from raw text in a concise, human- and machine-readable format. We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages. We apply our framework to all languages included in the Universal Dependencies project, with promising results. Using cross-lingual transfer, even with no expert annotations in the language of interest, our framework extracts a grammatical specification which is nearly equivalent to those created with large amounts of gold-standard annotated data. We confirm this finding with human expert evaluations of the rules that our framework produces, which have an average accuracy of 78%. We release an interface demonstrating the extracted rules at https://neulab.github.io/lase/.
