Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents

Paper Title

Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents

Authors

Ruixue Zhang, Wei Yang, Luyun Lin, Zhengkai Tu, Yuqing Xie, Zihang Fu, Yuhao Xie, Luchen Tan, Kun Xiong, Jimmy Lin

Abstract

Techniques for automatically extracting important content elements from business documents such as contracts, statements, and filings have the potential to make business operations more efficient. This problem can be formulated as a sequence labeling task, and we demonstrate the adaptation of BERT to two types of business documents: regulatory filings and property lease agreements. There are aspects of this problem that make it easier than "standard" information extraction tasks and other aspects that make it more difficult, but on balance we find that modest amounts of annotated data (fewer than 100 documents) are sufficient to achieve reasonable accuracy. We integrate our models into an end-to-end cloud platform that provides both an easy-to-use annotation interface and an inference interface that allows users to upload documents and inspect model outputs.
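To make the sequence labeling formulation concrete, here is a minimal sketch of how annotated content elements could be encoded as one tag per token and decoded back into spans. It assumes the common BIO tagging scheme, which the abstract does not specify, and the sentence, the label names (START_DATE, LANDLORD, TENANT), and the helper functions are hypothetical illustrations, not taken from the paper. The model itself is omitted: in this framing, a fine-tuned BERT would predict the tag sequence that `bio_to_spans` consumes.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, label); labels are hypothetical
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode annotated spans as one BIO tag per token (assumed scheme)."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence (e.g. model predictions) back into spans."""
    spans: List[Span] = []
    start, label = None, None
    # The trailing "O" is a sentinel that closes a span ending at the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


# Hypothetical lease sentence, whitespace-tokenized for simplicity.
tokens = "The lease commences on 1 January 2020 between Acme Ltd and John Smith".split()
spans = [(4, 7, "START_DATE"), (8, 10, "LANDLORD"), (11, 13, "TENANT")]

tags = spans_to_bio(len(tokens), spans)
extracted = [(" ".join(tokens[s:e]), label) for s, e, label in bio_to_spans(tags)]
```

In practice BERT operates on subword pieces rather than whitespace tokens, so a real pipeline would also need to align tags with the tokenizer's output; that step is left out of this sketch.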
