Generalizing through Forgetting -- Domain Generalization for Symptom Event Extraction in Clinical Notes

Paper title

Generalizing through Forgetting -- Domain Generalization for Symptom Event Extraction in Clinical Notes

Authors

Sitong Zhou, Kevin Lybarger, Meliha Yetisgen, Mari Ostendorf

Abstract

Symptom information is primarily documented in free-text clinical notes and is not directly accessible for downstream applications. To address this challenge, information extraction approaches that can handle clinical language variation across different institutions and specialties are needed. In this paper, we present domain generalization for symptom extraction using pretraining and fine-tuning data that differs from the target domain in terms of institution and/or specialty and patient population. We extract symptom events using a transformer-based joint entity and relation extraction method. To reduce reliance on domain-specific features, we propose a domain generalization method that dynamically masks frequent symptom words in the source domain. Additionally, we pretrain the transformer language model (LM) on task-related unlabeled texts for better representation. Our experiments indicate that masking and adaptive pretraining methods can significantly improve performance when the source domain is more distant from the target domain.
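
The masking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how "frequent" is defined, what masking rate is used, or at which level (word or subword) masking happens. The function names, the `top_k` cutoff, the `mask_prob` rate, and the `[MASK]` placeholder below are all hypothetical choices made for illustration. The sketch counts words inside annotated symptom spans from the source domain, then randomly masks the most frequent ones each time a training sentence is drawn, so the masked positions differ between epochs.

```python
import random
from collections import Counter

MASK_TOKEN = "[MASK]"  # assumed placeholder; the real token depends on the tokenizer


def frequent_symptom_words(symptom_spans, top_k):
    """Return the top_k most common lowercase words across annotated symptom spans."""
    counts = Counter(word.lower() for span in symptom_spans for word in span.split())
    return {word for word, _ in counts.most_common(top_k)}


def dynamic_mask(tokens, frequent_words, mask_prob, rng):
    """Replace each frequent symptom word with MASK_TOKEN with probability mask_prob.

    Intended to be called anew on every pass over the data, so the set of
    masked positions is resampled rather than fixed once.
    """
    return [
        MASK_TOKEN if tok.lower() in frequent_words and rng.random() < mask_prob else tok
        for tok in tokens
    ]


# Toy source-domain symptom annotations (hypothetical data).
spans = ["cough", "chest pain", "pain", "dry cough", "cough"]
frequent = frequent_symptom_words(spans, top_k=2)

rng = random.Random(0)
sentence = "Patient reports cough and chest pain".split()
for epoch in range(3):
    # Each epoch sees a freshly sampled masking pattern for the same sentence.
    print(epoch, dynamic_mask(sentence, frequent, mask_prob=0.5, rng=rng))
```

Resampling the mask on every pass, rather than masking once during preprocessing, means the model still sees each frequent symptom word some of the time while being pushed to rely on surrounding context the rest of the time.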
