Paper title
Improving Cause-of-Death Classification from Verbal Autopsy Reports
Paper authors
Paper abstract
In many low- and middle-income countries, including South Africa, access to data in health facilities is restricted by patient privacy and confidentiality policies. Further, because clinical data is unique to individual institutions and laboratories, data annotation standards and conventions are insufficient. As a result of this scarcity of textual data, natural language processing (NLP) techniques have fared poorly in the health sector. In places without reliable death registration systems, the cause of death (COD) is often determined from a verbal autopsy (VA) report. A non-clinician field worker compiles the VA report, using a set of standardized questions as a guide to uncover symptoms of a COD. This analysis focuses on the textual part of the VA report as a case study in adapting NLP techniques to the health domain. We present a system that relies on two transfer learning paradigms, monolingual learning and multi-source domain adaptation, to improve COD classification, the target task, from VA narratives. We use Bidirectional Encoder Representations from Transformers (BERT) and Embeddings from Language Models (ELMo) models pre-trained on the general English and health domains to extract features from the VA narratives. Our findings suggest that this transfer learning system improves the COD classification task and that the narrative text contains valuable information for determining a COD. Our results further show that combining binary VA features with narrative text features learned via this framework boosts COD classification.
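The final claim of the abstract, that binary VA features and narrative text features can be combined for COD classification, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `embed_narrative` is a hypothetical stand-in (a hashed bag-of-words) for a pre-trained BERT or ELMo encoder, the nearest-centroid classifier replaces whatever classifier the paper uses, and the four toy records and their labels are invented.

```python
import hashlib
import math

DIM = 16  # size of the stand-in narrative vector (assumption)


def embed_narrative(text, dim=DIM):
    """Hypothetical placeholder for a BERT/ELMo encoder:
    a hashed bag-of-words vector, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def combine(binary_features, narrative):
    """Concatenate binary symptom indicators with the narrative vector."""
    return [float(b) for b in binary_features] + embed_narrative(narrative)


def fit_centroids(rows, labels):
    """Average the combined feature vectors for each COD label."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], row))


# Invented toy records: (binary symptom flags, narrative, COD label).
train = [
    ([1, 0, 1], "fever and cough for two weeks", "infectious"),
    ([1, 0, 0], "high fever with chills", "infectious"),
    ([0, 1, 0], "road accident with head injury", "injury"),
    ([0, 1, 1], "fell from height severe injury", "injury"),
]
rows = [combine(flags, text) for flags, text, _ in train]
labels = [label for _, _, label in train]
centroids = fit_centroids(rows, labels)

prediction = predict(centroids, combine([1, 0, 1], "fever and cough for two weeks"))
print(prediction)
```

In a real pipeline the placeholder encoder would be replaced by sentence-level features from a pre-trained language model, and the concatenated vector would feed a stronger classifier. The sketch only shows the shape of the combination step: one vector per report, with the binary indicators first and the narrative features after them.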