Paper Title
Zero-Shot Clinical Acronym Expansion via Latent Meaning Cells
Authors
Abstract
We introduce Latent Meaning Cells (LMC), a deep latent variable model that learns contextualized representations of words by combining local lexical context and metadata. Metadata can refer to granular context, such as section type, or to more global context, such as unique document ids. Reliance on metadata for contextualized representation learning is apropos in the clinical domain, where text is semi-structured and expresses high variation in topics. We evaluate the LMC model on the task of zero-shot clinical acronym expansion across three datasets. The LMC significantly outperforms a diverse set of baselines at a fraction of the pre-training cost and learns clinically coherent representations. We demonstrate not only that metadata itself is very helpful for the task, but also that the LMC inference algorithm provides an additional large benefit.
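To make the zero-shot task framing concrete, here is a minimal sketch of ranking candidate expansions for an acronym using both its local context and a metadata signal such as section type. It is NOT the LMC model or its inference algorithm: it swaps in a simple bag-of-vectors baseline (mean of context-word and metadata vectors, scored by cosine similarity). The function names, the toy 2-D vectors, and the "PT" candidates are hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(xs) / n for xs in zip(*vectors)]

def rank_expansions(context_vecs, metadata_vec, candidates):
    """Hypothetical baseline, not the LMC: fold the metadata vector into the
    context average, then rank candidate expansions by cosine similarity."""
    query = mean_vector(context_vecs + [metadata_vec])
    scores = {name: cosine(query, vec) for name, vec in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy usage: acronym "PT" seen near rehab-like words in a rehab-like section.
candidates = {"physical therapy": [1.0, 0.0], "prothrombin time": [0.0, 1.0]}
ranked = rank_expansions([[0.9, 0.1], [0.8, 0.3]], [1.0, 0.0], candidates)
print(ranked)  # ['physical therapy', 'prothrombin time']
```

No labeled expansion examples are used here, which is what makes the setup zero-shot; the abstract's claim is that a learned metadata-aware model and its inference procedure do substantially better than naive context averaging of this kind.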