Paper Title

Kformer: Knowledge Injection in Transformer Feed-Forward Layers

Paper Authors

Yunzhi Yao, Shaohan Huang, Li Dong, Furu Wei, Huajun Chen, Ningyu Zhang

Paper Abstract

Recent years have witnessed a diverse set of knowledge injection models for pre-trained language models (PTMs); however, most previous studies neglect the PTMs' own ability, with quantities of implicit knowledge stored in their parameters. A recent study observed knowledge neurons in the Feed-Forward Network (FFN) that are responsible for expressing factual knowledge. In this work, we propose a simple model, Kformer, which takes advantage of both the knowledge stored in PTMs and external knowledge via knowledge injection in the Transformer FFN layers. Empirical results on two knowledge-intensive tasks, commonsense reasoning (i.e., SocialIQA) and medical question answering (i.e., MedQA-USMLE), demonstrate that Kformer yields better performance than other knowledge injection techniques such as concatenation or attention-based injection. We believe the proposed simple model and empirical findings may help the community develop more powerful knowledge injection methods. Code is available at https://github.com/zjunlp/Kformer.
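To make the FFN-layer injection idea concrete, below is a minimal PyTorch sketch, not the authors' implementation, of one way to inject external knowledge into a Transformer feed-forward layer in the spirit of the abstract: knowledge embeddings are projected into the FFN's inner "key" and "value" spaces and appended alongside the PTM's own FFN neurons. The module name `KnowledgeFFN`, the projection layers, and all dimensions are illustrative assumptions; see the official repository for the actual method.

```python
import torch
import torch.nn as nn


class KnowledgeFFN(nn.Module):
    """Hypothetical FFN block that mixes PTM neurons with injected knowledge slots."""

    def __init__(self, d_model: int, d_ff: int, d_know: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff)          # standard FFN "key" matrix
        self.w2 = nn.Linear(d_ff, d_model)          # standard FFN "value" matrix
        self.know_key = nn.Linear(d_know, d_model)  # project knowledge into key space
        self.know_val = nn.Linear(d_know, d_model)  # project knowledge into value space
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor, knowledge: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); knowledge: (batch, n_facts, d_know)
        k = self.know_key(knowledge)                      # (batch, n_facts, d_model)
        v = self.know_val(knowledge)                      # (batch, n_facts, d_model)
        # Inner activations: original FFN neurons plus one slot per knowledge fact.
        inner = torch.cat([self.w1(x), x @ k.transpose(1, 2)], dim=-1)
        inner = self.act(inner)
        ffn_part, know_part = inner.split([self.w2.in_features, k.size(1)], dim=-1)
        # Original FFN output plus a knowledge-weighted sum of the injected values.
        return self.w2(ffn_part) + know_part @ v
```

In this sketch the injected facts simply extend the FFN's inner dimension, so the model can weigh external knowledge against the implicit knowledge already stored in its parameters; how the knowledge embeddings are retrieved and which layers receive them are design choices outside this example.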
