Paper Title
Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models
Paper Authors

Paper Abstract
Large language models (LLMs) trained using the next-token-prediction objective, such as GPT-3 and PaLM, have revolutionized natural language processing in recent years by showing impressive zero-shot and few-shot capabilities across a wide range of tasks. In this work, we propose a simple technique that significantly boosts the performance of LLMs without adding computational cost. Our key observation is that, by performing the next-token-prediction task with randomly selected past tokens masked out, we can improve the quality of the learned representations for downstream language understanding tasks. We hypothesize that randomly masking past tokens prevents over-attending to recent tokens and encourages attention to tokens in the distant past. We find that our method, Forgetful Causal Masking (FCM), significantly improves both the few-shot and finetuning performance of PaLM. We further consider a simple extension, T-FCM, which introduces bidirectional context to causal language models without altering the sequence order, and further improves finetuning performance.
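To make the masking scheme described in the abstract concrete, the sketch below builds an FCM-style attention mask in NumPy. It is a minimal illustration under stated assumptions, not the paper's reference implementation: it assumes the random drop decision is sampled once per token and shared across all query positions, and that each position always remains visible to itself. The function name `fcm_attention_mask` and the parameter `mask_prob` are illustrative.

```python
import numpy as np


def fcm_attention_mask(seq_len: int, mask_prob: float,
                       rng: np.random.Generator) -> np.ndarray:
    """Return a boolean (seq_len, seq_len) mask where entry [i, j] is True
    if the query at position i may attend to the key at position j."""
    # Standard causal mask: each position sees itself and earlier positions.
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    # Sample one drop decision per token; a dropped token is hidden from the
    # attention of all later positions (assumed per-sequence masking).
    dropped = rng.random(seq_len) < mask_prob
    visible = causal & ~dropped[None, :]
    # Assumption: a position always remains visible to itself.
    visible |= np.eye(seq_len, dtype=bool)
    return visible


# Example: a short sequence with a 15% chance of masking each past token.
mask = fcm_attention_mask(seq_len=8, mask_prob=0.15,
                          rng=np.random.default_rng(0))
print(mask.astype(int))
```

In such a setup, `mask_prob` would be a training hyperparameter; a masked token is hidden from later positions' attention while still being predicted by the next-token objective, which is what discourages the model from over-relying on the most recent tokens.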