Paper Title
Data Curation Alone Can Stabilize In-context Learning
Paper Authors
Paper Abstract
In-context learning (ICL) enables large language models (LLMs) to perform new tasks by prompting them with a sequence of training examples. However, it is known that ICL is very sensitive to the choice of training examples: randomly sampling examples from a training set leads to high variance in performance. In this paper, we show that carefully curating a subset of training data greatly stabilizes ICL performance without any other changes to the ICL algorithm (e.g., prompt retrieval or calibration). We introduce two methods to choose training subsets -- both score training examples individually, then select the highest-scoring ones. CondAcc scores a training example by its average dev-set ICL accuracy when combined with random training examples, while Datamodels learns linear regressors that estimate how the presence of each training example influences LLM outputs. Across five tasks and two LLMs, sampling from stable subsets selected by CondAcc and Datamodels improves average accuracy over sampling from the entire training set by 7.7% and 6.3%, respectively. Surprisingly, the stable subset examples are not especially diverse in content or low in perplexity, in contrast with other work suggesting that diversity and perplexity are important when prompting LLMs.
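The abstract describes two example-scoring ideas: CondAcc averages dev-set ICL accuracy over random prompts that contain a given training example, while Datamodels fits a linear regressor from which examples are present to the LLM's performance. Below is a minimal Python sketch of both ideas, not the authors' implementation; the `eval_icl_accuracy` callback (which would run the LLM on the dev set with a given list of in-context examples and return accuracy) is a hypothetical placeholder, and the Datamodels variant is simplified to regress aggregate dev accuracy rather than per-test-example LLM outputs.

```python
import random
import numpy as np
from sklearn.linear_model import LinearRegression

def condacc_scores(train_examples, eval_icl_accuracy, num_trials=50, prompt_size=4, seed=0):
    """CondAcc-style scoring (sketch): an example's score is the mean dev-set ICL
    accuracy of random prompts that happen to include it.
    `eval_icl_accuracy(prompt_examples)` is an assumed callback, not a real API."""
    rng = random.Random(seed)
    accs_per_example = {i: [] for i in range(len(train_examples))}
    for _ in range(num_trials):
        idxs = rng.sample(range(len(train_examples)), prompt_size)
        acc = eval_icl_accuracy([train_examples[i] for i in idxs])
        for i in idxs:
            accs_per_example[i].append(acc)
    return {i: float(np.mean(v)) if v else float("nan")
            for i, v in accs_per_example.items()}

def datamodels_scores(presence_matrix, accuracies):
    """Datamodels-style scoring (simplified sketch): fit a linear regressor from a
    binary presence matrix (trials x training examples) to observed dev-set
    accuracies; each coefficient estimates how including that example shifts
    performance. The paper fits regressors to LLM outputs per test example."""
    reg = LinearRegression().fit(presence_matrix, accuracies)
    return reg.coef_
```

Under either scoring scheme, the highest-scoring training examples form the stable subset from which prompts are then sampled, with no other change to the ICL pipeline.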