Paper Title
In-context Reinforcement Learning with Algorithm Distillation
Paper Authors
Paper Abstract
We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction architectures that distill post-learning or expert sequences, AD is able to improve its policy entirely in-context without updating its network parameters. We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.
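To make the training setup described in the abstract concrete, below is a minimal, illustrative sketch (not the authors' implementation) of the across-episode prediction objective: a causal transformer consumes a long history of (observation, previous action, previous reward) tokens produced by a source RL algorithm and is trained to autoregressively predict the action the source algorithm took at each step. The class and function names (ADTransformer, train_step), the discrete observation/action layout, and all sizes are assumptions made for this example only.

# Minimal, illustrative sketch of the Algorithm Distillation training objective.
# Assumptions (not from the paper): discrete observations/actions, one token per
# timestep, and the helper names ADTransformer / train_step.
import torch
import torch.nn as nn


class ADTransformer(nn.Module):
    """Causal transformer over across-episode learning histories.

    At position t the input token combines the current observation with the
    previous action and reward; the target is the action taken at step t by
    the source RL algorithm whose history is being distilled.
    """

    def __init__(self, n_obs, n_act, d_model=128, n_head=4, n_layer=4, max_len=1024):
        super().__init__()
        self.obs_emb = nn.Embedding(n_obs, d_model)
        self.prev_act_emb = nn.Embedding(n_act, d_model)
        self.prev_rew_emb = nn.Linear(1, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_head, 4 * d_model, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layer)
        self.act_head = nn.Linear(d_model, n_act)

    def forward(self, obs, prev_act, prev_rew):
        # obs, prev_act: (B, T) int64; prev_rew: (B, T) float32.
        B, T = obs.shape
        pos = torch.arange(T, device=obs.device)
        x = (self.obs_emb(obs) + self.prev_act_emb(prev_act)
             + self.prev_rew_emb(prev_rew.unsqueeze(-1)) + self.pos_emb(pos))
        # Causal mask: each position may only attend to earlier history.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=obs.device), diagonal=1)
        h = self.backbone(x, mask=mask)
        return self.act_head(h)  # (B, T, n_act) action logits


def train_step(model, optimizer, batch):
    """One autoregressive action-prediction step over sliced learning histories."""
    logits = model(batch["obs"], batch["prev_act"], batch["prev_rew"])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), batch["act"].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Random stand-in data in place of real histories from a source RL algorithm.
    model = ADTransformer(n_obs=20, n_act=5)
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
    batch = {
        "obs": torch.randint(0, 20, (8, 256)),
        "prev_act": torch.randint(0, 5, (8, 256)),
        "prev_rew": torch.rand(8, 256),
        "act": torch.randint(0, 5, (8, 256)),
    }
    print("loss:", train_step(model, optimizer, batch))

At evaluation time, per the abstract's claim of in-context improvement, such a model would be rolled out on a held-out task with its own accumulating (observation, action, reward) experience appended to the context, so the policy improves through the growing context rather than through any parameter update.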