Paper Title
In-context Reinforcement Learning with Algorithm Distillation
Paper Authors
Paper Abstract
We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction architectures that distill post-learning or expert sequences, AD is able to improve its policy entirely in-context without updating its network parameters. We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.
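To make the training setup described in the abstract concrete, below is a minimal, illustrative sketch (not the authors' implementation) of the across-episode prediction objective: a causal transformer consumes a long history of (observation, previous action, previous reward) tokens produced by a source RL algorithm and is trained to autoregressively predict the action the source algorithm took at each step. The class and function names (ADTransformer, train_step), the discrete observation/action layout, and all sizes are assumptions made for this example only.

# Minimal, illustrative sketch of the Algorithm Distillation training objective.
# Assumptions (not from the paper): discrete observations/actions, one token per
# timestep, and the helper names ADTransformer / train_step.
import torch
import torch.nn as nn


class ADTransformer(nn.Module):
    """Causal transformer over across-episode learning histories.

    At position t the input token combines the current observation with the
    previous action and reward; the target is the action taken at step t by
    the source RL algorithm whose history is being distilled.
    """

    def __init__(self, n_obs, n_act, d_model=128, n_head=4, n_layer=4, max_len=1024):
        super().__init__()
        self.obs_emb = nn.Embedding(n_obs, d_model)
        self.prev_act_emb = nn.Embedding(n_act, d_model)
        self.prev_rew_emb = nn.Linear(1, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_head, 4 * d_model, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layer)
        self.act_head = nn.Linear(d_model, n_act)

    def forward(self, obs, prev_act, prev_rew):
        # obs, prev_act: (B, T) int64; prev_rew: (B, T) float32.
        B, T = obs.shape
        pos = torch.arange(T, device=obs.device)
        x = (self.obs_emb(obs) + self.prev_act_emb(prev_act)
             + self.prev_rew_emb(prev_rew.unsqueeze(-1)) + self.pos_emb(pos))
        # Causal mask: each position may only attend to earlier history.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=obs.device), diagonal=1)
        h = self.backbone(x, mask=mask)
        return self.act_head(h)  # (B, T, n_act) action logits


def train_step(model, optimizer, batch):
    """One autoregressive action-prediction step over sliced learning histories."""
    logits = model(batch["obs"], batch["prev_act"], batch["prev_rew"])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), batch["act"].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Random stand-in data in place of real histories from a source RL algorithm.
    model = ADTransformer(n_obs=20, n_act=5)
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
    batch = {
        "obs": torch.randint(0, 20, (8, 256)),
        "prev_act": torch.randint(0, 5, (8, 256)),
        "prev_rew": torch.rand(8, 256),
        "act": torch.randint(0, 5, (8, 256)),
    }
    print("loss:", train_step(model, optimizer, batch))

At evaluation time, per the abstract's claim of in-context improvement, such a model would be rolled out on a held-out task with its own accumulating (observation, action, reward) experience appended to the context, so the policy improves through the growing context rather than through any parameter update.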