Paper title
S2P: State-conditioned Image Synthesis for Data Augmentation in Offline Reinforcement Learning
Paper authors
Paper abstract
Offline reinforcement learning (offline RL) suffers from an innate distributional shift, as it cannot interact with the physical environment during training. To alleviate this limitation, state-based offline RL leverages a dynamics model learned from the logged experience and augments the predicted state transitions to extend the data distribution. To exploit this benefit for image-based RL as well, we first propose a generative model, S2P (State2Pixel), which synthesizes the raw pixels of the agent from its corresponding state. S2P bridges the gap between the state and the image domain in RL algorithms, and enables virtual exploration of unseen image distributions via model-based transitions in the state space. Through experiments, we confirm that our S2P-based image synthesis not only improves image-based offline RL performance but also shows strong generalization capability on unseen tasks.
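The augmentation pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `dynamics_model` and `s2p_generator` are hypothetical stand-ins for the learned dynamics model and the S2P conditional generative network, and the image tuple returned by `augment_transition` is what an image-based offline RL agent would train on.

```python
import numpy as np

def dynamics_model(state, action):
    # Hypothetical stand-in for the dynamics model learned from
    # logged experience; here just a linear update for illustration.
    return state + 0.1 * action

def s2p_generator(state, resolution=8):
    # Hypothetical stand-in for S2P: renders a low-dimensional state
    # into an image tensor. The real S2P is a conditional generative
    # network trained on (state, image) pairs.
    rng = np.random.default_rng(int(np.abs(state).sum() * 1e6) % (2**32))
    return rng.random((resolution, resolution, 3))

def augment_transition(state, action):
    """Roll one model-based transition in state space, then render both
    states to pixels, yielding a synthetic (image, action, next_image)
    tuple that extends the image data distribution."""
    next_state = dynamics_model(state, action)
    return s2p_generator(state), action, s2p_generator(next_state)

# Example: augment a single logged (state, action) pair.
s = np.array([0.5, -0.2])
a = np.array([1.0, 0.0])
img, act, next_img = augment_transition(s, a)
print(img.shape, next_img.shape)  # (8, 8, 3) (8, 8, 3)
```

The key design point is that exploration happens in the cheap, low-dimensional state space, and only the resulting states are lifted back to pixels by the generator.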