Paper Title

Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning

Paper Authors

Zhecheng Yuan, Guozheng Ma, Yao Mu, Bo Xia, Bo Yuan, Xueqian Wang, Ping Luo, Huazhe Xu

Paper Abstract

One of the key challenges in visual Reinforcement Learning (RL) is to learn policies that can generalize to unseen environments. Recently, data augmentation techniques aimed at enhancing data diversity have proven effective in improving the generalization ability of learned policies. However, due to the sensitivity of RL training, naively applying data augmentation, which transforms every pixel in a task-agnostic manner, may cause instability and damage sample efficiency, further degrading generalization performance. At the heart of this phenomenon are diverged action distributions and high-variance value estimates in the face of augmented images. To alleviate this issue, we propose Task-aware Lipschitz Data Augmentation (TLDA) for visual RL, which explicitly identifies the task-correlated pixels with large Lipschitz constants and augments only the task-irrelevant pixels. To verify the effectiveness of TLDA, we conduct extensive experiments on the DeepMind Control Suite, CARLA, and DeepMind Manipulation tasks, showing that TLDA improves both sample efficiency at training time and generalization at test time. It outperforms previous state-of-the-art methods across these three visual control benchmarks.
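The abstract's core idea can be sketched in a few lines: estimate a per-region Lipschitz constant by measuring how much the policy output changes when a local patch of the observation is perturbed, then apply strong augmentation only where that constant is small. Below is a minimal PyTorch sketch of this idea, not the authors' implementation: the `policy` callable, the patch size, the noise scale `eps`, the quantile threshold, and the `augment` function are all illustrative assumptions rather than the paper's exact formulation.

```python
# A minimal sketch of the TLDA idea, NOT the authors' implementation.
# Assumptions (not from the paper): `policy` maps a (1, C, H, W) tensor to a
# flat action tensor; patch=8, eps=0.1, and quantile=0.9 are illustrative.
import torch
import torch.nn.functional as F

def lipschitz_map(policy, obs, patch=8, eps=0.1):
    """Estimate a patch-wise Lipschitz constant: the ratio of the change in
    policy output to the size of a random perturbation of each patch."""
    C, H, W = obs.shape
    with torch.no_grad():
        base = policy(obs.unsqueeze(0))
    gh, gw = (H + patch - 1) // patch, (W + patch - 1) // patch
    kmap = torch.zeros(gh, gw)
    for gi, i in enumerate(range(0, H, patch)):
        for gj, j in enumerate(range(0, W, patch)):
            pert = obs.clone()
            noise = eps * torch.randn_like(pert[:, i:i + patch, j:j + patch])
            pert[:, i:i + patch, j:j + patch] += noise
            with torch.no_grad():
                out = policy(pert.unsqueeze(0))
            # Lipschitz ratio: ||f(s') - f(s)|| / ||s' - s||
            kmap[gi, gj] = (out - base).norm() / noise.norm().clamp(min=1e-8)
    # Broadcast the patch-level map back to pixel resolution.
    return F.interpolate(kmap[None, None], size=(H, W), mode="nearest")[0, 0]

def tlda_augment(policy, obs, augment, quantile=0.9):
    """Apply `augment` only to task-irrelevant pixels (small Lipschitz
    constant); pixels the policy is most sensitive to are left untouched."""
    kmap = lipschitz_map(policy, obs)
    task_mask = (kmap >= torch.quantile(kmap, quantile)).float()  # 1 = keep
    return task_mask * obs + (1.0 - task_mask) * augment(obs)
```

In this sketch, `augment` could be any strong pixel-level augmentation (e.g., random convolution or image overlay from prior work); the mask preserves the top 10% most policy-sensitive pixels, under the hypothetical quantile threshold, while the remaining pixels are freely augmented.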
