Paper Title

SEERL: Sample Efficient Ensemble Reinforcement Learning

Paper Authors

Rohan Saphal, Balaraman Ravindran, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul

Paper Abstract

Ensemble learning is a very prevalent method employed in machine learning. The relative success of ensemble methods is attributed to their ability to tackle a wide range of instances and complex problems that require different low-level approaches. However, ensemble methods are relatively less popular in reinforcement learning owing to the high sample complexity and computational expense involved in obtaining a diverse ensemble. We present a novel training and model selection framework for model-free reinforcement learning algorithms that uses an ensemble of policies obtained from a single training run. These policies are diverse in nature and are learned through directed perturbation of the model parameters at regular intervals. We show that learning and selecting an adequately diverse set of policies is required for a good ensemble, while extreme diversity can prove detrimental to overall performance. Selection of an adequately diverse set of policies is done through our novel policy selection framework. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is substantially sample efficient, computationally inexpensive, and is seen to outperform state-of-the-art (SOTA) scores in Atari 2600 and MuJoCo.
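To make the core idea concrete, below is a minimal sketch (not the authors' code) of how diverse policies might be collected from a single training run via periodic, directed perturbation, here realized as a restarted learning-rate schedule with a parameter snapshot at the end of each cycle, followed by a simple majority-vote ensemble at evaluation time. All names and choices (PolicyNet, train_step, the cosine schedule, the number of cycles M) are illustrative assumptions, not details taken from the paper.

```python
# Sketch: harvest an ensemble of policies from ONE training run by restarting
# the learning rate at regular intervals (the "directed perturbation") and
# snapshotting the policy at the end of each cycle. Hypothetical toy code.

import math
import copy
import numpy as np


def cyclical_lr(t, T, M, lr_max):
    """Cosine-annealed learning rate, restarted M times over T total steps."""
    steps_per_cycle = T // M
    pos = (t % steps_per_cycle) / steps_per_cycle  # position within the current cycle
    return 0.5 * lr_max * (1.0 + math.cos(math.pi * pos))


class PolicyNet:
    """Toy linear policy over a small discrete action space."""
    def __init__(self, obs_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=(obs_dim, n_actions))

    def act(self, obs):
        return int(np.argmax(obs @ self.w))


def train_step(policy, lr):
    """Placeholder update; a real agent would apply a policy-gradient step here."""
    noise = np.random.default_rng().normal(scale=0.01, size=policy.w.shape)
    policy.w += lr * noise


def collect_policy_ensemble(T=10_000, M=5, lr_max=1e-2):
    """Run a single training loop and snapshot one policy per learning-rate cycle."""
    policy, snapshots = PolicyNet(obs_dim=4, n_actions=2), []
    steps_per_cycle = T // M
    for t in range(T):
        train_step(policy, cyclical_lr(t, T, M, lr_max))
        if (t + 1) % steps_per_cycle == 0:           # end of a cycle
            snapshots.append(copy.deepcopy(policy))  # one candidate ensemble member
    return snapshots


def ensemble_act(policies, obs):
    """Majority vote over the selected policies at evaluation time."""
    votes = [p.act(obs) for p in policies]
    return max(set(votes), key=votes.count)


if __name__ == "__main__":
    ensemble = collect_policy_ensemble()
    print("collected", len(ensemble), "policies;",
          "ensemble action:", ensemble_act(ensemble, np.ones(4)))
```

In this sketch every snapshot is kept; the paper's policy selection framework would instead choose an adequately diverse subset of these candidates before ensembling, since the abstract notes that extreme diversity can hurt overall performance.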
