DNS: Determinantal Point Process Based Neural Network Sampler for Ensemble Reinforcement Learning

Paper Title

DNS: Determinantal Point Process Based Neural Network Sampler for Ensemble Reinforcement Learning

Authors

Hassam Sheikh, Kizza Frisbee, Mariano Phielipp

Abstract

The application of neural network ensembles is becoming a prominent tool for advancing the state of the art in deep reinforcement learning algorithms. However, training the large number of neural networks in an ensemble has an exceedingly high computational cost, which may hinder the training of large-scale systems. In this paper, we propose DNS: a Determinantal Point Process based Neural Network Sampler that uses a k-DPP to sample a subset of the ensemble's neural networks for backpropagation at every training step, significantly reducing training time and computational cost. We integrate DNS into REDQ for continuous control tasks and evaluate it on MuJoCo environments. Our experiments show that DNS-augmented REDQ outperforms baseline REDQ in terms of average cumulative reward while using less than 50% of the computation, measured in FLOPs.
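The abstract says DNS uses a k-DPP to choose which ensemble members are backpropagated at each training step, but it does not describe how the DPP kernel is constructed. The following is a minimal, hypothetical sketch under stated assumptions, not the authors' implementation. It assumes each network is summarized by a feature vector (for example, its Q-value predictions on a shared mini-batch) and uses an RBF kernel over those vectors as the DPP kernel L. It then draws an exact k-DPP sample, with P(S) proportional to det(L_S) for |S| = k, by enumerating every size-k subset. The names `rbf_kernel` and `sample_k_dpp_bruteforce` and the `bandwidth` parameter are illustrative only.

```python
import itertools

import numpy as np


def rbf_kernel(features: np.ndarray, bandwidth: float = 1.0) -> np.ndarray:
    """Hypothetical DPP kernel: RBF similarity between ensemble members.

    features[i] summarizes network i (assumption: e.g. its Q-value
    predictions on a shared mini-batch); this is not taken from the paper.
    """
    diffs = features[:, None, :] - features[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def sample_k_dpp_bruteforce(L: np.ndarray, k: int, rng: np.random.Generator) -> tuple:
    """Draw an exact k-DPP sample with P(S) proportional to det(L_S), |S| = k.

    All C(N, k) subsets are enumerated, so this is only practical for small N.
    """
    n = L.shape[0]
    subsets = list(itertools.combinations(range(n), k))
    # Unnormalized k-DPP weight of each subset: determinant of its kernel submatrix.
    weights = np.array([np.linalg.det(L[np.ix_(s, s)]) for s in subsets])
    weights = np.clip(weights, 0.0, None)  # remove tiny negative round-off
    probs = weights / weights.sum()
    return subsets[rng.choice(len(subsets), p=probs)]


# Usage: pick which of 10 hypothetical Q-networks get a gradient update this step.
rng = np.random.default_rng(0)
n_networks, k = 10, 5
features = rng.normal(size=(n_networks, 8))  # stand-in network descriptors
L = rbf_kernel(features, bandwidth=2.0)
chosen = sample_k_dpp_bruteforce(L, k, rng)
print("backpropagate through networks:", chosen)
```

Brute-force enumeration costs C(N, k) determinant evaluations. That is cheap for a small ensemble such as N = 10, k = 5 (252 subsets) but does not scale. For larger ensembles, the exact k-DPP sampler of Kulesza and Taskar, which works from an eigendecomposition of L, avoids the enumeration. Near-duplicate networks yield an almost singular submatrix, so the sampler favors diverse subsets of the ensemble.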
