Paper Title
PoPS: Policy Pruning and Shrinking for Deep Reinforcement Learning
Paper Authors
Paper Abstract
The recent success of deep neural networks (DNNs) for function approximation in reinforcement learning has triggered the development of Deep Reinforcement Learning (DRL) algorithms in various fields, such as robotics, computer games, natural language processing, computer vision, sensing systems, and wireless networking. Unfortunately, DNNs suffer from high computational cost and memory consumption, which limits the use of DRL algorithms in systems with limited hardware resources. In recent years, pruning algorithms have demonstrated considerable success in reducing the redundancy of DNNs in classification tasks. However, existing algorithms suffer from a significant performance reduction in the DRL domain. In this paper, we develop the first effective solution to the performance reduction problem of pruning in the DRL domain, and establish a working algorithm, named Policy Pruning and Shrinking (PoPS), to train DRL models with strong performance while achieving a compact representation of the DNN. The framework is based on a novel iterative policy pruning and shrinking method that leverages the power of transfer learning when training the DRL model. We present an extensive experimental study that demonstrates the strong performance of PoPS using the popular Cartpole, Lunar Lander, Pong, and Pacman environments. Finally, we develop open-source software for the benefit of researchers and developers in related fields.
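
To make the prune-then-shrink idea concrete, below is a minimal, hypothetical PyTorch sketch of one way such a loop could look: unstructured magnitude pruning applied over several rounds to a copy of a trained policy, fine-tuned after each round by distilling the original policy's action distribution (the transfer-learning step), and finally construction of a smaller dense network. The network sizes, pruning rate, number of rounds, and the `magnitude_prune` / `make_policy` helpers are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of an iterative prune-then-shrink loop with policy distillation.
# Sizes, rates, and helpers are illustrative assumptions, not the PoPS implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def magnitude_prune(linear, amount):
    """Zero out the smallest-magnitude weights of a linear layer (unstructured pruning)."""
    with torch.no_grad():
        w = linear.weight
        k = max(1, int(amount * w.numel()))
        threshold = w.abs().flatten().kthvalue(k).values
        w.mul_((w.abs() > threshold).float())


def make_policy(hidden):
    # Toy policy network: 4-dim state (e.g. CartPole) -> 2 discrete actions.
    return nn.Sequential(nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 2))


teacher = make_policy(hidden=128)             # stands in for a fully trained DRL policy
student = make_policy(hidden=128)             # copy that will be pruned iteratively
student.load_state_dict(teacher.state_dict())
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

for round_idx in range(5):                    # iterative pruning rounds
    for layer in student:
        if isinstance(layer, nn.Linear):
            magnitude_prune(layer, amount=0.2)
    # Fine-tune by distilling the teacher's action distribution (the transfer-learning step);
    # random states stand in for states that would be collected from the environment.
    for _ in range(200):
        states = torch.randn(64, 4)
        loss = F.kl_div(F.log_softmax(student(states), dim=-1),
                        F.softmax(teacher(states), dim=-1).detach(),
                        reduction='batchmean')
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Note: a full implementation would keep a fixed sparsity mask during fine-tuning
    # so that pruned weights stay at zero.

# "Shrinking": once pruning reveals how much capacity is redundant, build a smaller
# dense network and distill the pruned policy into it with the same kind of loss.
compact = make_policy(hidden=32)
```

In this sketch, pruning estimates how much of each layer is redundant, while the distillation loss transfers the original policy's behavior into the progressively sparser (and ultimately smaller, dense) network.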