Paper Title
Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
Paper Authors
Paper Abstract
We identify an implicit under-parameterization phenomenon in value-based deep RL methods that use bootstrapping: when value functions, approximated using deep neural networks, are trained with gradient descent using iterated regression onto target values generated by previous instances of the value network, more gradient updates decrease the expressivity of the current value network. We characterize this loss of expressivity via a drop in the rank of the learned value network features, and show that this typically corresponds to a performance drop. We demonstrate this phenomenon on Atari and Gym benchmarks, in both offline and online RL settings. We formally analyze this phenomenon and show that it results from a pathological interaction between bootstrapping and gradient-based optimization. We further show that mitigating implicit under-parameterization by controlling rank collapse can improve performance.
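The abstract characterizes the loss of expressivity through a drop in the rank of the learned value network features. Below is a minimal sketch, not the authors' released code, of one way such a measure could be computed: an "effective rank" defined as the smallest number of singular values of the feature matrix needed to capture nearly all of the singular-value mass. The function name `effective_rank`, the threshold `delta`, and the synthetic example are illustrative assumptions.

```python
import numpy as np

def effective_rank(features: np.ndarray, delta: float = 0.01) -> int:
    """Effective rank of a feature matrix.

    features: (num_states, feature_dim) matrix of penultimate-layer
    activations of the value network on a batch of states (assumed setup).
    Returns the smallest k such that the top-k singular values account
    for at least (1 - delta) of the total singular-value mass.
    """
    singular_values = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(singular_values) / np.sum(singular_values)
    # Smallest (1-based) k whose cumulative spectrum mass reaches 1 - delta.
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)

if __name__ == "__main__":
    # Illustrative comparison: a near-full-rank feature matrix versus one
    # whose features have collapsed onto a low-dimensional subspace, the
    # kind of drop the abstract associates with performance degradation.
    rng = np.random.default_rng(0)
    phi_healthy = rng.normal(size=(512, 256))
    phi_collapsed = phi_healthy @ rng.normal(size=(256, 8)) @ rng.normal(size=(8, 256))
    print(effective_rank(phi_healthy), effective_rank(phi_collapsed))
```

Tracking this quantity on a fixed batch of states across training iterations would expose the phenomenon described above: under bootstrapped regression, the measured rank steadily decreases as gradient updates accumulate.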