Paper Title
Deep Innovation Protection: Confronting the Credit Assignment Problem in Training Heterogeneous Neural Architectures
Authors
Abstract
Deep reinforcement learning approaches have shown impressive results in a variety of domains; however, more complex heterogeneous architectures such as world models require the different neural components to be trained separately instead of end-to-end. While a simple genetic algorithm recently showed that end-to-end training is possible, it failed to solve a more complex 3D task. This paper presents a method called Deep Innovation Protection (DIP) that addresses the credit assignment problem in training complex heterogeneous neural network models end-to-end for such environments. The main idea behind the approach is to employ multiobjective optimization to temporarily reduce the selection pressure on specific components in a multi-component network, allowing other components to adapt. We investigate the emergent representations of these evolved networks, which learn to predict properties important for the survival of the agent, without the need for a specific forward-prediction loss.
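The idea of using a second objective to temporarily relax selection pressure can be illustrated with a small sketch. This is only one possible reading of the abstract, not the paper's implementation: the `age` objective, the `Individual` class, the `mutate` helper, and all reward values are hypothetical and chosen for illustration. The assumption is that an individual whose protected component was just mutated gets its age reset to zero, and that selection keeps the Pareto front over (high reward, low age).

```python
from dataclasses import dataclass


@dataclass
class Individual:
    name: str
    reward: float  # task objective, higher is better
    age: int       # hypothetical second objective, lower is better


def dominates(a: Individual, b: Individual) -> bool:
    """True if a is no worse than b on both objectives and better on one."""
    no_worse = a.reward >= b.reward and a.age <= b.age
    strictly_better = a.reward > b.reward or a.age < b.age
    return no_worse and strictly_better


def pareto_front(population: list) -> list:
    """Keep every individual that no other individual dominates."""
    return [
        p for p in population
        if not any(dominates(q, p) for q in population if q is not p)
    ]


def mutate(parent: Individual, name: str, new_reward: float,
           touches_protected: bool) -> Individual:
    """Assumed rule: mutating a protected component resets age to zero."""
    age = 0 if touches_protected else parent.age + 1
    return Individual(name, new_reward, age)


parent = Individual("parent", reward=10.0, age=5)
# The mutation initially hurts reward because the other components
# have not yet adapted to the changed component.
child = mutate(parent, "child", new_reward=7.0, touches_protected=True)
stale = Individual("stale", reward=6.0, age=6)

front_names = [p.name for p in pareto_front([parent, child, stale])]
print(front_names)
```

Under reward-only selection the child would lose to its parent. With the assumed age objective, neither dominates the other, so the child survives and its remaining components get time to adapt, while the individual that is both older and worse is dropped.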