Paper title
A Prescriptive Dirichlet Power Allocation Policy with Deep Reinforcement Learning
Authors
Abstract
Prescribing optimal operation based on the condition of a system, and thereby potentially prolonging its remaining useful lifetime, has large potential for actively managing the availability, maintenance and costs of complex systems. Reinforcement learning (RL) algorithms are particularly suitable for this type of problem given their learning capabilities. A special case of prescriptive operation is the power allocation task, which can be considered a sequential allocation problem in which the action space is bounded by a simplex constraint. A general continuous action-space solution to such sequential allocation problems remains an open research question for RL algorithms. In continuous action spaces, the standard Gaussian policy applied in reinforcement learning does not support simplex constraints, while the Gaussian-softmax policy introduces a bias during training. In this work, we propose the Dirichlet policy for continuous allocation tasks and analyze the bias and variance of its policy gradients. We demonstrate that the Dirichlet policy is bias-free and provides significantly faster convergence, better performance and greater robustness to hyperparameters than the Gaussian-softmax policy. Moreover, we demonstrate the applicability of the proposed algorithm to a prescriptive operation case, where we propose the Dirichlet power allocation policy and evaluate its performance in a case study of a set of multiple lithium-ion (Li-I) battery systems. The experimental results show the potential to prescribe optimal operation and to improve the efficiency and sustainability of multi-power source systems.
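To make the simplex constraint concrete, the following is a minimal sketch of how a Dirichlet policy could turn raw policy outputs into a power allocation. It is not the authors' implementation. The function names, the `1 + softplus` mapping from logits to concentration parameters, the example logits and the total demand of 10 are all assumptions chosen for illustration. The point it shows is that every sampled action is non-negative and sums to one by construction, and that its log-probability (the quantity a policy-gradient update needs) is available in closed form.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dirichlet_sample(alpha, rng):
    # A Dirichlet draw is a vector of independent Gamma(alpha_i, 1) draws,
    # normalised so that the components sum to one.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_log_prob(action, alpha):
    # Log-density of Dirichlet(alpha) evaluated at a point on the simplex.
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(x) for a, x in zip(alpha, action))


def allocate(logits, demand, rng):
    # Assumption: concentrations are 1 + softplus(logit), which keeps them
    # above one. The source does not specify this mapping.
    alpha = [1.0 + softplus(z) for z in logits]
    weights = dirichlet_sample(alpha, rng)
    powers = [w * demand for w in weights]
    return powers, dirichlet_log_prob(weights, alpha)


rng = random.Random(0)
# Hypothetical logits for four batteries and a hypothetical total demand.
powers, log_prob = allocate([0.5, -1.0, 2.0, 0.0], demand=10.0, rng=rng)
```

Because the sample already lies on the simplex, no projection or softmax squashing step is applied after sampling, which is the step the abstract identifies as the source of bias in the Gaussian-softmax policy.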