Deep Reinforcement Learning with a Stage Incentive Mechanism of Dense Reward for Robotic Trajectory Planning

Title

Deep Reinforcement Learning with a Stage Incentive Mechanism of Dense Reward for Robotic Trajectory Planning

Authors

Gang Peng, Jin Yang, Xinde Li, Mohammad Omar Khyam

Abstract

(This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.) To improve the efficiency of deep reinforcement learning (DRL)-based methods for robot manipulator trajectory planning in random working environments, we present three dense reward functions, which differ from the traditional sparse reward. First, a posture reward function models distance and direction constraints; this reduces blind exploration and speeds up learning while producing more reasonable trajectories. Second, a stride reward function improves the stability of the learning process by modeling distance and joint movement distance constraints. Finally, to further improve learning efficiency, we take inspiration from the cognitive process of human behavior and propose a stage incentive mechanism, comprising a hard stage incentive reward function and a soft stage incentive reward function. Extensive experiments show that the soft stage incentive reward function improves the convergence rate by up to 46.9% with state-of-the-art DRL methods. The converged mean reward increased by 4.4-15.5%, and the standard deviation decreased by 21.9-63.2%. In the evaluation experiments, the success rate of trajectory planning for a robot manipulator reached 99.6%.
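
The abstract does not give the reward formulas, so the following is only a minimal sketch of the general idea: a sparse reward, a dense term built from distance and direction, and hard versus soft switching between two stages. Every function name, weight, threshold, and the sigmoid blending are assumptions made for illustration and are not taken from the paper.

```python
import math

def sparse_reward(distance, tolerance=0.01):
    """Sparse baseline: non-zero only when the target is reached."""
    return 1.0 if distance <= tolerance else 0.0

def alignment(position, target, velocity):
    """Cosine between the motion direction and the direction to the target."""
    to_target = [t - p for p, t in zip(position, target)]
    dist = math.sqrt(sum(d * d for d in to_target))
    speed = math.sqrt(sum(v * v for v in velocity))
    if dist == 0.0 or speed == 0.0:
        return 0.0
    return sum(d * v for d, v in zip(to_target, velocity)) / (dist * speed)

def hard_stage_weight(distance, threshold=0.1):
    """Hard switch (assumed form): 1 far from the target, 0 near it."""
    return 1.0 if distance > threshold else 0.0

def soft_stage_weight(distance, threshold=0.1, sharpness=50.0):
    """Soft switch (assumed form): sigmoid blend around the same threshold."""
    return 1.0 / (1.0 + math.exp(-sharpness * (distance - threshold)))

def staged_reward(distance, align, step_length, weight_fn):
    """Blend a far-stage term and a near-stage term with a stage weight."""
    w = weight_fn(distance)
    approach = -distance + align       # far stage: head towards the target
    refine = -distance - step_length   # near stage: discourage large strides
    return w * approach + (1.0 - w) * refine
```

In this sketch the hard variant changes the reward abruptly at the threshold, while the soft variant changes it gradually; the abstract reports that the soft stage incentive performed best, but the sketch makes no claim about why.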
