Paper Title


Examining Policy Entropy of Reinforcement Learning Agents for Personalization Tasks

Authors

Dereventsov, Anton, Starnes, Andrew, Webster, Clayton G.

Abstract


This effort is focused on examining the behavior of reinforcement learning systems in personalization environments and detailing the differences in policy entropy associated with the type of learning algorithm utilized. We demonstrate that Policy Optimization agents often possess low-entropy policies during training, which in practice results in agents prioritizing certain actions and avoiding others. Conversely, we also show that Q-Learning agents are far less susceptible to such behavior and generally maintain high-entropy policies throughout training, which is often preferable in real-world applications. We provide a wide range of numerical experiments as well as theoretical justification to show that these differences in entropy are due to the type of learning being employed.
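The abstract contrasts low-entropy and high-entropy policies. As a minimal illustration (not code from the paper), the following sketch computes the Shannon entropy H(π) = −Σₐ π(a) log π(a) of an action distribution, showing how a peaked policy of the kind attributed to Policy Optimization agents scores low, while a near-uniform policy of the kind attributed to Q-Learning agents scores near the maximum:

```python
import math

def policy_entropy(probs):
    """Shannon entropy H(pi) = -sum_a pi(a) * log pi(a) of an action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical 4-action policies for illustration:
peaked = [0.97, 0.01, 0.01, 0.01]    # agent strongly prefers one action
uniform = [0.25, 0.25, 0.25, 0.25]   # agent keeps all actions in play

print(policy_entropy(peaked))   # ~0.168 nats: low entropy
print(policy_entropy(uniform))  # log(4) ~ 1.386 nats: the maximum for 4 actions
```

The uniform distribution attains the maximum possible entropy log(|A|) for |A| actions, which is why entropy is a natural diagnostic for whether an agent is "prioritizing certain actions and avoiding others" in the sense the abstract describes.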
