Paper title
Improved Analysis of UCRL2 with Empirical Bernstein Inequality
Paper authors
Paper abstract
We consider the problem of exploration-exploitation in communicating Markov Decision Processes. We provide an analysis of UCRL2 with empirical Bernstein inequalities (UCRL2B). For any MDP with $S$ states, $A$ actions, $\Gamma \leq S$ next states and diameter $D$, the regret of UCRL2B is bounded as $\widetilde{O}(\sqrt{D \Gamma S A T})$.
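
The abstract attributes the improved bound to empirical Bernstein inequalities. As a rough illustration of why variance-aware confidence widths can help, the sketch below computes a generic one-sided empirical Bernstein width in the Maurer and Pontil form for samples in [0, 1] and compares it with a Hoeffding width. The function names, the constants, and the example data are assumptions made for illustration; they are not taken from the paper and need not match the confidence sets used in the UCRL2B analysis.

```python
import math


def empirical_bernstein_width(samples, delta):
    """One-sided empirical Bernstein deviation width for samples in [0, 1].

    Generic Maurer-Pontil form (an assumption for illustration, not the
    paper's exact confidence set):
        sqrt(2 * V * ln(2/delta) / n) + 7 * ln(2/delta) / (3 * (n - 1))
    where V is the unbiased sample variance.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    log_term = math.log(2.0 / delta)
    return math.sqrt(2.0 * var * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))


def hoeffding_width(n, delta):
    """One-sided Hoeffding deviation width for n samples in [0, 1]."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


# Hypothetical data: indicator of a rarely observed next state
# (10 occurrences in 1000 visits to a state-action pair).
rare_transition = [1.0] * 10 + [0.0] * 990
bernstein = empirical_bernstein_width(rare_transition, delta=0.05)
hoeffding = hoeffding_width(len(rare_transition), delta=0.05)
print(f"empirical Bernstein width: {bernstein:.4f}")
print(f"Hoeffding width:           {hoeffding:.4f}")
```

When the empirical variance is small, as for a rarely observed transition, the Bernstein width is dominated by the $1/n$ term and is tighter than the Hoeffding width, which scales as $1/\sqrt{n}$ regardless of variance.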