Improved Analysis of UCRL2 with Empirical Bernstein Inequality

Paper Title

Improved Analysis of UCRL2 with Empirical Bernstein Inequality

Authors

Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

Abstract

We consider the exploration-exploitation problem in communicating Markov Decision Processes (MDPs). We provide an analysis of UCRL2 with empirical Bernstein inequalities (UCRL2B). For any MDP with $S$ states, $A$ actions, $\Gamma \leq S$ next states and diameter $D$, the regret of UCRL2B is bounded as $\widetilde{O}(\sqrt{D \Gamma S A T})$.
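
To illustrate why an empirical Bernstein inequality can tighten confidence sets, the sketch below compares a one-sided Hoeffding width with the one-sided empirical Bernstein width in the generic form given by Maurer and Pontil, for samples in $[0, 1]$. This is an assumption-laden illustration, not the confidence sets used in UCRL2B: the function names are hypothetical, and the constants in the paper's analysis may differ.

```python
import math


def hoeffding_width(n, delta):
    """One-sided Hoeffding deviation for n samples bounded in [0, 1]."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def empirical_bernstein_width(samples, delta):
    """One-sided empirical Bernstein deviation (generic Maurer-Pontil form).

    Assumes i.i.d. samples in [0, 1]; uses the unbiased sample variance.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate the variance")
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    log_term = math.log(2.0 / delta)
    # Variance-dependent term plus a lower-order 1/(n - 1) correction.
    return math.sqrt(2.0 * variance * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))


# Hypothetical data: a near-deterministic signal versus a maximally noisy one.
low_variance = [0.5] * 1000
high_variance = [0.0, 1.0] * 500

print(hoeffding_width(1000, 0.05))
print(empirical_bernstein_width(low_variance, 0.05))
print(empirical_bernstein_width(high_variance, 0.05))
```

When the empirical variance is small, the Bernstein width shrinks at a rate close to $1/n$ rather than $1/\sqrt{n}$. When the variance is near its maximum, Hoeffding's bound can be the tighter of the two.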

