Expert Selection in High-Dimensional Markov Decision Processes

Paper Title

Expert Selection in High-Dimensional Markov Decision Processes

Authors

Vicenc Rubies-Royo, Eric Mazumdar, Roy Dong, Claire Tomlin, S. Shankar Sastry

Abstract

In this work, we present a multi-armed bandit framework for online expert selection in Markov decision processes and demonstrate its use in high-dimensional settings. Our method takes a set of candidate expert policies and switches between them to rapidly identify the best-performing expert using a variant of the classical upper confidence bound algorithm, thus ensuring low regret in the overall performance of the system. This is useful in applications where several expert policies may be available and one needs to be selected at run-time for the underlying environment.
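To make the idea of switching between experts concrete, the following is a minimal sketch of the classical UCB1 rule applied to a set of candidate experts. It is not the variant used in the paper, whose details are not given in the abstract. The function `ucb1_select_expert`, the helper `make_expert`, and the Bernoulli rewards standing in for the return of running an expert policy are all hypothetical and chosen only for illustration.

```python
import math
import random


def ucb1_select_expert(experts, rounds, seed=0):
    """Run plain UCB1 over `experts`.

    Each expert is a callable taking a random generator and returning a
    reward in [0, 1] (assumed here to summarize one run of that expert).
    Returns the per-expert pull counts and empirical mean rewards.
    """
    rng = random.Random(seed)
    n = len(experts)
    counts = [0] * n
    means = [0.0] * n
    for t in range(1, rounds + 1):
        if t <= n:
            # Initialization: try every expert once.
            k = t - 1
        else:
            # Pick the expert with the largest upper confidence bound.
            k = max(
                range(n),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = experts[k](rng)
        counts[k] += 1
        # Incremental update of the empirical mean reward.
        means[k] += (reward - means[k]) / counts[k]
    return counts, means


def make_expert(p):
    # Hypothetical stand-in for an expert policy: Bernoulli reward with mean p.
    return lambda rng: 1.0 if rng.random() < p else 0.0


experts = [make_expert(0.2), make_expert(0.5), make_expert(0.8)]
counts, means = ucb1_select_expert(experts, rounds=5000)
# The most frequently selected expert is the one the rule has converged on.
best = max(range(len(experts)), key=lambda i: counts[i])
```

In this sketch the confidence bonus shrinks as an expert is selected more often, so weaker experts are tried less and less frequently, which is the mechanism behind the low-regret behaviour the abstract refers to.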
