An Efficient Dynamic Sampling Policy for Monte Carlo Tree Search

Paper Title

An Efficient Dynamic Sampling Policy for Monte Carlo Tree Search

Authors

Gongbo Zhang, Yijie Peng, Yilong Xu

Abstract

We consider Monte Carlo Tree Search (MCTS), a popular tree-based search strategy within the framework of reinforcement learning, in the context of finite-horizon Markov decision processes. We propose a dynamic sampling tree policy that efficiently allocates a limited computational budget to maximize the probability of correctly selecting the best action at the root node of the tree. Experimental results on Tic-Tac-Toe and Gomoku show that the proposed tree policy is more efficient than other competing methods.
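To make the objective named in the abstract concrete, the following minimal sketch estimates the probability of correct selection (PCS) at a root node by simulation. It is not the authors' dynamic sampling policy: it uses a plain equal-allocation baseline, and the function name `estimate_pcs`, the Gaussian reward noise, and all parameter values are assumptions chosen for illustration only.

```python
import random


def estimate_pcs(true_means, noise_std, budget, trials=1000, seed=0):
    """Estimate PCS under equal allocation (illustrative baseline only).

    true_means: hypothetical true value of each root action.
    noise_std:  standard deviation of the assumed Gaussian sampling noise.
    budget:     total number of samples split evenly across actions.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    best = max(range(n_actions), key=lambda a: true_means[a])
    per_action = budget // n_actions
    if per_action < 1:
        raise ValueError("budget must allow at least one sample per action")

    correct = 0
    for _ in range(trials):
        # Sample each action the same number of times and average.
        sample_means = []
        for mu in true_means:
            total = sum(rng.gauss(mu, noise_std) for _ in range(per_action))
            sample_means.append(total / per_action)
        # Select the action with the highest sample mean.
        chosen = max(range(n_actions), key=lambda a: sample_means[a])
        if chosen == best:
            correct += 1
    return correct / trials


if __name__ == "__main__":
    # Hypothetical root node with three actions; the second is best.
    for budget in (15, 60, 240):
        print(budget, estimate_pcs([0.1, 0.5, 0.3], 1.0, budget))
```

A dynamic sampling policy of the kind the abstract describes would replace the even split with an allocation that adapts to the samples observed so far, aiming for a higher PCS under the same budget.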
