Paper Title
Thompson Sampling on Asymmetric $α$-Stable Bandits
Authors
Abstract
When optimizing reinforcement learning algorithms, handling the exploration-exploitation dilemma is particularly important. The multi-armed bandit problem can be used to optimize proposed solutions by changing the reward distribution to achieve a dynamic balance between exploration and exploitation. Thompson Sampling is a common method for solving the multi-armed bandit problem and has been applied to data that follow various distributions. In this paper, we consider the Thompson Sampling approach for the multi-armed bandit problem in which rewards follow unknown asymmetric $α$-stable distributions, and we explore applications to modelling financial and wireless data.
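For readers unfamiliar with the general method, the following is a minimal sketch of Thompson Sampling in its simplest textbook setting: Bernoulli rewards with Beta posteriors. It is not the asymmetric $α$-stable variant studied in the paper; the function name `thompson_sampling`, its parameters, and the arm probabilities are hypothetical and chosen only for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling (illustrative assumption, not the
    paper's alpha-stable method). Returns (pulls per arm, total reward)."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1, per arm
    failures = [0] * n_arms   # observed rewards of 0, per arm
    pulls = [0] * n_arms
    total_reward = 0

    for _ in range(n_rounds):
        # Draw one sample from each arm's posterior, Beta(s + 1, f + 1),
        # which corresponds to a uniform Beta(1, 1) prior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate a Bernoulli reward from the (hidden) true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Update the chosen arm's posterior counts.
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls, "total reward:", total)
```

Sampling from the posterior, rather than always playing the arm with the best empirical mean, is what produces exploration: arms with few observations have wide posteriors and are occasionally selected. Extending this idea to asymmetric $α$-stable rewards requires a different posterior model, which this sketch does not attempt.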