Title
A Tensor Network Approach to Finite Markov Decision Processes
Authors
Abstract
Tensor network (TN) techniques - often used in the context of quantum many-body physics - have shown promise as a tool for tackling machine learning (ML) problems. The application of TNs to ML, however, has mostly focused on supervised and unsupervised learning. Yet, with their direct connection to hidden Markov chains, TNs are also naturally suited to Markov decision processes (MDPs) which provide the foundation for reinforcement learning (RL). Here we introduce a general TN formulation of finite, episodic and discrete MDPs. We show how this formulation allows us to exploit algorithms developed for TNs for policy optimisation, the key aim of RL. As an application we consider the issue - formulated as an RL problem - of finding a stochastic evolution that satisfies specific dynamical conditions, using the simple example of random walk excursions as an illustration.
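The chain structure the abstract alludes to can be made concrete: under a fixed policy, the expected return of a finite, episodic, discrete MDP is the contraction of a one-dimensional network of policy and transition tensors, much like a transfer-matrix calculation. Below is a minimal sketch of that contraction in Python/NumPy. It illustrates the structural connection only, not the paper's algorithm, and all names and dimensions (n_states, n_actions, horizon, T, R, pi) are illustrative assumptions.

```python
# A minimal sketch (not the paper's algorithm): evaluating a fixed policy
# on a finite, episodic, discrete MDP by contracting a chain of tensors,
# in the spirit of a tensor-network / transfer-matrix calculation.
# All names and dimensions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, horizon = 4, 2, 10

# Transition tensor T[s, a, s'] (normalised over s'), reward R[s, a],
# and a stochastic policy pi[s, a] (normalised over a).
T = rng.random((n_states, n_actions, n_states))
T /= T.sum(axis=2, keepdims=True)
R = rng.random((n_states, n_actions))
pi = rng.random((n_states, n_actions))
pi /= pi.sum(axis=1, keepdims=True)

# Initial state distribution: start in state 0.
d = np.zeros(n_states)
d[0] = 1.0

# Contract the chain left to right: at each step, absorb the policy and
# transition tensors into the running state distribution, accumulating
# the expected reward along the way.
expected_return = 0.0
for _ in range(horizon):
    joint = d[:, None] * pi                  # joint[s, a] = d[s] * pi[s, a]
    expected_return += np.sum(joint * R)     # expected reward this step
    d = np.einsum("sa,sap->p", joint, T)     # next state distribution

print(f"Expected return over {horizon} steps: {expected_return:.4f}")
```

In this picture, policy optimisation in the sense of the abstract would amount to varying pi, one tensor in the network, so as to maximise the contracted value.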