Paper Title
Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling
Paper Authors
Paper Abstract
Provably sample-efficient Reinforcement Learning (RL) with rich observations and function approximation has witnessed tremendous recent progress, particularly when the underlying function approximators are linear. In this linear regime, computationally and statistically efficient methods exist in which the potentially infinite state and action spaces can be captured through a known feature embedding, with sample complexity scaling with the (intrinsic) dimension of these features. When the action space is finite, significantly more sophisticated results allow non-linear function approximation under appropriate structural constraints on the underlying RL problem, permitting, for instance, the learning of good features rather than assuming access to them. In this work, we present the first result for non-linear function approximation that holds for general action spaces under a linear embeddability condition, which generalizes all linear and finite-action settings. We design a novel optimistic posterior sampling strategy, TS^3, for such problems, and show worst-case sample complexity guarantees that scale with a rank parameter of the RL problem, the linear embedding dimension introduced in this work, and standard measures of function class complexity.
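The abstract does not specify the TS^3 algorithm itself, but the posterior sampling principle it builds on can be illustrated in the simplest linear setting. The sketch below is a minimal Thompson sampling loop for a linear bandit with a known feature embedding: maintain a Bayesian posterior over the reward parameter, sample from it each round, and act greedily with respect to the sample. All variable names, the Gaussian reward model, and the problem sizes are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_actions, horizon = 3, 20, 500
theta_star = rng.normal(size=d)            # unknown true reward parameter
actions = rng.normal(size=(n_actions, d))  # known feature embedding per action

# Bayesian linear regression posterior N(mu, Sigma), standard normal prior.
precision = np.eye(d)  # posterior precision matrix
b = np.zeros(d)        # accumulated feature-weighted rewards

for t in range(horizon):
    Sigma = np.linalg.inv(precision)
    mu = Sigma @ b
    # Posterior sampling: draw a plausible parameter, act greedily on it.
    theta_sample = rng.multivariate_normal(mu, Sigma)
    a = actions[np.argmax(actions @ theta_sample)]
    reward = a @ theta_star + 0.1 * rng.normal()  # noisy linear reward
    # Conjugate Bayesian update of the posterior.
    precision += np.outer(a, a)
    b += reward * a

mu_final = np.linalg.inv(precision) @ b  # posterior mean after all rounds
```

Randomness from the posterior drives exploration here; the paper's TS^3 additionally injects optimism and targets non-linear function classes under the linear embeddability condition, which this linear toy setting does not capture.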