Batch Stationary Distribution Estimation

Paper title

Batch Stationary Distribution Estimation

Authors

Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans

Abstract

We consider the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions. Classical simulation-based approaches assume access to the underlying process so that trajectories of sufficient length can be gathered to approximate stationary sampling. Instead, we consider an alternative setting where a fixed set of transitions has been collected beforehand, by a separate, possibly unknown procedure. The goal is still to estimate properties of the stationary distribution, but without additional access to the underlying system. We propose a consistent estimator that is based on recovering a correction ratio function over the given data. In particular, we develop a variational power method (VPM) that provides provably consistent estimates under general conditions. In addition to unifying a number of existing approaches from different subfields, we also find that VPM yields significantly better estimates across a range of problems, including queueing, stochastic differential equations, post-processing MCMC, and off-policy evaluation.
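To make the batch setting concrete, here is a minimal sketch for a small finite state space. It is not the VPM described in the abstract, which recovers a correction ratio function over the data; it is only the plain tabular baseline of building an empirical transition matrix from the fixed set of transitions and running the classical power method on it. The function name `estimate_stationary`, the uniform fallback for unseen states, and the two-state example chain are all assumptions made for illustration.

```python
import numpy as np

def estimate_stationary(transitions, n_states, n_iters=1000, tol=1e-12):
    """Estimate a stationary distribution from a batch of (s, s_next) pairs.

    Illustrative tabular baseline: empirical transition matrix followed by
    power iteration. Assumes the empirical chain is ergodic.
    """
    counts = np.zeros((n_states, n_states))
    for s, s_next in transitions:
        counts[s, s_next] += 1.0

    row_sums = counts.sum(axis=1, keepdims=True)
    # States never observed as a source get a uniform row (arbitrary choice).
    P = np.where(row_sums > 0,
                 counts / np.maximum(row_sums, 1.0),
                 1.0 / n_states)

    mu = np.full(n_states, 1.0 / n_states)
    for _ in range(n_iters):
        mu_next = mu @ P  # one power-method step: mu <- mu P
        converged = np.abs(mu_next - mu).sum() < tol
        mu = mu_next
        if converged:
            break
    return mu / mu.sum()

# Hypothetical two-state chain; its true stationary distribution is (5/6, 1/6).
P_true = np.array([[0.9, 0.1],
                   [0.5, 0.5]])
rng = np.random.default_rng(0)
# Source states are drawn uniformly, not from the stationary distribution,
# mimicking data collected by a separate procedure.
starts = rng.integers(0, 2, size=20000)
nexts = (rng.random(starts.size) < P_true[starts, 1]).astype(int)
mu_hat = estimate_stationary(zip(starts, nexts), n_states=2)
print(mu_hat)
```

Note that the source states here are sampled uniformly, yet the estimate still targets the chain's stationary distribution, because only the conditional next-state frequencies enter the empirical matrix. This explicit-matrix approach does not scale to large or continuous state spaces, which is the regime the abstract's applications (stochastic differential equations, off-policy evaluation) point to.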
