Inference for Batched Bandits

Paper title

Inference for Batched Bandits

Authors

Zhang, Kelly W.; Janson, Lucas; Murphy, Susan A.

Abstract

As bandit algorithms are increasingly utilized in scientific studies and industrial applications, there is an associated increasing need for reliable inference methods based on the resulting adaptively-collected data. In this work, we develop methods for inference on data collected in batches using a bandit algorithm. We first prove that the ordinary least squares estimator (OLS), which is asymptotically normal on independently sampled data, is not asymptotically normal on data collected using standard bandit algorithms when there is no unique optimal arm. This asymptotic non-normality result implies that the naive assumption that the OLS estimator is approximately normal can lead to Type-1 error inflation and confidence intervals with below-nominal coverage probabilities. Second, we introduce the Batched OLS estimator (BOLS) that we prove is (1) asymptotically normal on data collected from both multi-arm and contextual bandits and (2) robust to non-stationarity in the baseline reward.
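The batch-wise idea in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper; it rests on several assumptions: a two-arm bandit with Gaussian rewards and known noise level `sigma`, a simple clipped greedy rule (the arm that looks better so far is played with probability `1 - clip`), and a statistic formed by standardizing the difference in arm means within each batch and then averaging over batches. The function name and all of its parameters are hypothetical.

```python
import math
import random
import statistics


def simulate_bols_statistic(rng, num_batches=5, batch_size=100,
                            clip=0.2, sigma=1.0, margin=0.0):
    """Run one batched two-arm bandit and return a batch-wise z-statistic.

    Assumptions (illustrative, not the paper's exact setup): arm 0 has mean 0,
    arm 1 has mean `margin`, rewards are Gaussian with known `sigma`, and the
    action probability is fixed within a batch and updated between batches.
    """
    total = [0.0, 0.0]   # cumulative reward sum per arm, used by the policy
    count = [0, 0]       # cumulative pull count per arm
    prob_arm1 = 0.5      # first batch samples both arms uniformly
    z_sum = 0.0

    for _ in range(num_batches):
        batch_sum = [0.0, 0.0]
        batch_n = [0, 0]
        for _ in range(batch_size):
            arm = 1 if rng.random() < prob_arm1 else 0
            reward = rng.gauss(margin if arm == 1 else 0.0, sigma)
            batch_sum[arm] += reward
            batch_n[arm] += 1

        # Per-batch estimate of the margin, standardized using only this
        # batch's data; skipped in the rare case an arm was never pulled.
        if batch_n[0] > 0 and batch_n[1] > 0:
            diff = batch_sum[1] / batch_n[1] - batch_sum[0] / batch_n[0]
            weight = math.sqrt(batch_n[0] * batch_n[1] / batch_size)
            z_sum += weight * (diff - margin) / sigma

        # Policy update between batches: clipped greedy on cumulative means.
        for arm in (0, 1):
            total[arm] += batch_sum[arm]
            count[arm] += batch_n[arm]
        mean0 = total[0] / count[0] if count[0] else 0.0
        mean1 = total[1] / count[1] if count[1] else 0.0
        prob_arm1 = 1.0 - clip if mean1 > mean0 else clip

    return z_sum / math.sqrt(num_batches)


if __name__ == "__main__":
    rng = random.Random(0)
    # margin=0.0 is the "no unique optimal arm" case discussed in the abstract.
    draws = [simulate_bols_statistic(rng) for _ in range(2000)]
    print("mean:", statistics.mean(draws))
    print("variance:", statistics.pvariance(draws))
```

Under these assumptions each batch's standardized difference has unit variance given the arm counts in that batch, so the averaged statistic should look approximately standard normal even though the policy adapts between batches. Replacing the per-batch standardization with a single pooled difference in means over all batches gives the plain OLS-style estimate that the abstract warns about.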
