Paper title
Self-Tuning Bandits over Unknown Covariate-Shifts
Paper authors
Paper abstract
Bandits with covariates, a.k.a. contextual bandits, address situations where the optimal action (or arm) at a given time $t$ depends on a context $x_t$, e.g., a new patient's medical history or a consumer's past purchases. While it is understood that the distribution of contexts might change over time, e.g., due to seasonality or deployment to new environments, the bulk of studies concern the most adversarial such changes, resulting in regret bounds that are often worst-case in nature. Covariate shift, on the other hand, has been considered in classification as a middle-ground formalism that can capture mild to relatively severe changes in distributions. We consider nonparametric bandits under such middle-ground scenarios, and derive new regret bounds that tightly capture a continuum of changes in context distribution. Furthermore, we show that these rates can be attained adaptively, without knowledge of either the time of shift or the amount of shift.