Paper Title

A Markov Decision Process Approach to Active Meta Learning

Authors

Bingjia Wang, Alec Koppel, Vikram Krishnamurthy

Abstract

In supervised learning, we fit a single statistical model to a given data set, assuming that the data is associated with a singular task, which yields well-tuned models for specific use but does not adapt well to new contexts. By contrast, in meta-learning, the data is associated with numerous tasks, and we seek a model that may perform well on all tasks simultaneously, in pursuit of greater generalization. One challenge in meta-learning is how to exploit relationships between tasks and classes, which is overlooked by commonly used random or cyclic passes through data. In this work, we propose actively selecting samples on which to train by discerning covariates inside and between meta-training sets. Specifically, we cast the problem of selecting a sample from a number of meta-training sets as either a multi-armed bandit or a Markov Decision Process (MDP), depending on how one encapsulates correlation across tasks. We develop scheduling schemes based on the Upper Confidence Bound (UCB), the Gittins index, and tabular Markov Decision Problems (MDPs) solved with linear programming, where the reward is the scaled statistical accuracy, to ensure it is a time-invariant function of state and action. Across a variety of experimental contexts, we observe significant reductions in the sample complexity of the active selection schemes relative to cyclic or i.i.d. sampling, demonstrating the merit of exploiting covariates in practice.
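The scheduling idea is easiest to see in the bandit case: each meta-training set is an arm, and the reward for pulling an arm is the accuracy gain obtained by training on a sample drawn from that set. Below is a minimal, illustrative Python sketch of such a UCB-style scheduler. The names `train_step` and `eval_accuracy` are hypothetical callables standing in for the learner's update and its validation metric, and the accuracy-gain reward is an assumption rather than the paper's exact scaled-accuracy construction; the Gittins-index and MDP/linear-programming variants described in the abstract are omitted.

```python
import math


def ucb_select(counts, mean_rewards, t, c=2.0):
    """Pick the meta-training set (arm) with the largest UCB score.

    counts[k]       -- times task k has been sampled so far
    mean_rewards[k] -- running mean reward observed for task k
    t               -- current round, used in the exploration bonus
    """
    scores = []
    for k in range(len(counts)):
        if counts[k] == 0:
            return k  # sample every task at least once
        bonus = math.sqrt(c * math.log(t) / counts[k])
        scores.append(mean_rewards[k] + bonus)
    return max(range(len(scores)), key=lambda k: scores[k])


def active_meta_training(tasks, train_step, eval_accuracy, rounds=1000):
    """Toy active-scheduling loop: each round, a UCB rule picks which
    meta-training set to draw the next sample from; the reward is the
    change in validation accuracy produced by that update (illustrative)."""
    K = len(tasks)
    counts = [0] * K
    mean_rewards = [0.0] * K
    prev_acc = eval_accuracy()
    for t in range(1, rounds + 1):
        k = ucb_select(counts, mean_rewards, t)
        train_step(tasks[k])        # one update on a sample from task k
        acc = eval_accuracy()
        reward = acc - prev_acc     # stand-in for the scaled statistical accuracy
        prev_acc = acc
        counts[k] += 1
        mean_rewards[k] += (reward - mean_rewards[k]) / counts[k]
    return counts, mean_rewards
```

In this sketch, tasks whose samples keep improving validation accuracy are visited more often, while the exploration bonus prevents any meta-training set from being ignored entirely; this is the bandit counterpart of the correlation-aware MDP formulation in the paper.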
