Paper Title
Data Lifecycle Management in Evolving Input Distributions for Learning-based Aerospace Applications
Paper Authors
Paper Abstract
As input distributions evolve over a mission lifetime, maintaining performance of learning-based models becomes challenging. This paper presents a framework to incrementally retrain a model by selecting a subset of test inputs to label, which allows the model to adapt to changing input distributions. Algorithms within this framework are evaluated based on (1) model performance throughout mission lifetime and (2) cumulative costs associated with labeling and model retraining. We provide an open-source benchmark of a satellite pose estimation model trained on images of a satellite in space and deployed in novel scenarios (e.g., different backgrounds or misbehaving pixels), where algorithms are evaluated on their ability to maintain high performance by retraining on a subset of inputs. We also propose a novel algorithm to select a diverse subset of inputs for labeling, by characterizing the information gain from an input using Bayesian uncertainty quantification and choosing a subset that maximizes collective information gain using concepts from batch active learning. We show that our algorithm outperforms others on the benchmark, e.g., achieves comparable performance to an algorithm that labels 100% of inputs, while only labeling 50% of inputs, resulting in low costs and high performance over the mission lifetime.
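The subset-selection idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a Bayesian linear model over fixed feature embeddings with Gaussian prior and noise, and the names `greedy_batch_select`, `noise_var`, and `prior_var` are hypothetical. Under those assumptions, the information gain from labeling one input is 0.5 * log(1 + predictive variance / noise variance), and picking inputs greedily while updating the posterior covariance accumulates the joint information gain of the batch, which discourages choosing near-duplicate inputs.

```python
import math


def greedy_batch_select(features, budget, noise_var=1.0, prior_var=1.0):
    """Greedily pick up to `budget` inputs to label.

    Assumes (for illustration only) a Bayesian linear model with an
    isotropic Gaussian prior on the weights and Gaussian label noise.
    Returns the selected indices and their joint information gain in nats.
    """
    dim = len(features[0])
    # Posterior covariance over weights, initialized to the prior.
    cov = [[prior_var if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    remaining = set(range(len(features)))
    selected, total_gain = [], 0.0

    for _ in range(min(budget, len(features))):
        best = None  # (gain, index, cov @ x, predictive variance)
        for idx in sorted(remaining):
            x = features[idx]
            cx = [sum(cov[i][j] * x[j] for j in range(dim)) for i in range(dim)]
            pred_var = sum(x[i] * cx[i] for i in range(dim))
            # Marginal information gain given the inputs already selected.
            gain = 0.5 * math.log(1.0 + pred_var / noise_var)
            if best is None or gain > best[0]:
                best = (gain, idx, cx, pred_var)

        gain, idx, cx, pred_var = best
        # Rank-one posterior update: cov <- cov - (cov x)(cov x)^T / (noise + x^T cov x).
        denom = noise_var + pred_var
        cov = [[cov[i][j] - cx[i] * cx[j] / denom for j in range(dim)]
               for i in range(dim)]
        selected.append(idx)
        total_gain += gain
        remaining.remove(idx)

    return selected, total_gain


# Inputs 0 and 1 are near-duplicates; input 2 points in a different direction.
toy_features = [[1.0, 0.0], [0.99, 0.0], [0.0, 0.9]]
chosen, info_gain = greedy_batch_select(toy_features, budget=2)
print(chosen, round(info_gain, 4))
```

In this toy case the second pick skips the near-duplicate input 1 in favor of input 2, because labeling input 0 has already reduced the uncertainty along their shared direction. A real system would replace the hand-written embeddings with features and uncertainty estimates from the deployed model.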