Paper Title
Uncertainty-Based Out-of-Distribution Classification in Deep Reinforcement Learning
Paper Authors
Paper Abstract
Robustness to out-of-distribution (OOD) data is an important goal in building reliable machine learning systems. Especially in autonomous systems, wrong predictions for OOD inputs can cause safety-critical situations. As a first step towards a solution, we consider the problem of detecting such data in a value-based deep reinforcement learning (RL) setting. Modelling this problem as a one-class classification problem, we propose a framework for uncertainty-based OOD classification: UBOOD. It is based on the effect that an agent's epistemic uncertainty is reduced for situations encountered during training (in-distribution), and thus lower than for unencountered (OOD) situations. Being agnostic towards the approach used for estimating epistemic uncertainty, combinations with different uncertainty estimation methods, e.g. approximate Bayesian inference methods or ensembling techniques, are possible. We further present a first viable solution for calculating a dynamic classification threshold, based on the uncertainty distribution of the training data. Evaluation shows that the framework produces reliable classification results when combined with ensemble-based estimators, while the combination with concrete dropout-based estimators fails to reliably detect OOD situations. In summary, UBOOD presents a viable approach for OOD classification in deep RL settings by leveraging the epistemic uncertainty of the agent's value function.
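As a rough illustration of the mechanism the abstract describes, the following minimal sketch (not taken from the paper; the class name `UBOODClassifier`, the use of the ensemble's standard deviation on the greedy action, and the 95% quantile threshold are all illustrative assumptions) estimates epistemic uncertainty as the disagreement of an ensemble of Q-functions and derives a dynamic one-class threshold from the uncertainty distribution on training states:

```python
import numpy as np

class UBOODClassifier:
    """Sketch of uncertainty-based OOD classification in the spirit of UBOOD.

    Epistemic uncertainty is approximated as the standard deviation of the
    ensemble members' Q-value estimates for the greedy action; the decision
    threshold is a quantile of the uncertainties observed on training
    (in-distribution) states. These concrete choices are assumptions made
    for illustration, not the paper's exact formulation.
    """

    def __init__(self, ensemble, quantile=0.95):
        self.ensemble = ensemble      # list of callables: state -> Q-value vector
        self.quantile = quantile      # assumed quantile for the dynamic threshold
        self.threshold = None

    def epistemic_uncertainty(self, state):
        # Disagreement of the ensemble on the greedy action's Q-value.
        q_values = np.stack([member(state) for member in self.ensemble])
        greedy_action = q_values.mean(axis=0).argmax()
        return q_values[:, greedy_action].std()

    def fit_threshold(self, training_states):
        # Dynamic threshold from the uncertainty distribution on training data.
        uncertainties = [self.epistemic_uncertainty(s) for s in training_states]
        self.threshold = np.quantile(uncertainties, self.quantile)

    def is_ood(self, state):
        # One-class decision: uncertainty above the training-data threshold => OOD.
        return self.epistemic_uncertainty(state) > self.threshold
```

A toy usage example with an ensemble of random linear Q-functions (again purely illustrative; in the paper's setting the ensemble members would be trained value networks whose uncertainty shrinks on visited states):

```python
rng = np.random.default_rng(0)
# Five "Q-functions" mapping 4-d states to Q-values for 2 actions.
ensemble = [lambda s, W=rng.normal(size=(4, 2)): s @ W for _ in range(5)]

clf = UBOODClassifier(ensemble)
clf.fit_threshold([rng.normal(size=4) for _ in range(100)])
print(clf.is_ood(rng.normal(size=4) * 10.0))  # state far outside the training distribution
```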