Paper Title

Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks

Authors

Meet P. Vadera, Brian Jalaian, Benjamin M. Marlin

Abstract

In this paper, we present a general framework for distilling expectations with respect to the Bayesian posterior distribution of a deep neural network classifier, extending prior work on the Bayesian Dark Knowledge framework. The proposed framework takes as input "teacher" and student model architectures and a general posterior expectation of interest. The distillation method performs an online compression of the selected posterior expectation using iteratively generated Monte Carlo samples. We focus on the posterior predictive distribution and expected entropy as distillation targets. We investigate several aspects of this framework including the impact of uncertainty and the choice of student model architecture. We study methods for student model architecture search from a speed-storage-accuracy perspective and evaluate down-stream tasks leveraging entropy distillation including uncertainty ranking and out-of-distribution detection.
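
To make the distillation procedure concrete, below is a minimal, hypothetical PyTorch sketch of the online loop the abstract describes: a "teacher" network is sampled from the posterior with SGLD, and after each sample a compact student is updated to regress the chosen posterior expectation (the posterior predictive distribution, or the per-sample predictive entropy, whose running average is the expected entropy). Everything here is illustrative, not from the paper: the function names `sgld_step` and `distill_step`, the hyperparameters, and the data loaders are assumptions, and dataset-size rescaling of the minibatch likelihood is omitted for brevity.

```python
# Hypothetical sketch of online posterior-expectation distillation
# (Bayesian Dark Knowledge style); names and details are illustrative.
import torch
import torch.nn.functional as F

def sgld_step(teacher, batch, lr=1e-4, weight_decay=1e-4):
    """One SGLD update on the teacher: a gradient step on the (minibatch)
    log-posterior plus Gaussian noise, yielding one posterior sample."""
    x, y = batch
    teacher.zero_grad()
    loss = F.cross_entropy(teacher(x), y)
    # Gaussian prior on the weights acts as weight decay.
    loss = loss + weight_decay * sum((p ** 2).sum() for p in teacher.parameters())
    loss.backward()
    with torch.no_grad():
        for p in teacher.parameters():
            noise = torch.randn_like(p) * (2.0 * lr) ** 0.5
            p.add_(-lr * p.grad + noise)

def distill_step(student, teacher, x, opt, target="predictive"):
    """One online distillation update: regress the student onto the newest
    Monte Carlo sample of the selected posterior expectation."""
    with torch.no_grad():
        probs = F.softmax(teacher(x), dim=-1)  # this sample's predictive dist.
        if target == "predictive":
            y_t = probs  # samples of p(y|x, theta); average -> posterior predictive
        else:  # "entropy": per-sample entropy; average -> expected entropy
            y_t = -(probs * probs.clamp_min(1e-12).log()).sum(-1, keepdim=True)
    opt.zero_grad()
    out = student(x)  # for "entropy", the student head emits one value per example
    if target == "predictive":
        loss = F.kl_div(F.log_softmax(out, dim=-1), y_t, reduction="batchmean")
    else:
        loss = F.mse_loss(out, y_t)
    loss.backward()
    opt.step()
    return loss.item()

# Interleave sampling and distillation: one posterior sample, then one
# student update on (possibly unlabeled/augmented) distillation inputs.
# for batch, x_d in zip(teacher_loader, distill_loader):
#     sgld_step(teacher, batch)
#     distill_step(student, teacher, x_d, student_opt, target="predictive")
```

Because each student update sees only the newest posterior sample, the averaging over samples happens implicitly through the student's SGD, which is what allows the expectation to be compressed online without ever storing the Monte Carlo chain.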
