Paper Title
Self-Knowledge Distillation with Progressive Refinement of Targets
Paper Authors
Paper Abstract
The generalization capability of deep neural networks has been substantially improved by applying a wide spectrum of regularization methods, e.g., restricting function space, injecting randomness during training, augmenting data, etc. In this work, we propose a simple yet effective regularization method named progressive self-knowledge distillation (PS-KD), which progressively distills a model's own knowledge to soften hard targets (i.e., one-hot vectors) during training. Hence, it can be interpreted within a framework of knowledge distillation in which the student becomes its own teacher. Specifically, targets are adjusted adaptively by combining the ground truth with the model's own past predictions. We show that PS-KD provides an effect of hard example mining by rescaling gradients according to how difficult each example is to classify. The proposed method is applicable to any supervised learning task with hard targets and can easily be combined with existing regularization methods to further enhance generalization performance. Furthermore, PS-KD not only achieves better accuracy but also provides high-quality confidence estimates in terms of both calibration and ordinal ranking. Extensive experimental results on three different tasks, image classification, object detection, and machine translation, demonstrate that our method consistently improves the performance of state-of-the-art baselines. The code is available at https://github.com/lgcnsai/PS-KD-Pytorch.
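To make the target-softening idea concrete, below is a minimal PyTorch sketch of how such progressively softened targets could be formed: the soft target is a convex combination of the one-hot label and the model's prediction for the same example from the previous epoch, with the mixing weight growing as training progresses. The function names (`ps_kd_soft_targets`, `ps_kd_loss`), the final weight `alpha_T=0.8`, and the linear schedule are illustrative assumptions, not the authors' reference implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F


def ps_kd_soft_targets(one_hot, prev_probs, epoch, total_epochs, alpha_T=0.8):
    """Blend hard labels with last-epoch predictions.

    alpha_T and the linear schedule are assumptions for illustration;
    the mixing weight alpha_t grows from 0 toward alpha_T over training.
    """
    alpha_t = alpha_T * (epoch / total_epochs)
    return (1.0 - alpha_t) * one_hot + alpha_t * prev_probs


def ps_kd_loss(logits, targets, prev_probs, epoch, total_epochs, num_classes):
    """Cross-entropy against the progressively softened targets."""
    one_hot = F.one_hot(targets, num_classes).float()
    soft = ps_kd_soft_targets(one_hot, prev_probs, epoch, total_epochs)
    log_probs = F.log_softmax(logits, dim=1)
    return -(soft * log_probs).sum(dim=1).mean()


if __name__ == "__main__":
    # Usage with dummy tensors (shapes only, no real data).
    batch, num_classes = 4, 10
    logits = torch.randn(batch, num_classes, requires_grad=True)
    targets = torch.randint(0, num_classes, (batch,))
    # Predictions the model produced for the same examples in the previous epoch.
    prev_probs = torch.softmax(torch.randn(batch, num_classes), dim=1)
    loss = ps_kd_loss(logits, targets, prev_probs,
                      epoch=30, total_epochs=300, num_classes=num_classes)
    loss.backward()
    print(loss.item())
```

In practice this requires caching the model's per-example predictions from the previous epoch (or running a stored copy of the previous model), so that the "teacher" signal is always one step behind the current student.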