Paper Title

Distillation Guided Residual Learning for Binary Convolutional Neural Networks

Paper Authors

Ye, Jianming; Zhang, Shiliang; Wang, Jingdong

Paper Abstract

It is challenging to bridge the performance gap between a Binary CNN (BCNN) and a Floating-point CNN (FCNN). We observe that this performance gap leads to substantial residuals between the intermediate feature maps of the BCNN and the FCNN. To minimize the performance gap, we enforce the BCNN to produce intermediate feature maps similar to those of the FCNN. This training strategy, i.e., optimizing each binary convolutional block with a block-wise distillation loss derived from the FCNN, leads to a more effective optimization of the BCNN. It also motivates us to update the binary convolutional block architecture to facilitate the optimization of the block-wise distillation loss. Specifically, a lightweight shortcut branch is inserted into each binary convolutional block to complement the residuals at that block. Benefiting from its Squeeze-and-Interaction (SI) structure, this shortcut branch introduces only a small fraction of additional parameters, e.g., a 10% overhead, yet effectively complements the residuals. Extensive experiments on ImageNet demonstrate the superior performance of our method in both classification efficiency and accuracy, e.g., the BCNN trained with our method achieves an accuracy of 60.45% on ImageNet.
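
To make the block-wise distillation idea concrete, below is a minimal PyTorch sketch, not the authors' implementation: the lists `bcnn_blocks` and `fcnn_blocks`, the shape-matched per-block outputs, and the use of an MSE penalty on each block's residual are all assumptions made for illustration.

```python
# A minimal sketch (assumed, not the paper's code) of a block-wise distillation loss:
# each binary block is pushed to reproduce the feature map of the matching
# full-precision (frozen teacher) block.
import torch
import torch.nn.functional as F

def block_wise_distillation_loss(x, bcnn_blocks, fcnn_blocks):
    """Sum of per-block MSE between BCNN and frozen FCNN feature maps."""
    loss = 0.0
    x_b, x_f = x, x
    for b_blk, f_blk in zip(bcnn_blocks, fcnn_blocks):
        x_b = b_blk(x_b)                    # binary convolutional block (student)
        with torch.no_grad():
            x_f = f_blk(x_f)                # full-precision block (teacher, no grads)
        loss = loss + F.mse_loss(x_b, x_f)  # penalize the block-wise residual
    return loss
```

In the paper this block-wise loss is used to guide the optimization of each binary block; the weighting against the classification loss and the Squeeze-and-Interaction shortcut branch that complements the residuals are omitted from this sketch.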
