Paper Title
Grow-Push-Prune: aligning deep discriminants for effective structural network compression
Paper Authors
Paper Abstract
Most of today's popular deep architectures are hand-engineered to be generalists. However, this design procedure usually leads to massive redundant, useless, or even harmful features for specific tasks. Unnecessarily high complexities render deep nets impractical for many real-world applications, especially those without powerful GPU support. In this paper, we attempt to derive task-dependent compact models from a deep discriminant analysis perspective. We propose an iterative and proactive approach for classification tasks which alternates between (1) a pushing step, with an objective to simultaneously maximize class separation, penalize co-variances, and push deep discriminants into alignment with a compact set of neurons, and (2) a pruning step, which discards less useful or even interfering neurons. Deconvolution is adopted to reverse 'unimportant' filters' effects and recover useful contributing sources. A simple network growing strategy based on the basic Inception module is proposed for challenging tasks requiring larger capacity than what the base net can offer. Experiments on the MNIST, CIFAR10, and ImageNet datasets demonstrate our approach's efficacy. On ImageNet, by pushing and pruning our grown Inception-88 model, we achieve more accurate models than Inception nets generated during growing, residual nets, and popular compact nets at similar sizes. We also show that our grown Inception nets (without hard-coded dimension alignment) clearly outperform residual nets of similar complexities.
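To make the push/prune alternation concrete, here is a minimal PyTorch sketch of one possible realization. It is an illustration under assumptions, not the paper's implementation: the toy two-layer network, the LDA-style `push_regularizer` (small within-class scatter, large between-class separation on hidden features), the magnitude-based neuron importance in `prune_step`, and the zero-masking of "pruned" neurons are all simplifications introduced here; the paper's actual pushing objective, pruning criterion, and deconvolution-based recovery of contributing sources are not reproduced.

```python
# Hypothetical push-then-prune loop (assumptions only; not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


def push_regularizer(features, labels, num_classes):
    """LDA-style penalty: small within-class scatter, large class separation."""
    means, within = [], features.new_zeros(())
    for c in range(num_classes):
        fc = features[labels == c]
        if fc.numel() == 0:
            continue
        mu = fc.mean(dim=0)
        means.append(mu)
        within = within + ((fc - mu) ** 2).sum(dim=1).mean()
    means = torch.stack(means)
    between = ((means - means.mean(dim=0, keepdim=True)) ** 2).sum(dim=1).mean()
    return within / (between + 1e-6)


def prune_step(linear, keep_ratio=0.9):
    """Zero out the least useful neurons (smallest L2 row norm).

    A real implementation would remove the neurons (or keep a persistent mask);
    zeroing once is a simplification for this sketch.
    """
    with torch.no_grad():
        importance = linear.weight.norm(dim=1)
        k = int(keep_ratio * importance.numel())
        threshold = importance.topk(k).values.min()
        mask = (importance >= threshold).float().unsqueeze(1)
        linear.weight.mul_(mask)
        linear.bias.mul_(mask.squeeze(1))


# Toy model and data, purely for demonstration.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(256, 32)
y = torch.randint(0, 10, (256,))

for it in range(5):                 # outer push/prune iterations
    for _ in range(20):             # (1) pushing step: cross-entropy + discriminant penalty
        hidden = model[1](model[0](x))
        logits = model[2](hidden)
        loss = F.cross_entropy(logits, y) + 0.1 * push_regularizer(hidden, y, 10)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    prune_step(model[0])            # (2) pruning step: discard low-importance neurons
    print(f"iter {it}: loss {loss.item():.3f}")
```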