Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks

Paper Title

Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks

Authors

Jiacong Hu, Jing Gao, Jingwen Ye, Yang Gao, Xingen Wang, Zunlei Feng, Mingli Song

Abstract

With the rapid development of deep learning, the increasing complexity and scale of parameters make training a new model increasingly resource-intensive. In this paper, we start from the classic convolutional neural network (CNN) and explore a paradigm that does not require training to obtain new models. Similar to the birth of CNN inspired by receptive fields in the biological visual system, we draw inspiration from the information subsystem pathways in the biological visual system and propose Model Disassembling and Assembling (MDA). During model disassembling, we introduce the concept of relative contribution and propose a component locating technique to extract task-aware components from trained CNN classifiers. For model assembling, we present the alignment padding strategy and parameter scaling strategy to construct a new model tailored for a specific task, utilizing the disassembled task-aware components. The entire process is akin to playing with LEGO bricks, enabling arbitrary assembly of new models, and providing a novel perspective for model creation and reuse. Extensive experiments showcase that task-aware components disassembled from CNN classifiers or new models assembled using these components closely match or even surpass the performance of the baseline, demonstrating its promising results for model reuse. Furthermore, MDA exhibits diverse potential applications, with comprehensive experiments exploring model decision route analysis, model compression, knowledge distillation, and more. The code is available at https://github.com/jiaconghu/Model-LEGO.
