DynaMixer: A Vision MLP Architecture with Dynamic Mixing

Paper Title

DynaMixer: A Vision MLP Architecture with Dynamic Mixing

Authors

Ziyu Wang, Wenhao Jiang, Yiming Zhu, Li Yuan, Yibing Song, Wei Liu

Abstract

Recently, MLP-like vision models have achieved promising performance on mainstream visual recognition tasks. In contrast with vision transformers and CNNs, the success of MLP-like models shows that simple information fusion operations among tokens and channels can yield good representation power for deep recognition models. However, existing MLP-like models fuse tokens through static fusion operations, which cannot adapt to the contents of the tokens being mixed. Thus, customary information fusion procedures are not effective enough. To this end, this paper presents an efficient MLP-like network architecture, dubbed DynaMixer, that relies on dynamic information fusion. Critically, we propose a procedure, on which the DynaMixer model relies, to dynamically generate mixing matrices by leveraging the contents of all the tokens to be mixed. To reduce the time complexity and improve the robustness, a dimensionality reduction technique and a multi-segment fusion mechanism are adopted. Our proposed DynaMixer model (97M parameters) achieves 84.3% top-1 accuracy on the ImageNet-1K dataset without extra training data, performing favorably against state-of-the-art vision MLP models. When the number of parameters is reduced to 26M, it still achieves 82.7% top-1 accuracy, surpassing existing MLP-like models with a similar capacity. The code is available at https://github.com/ziyuwwang/DynaMixer.
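
The abstract names three ingredients: a mixing matrix generated from the tokens themselves, a dimensionality reduction step, and multi-segment fusion. The following NumPy sketch illustrates one plausible reading of how they fit together; it is not the authors' implementation. The function names, the weight shapes, the row-wise softmax normalization, and the choice to give each channel segment its own weights are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_mix(x, w_reduce, w_gen):
    """Mix N tokens with a matrix generated from the tokens themselves.

    x:        (N, D) token features
    w_reduce: (D, d) projection with d << D (assumed dimensionality reduction)
    w_gen:    (N * d, N * N) weights that produce the mixing logits (assumed)
    Returns the mixed tokens (N, D) and the mixing matrix (N, N).
    """
    n = x.shape[0]
    reduced = x @ w_reduce                  # (N, d): cheaper input for generation
    logits = reduced.reshape(-1) @ w_gen    # (N * N,): depends on all tokens
    mix = softmax(logits.reshape(n, n))     # each row sums to 1
    return mix @ x, mix


def multi_segment_mix(x, reduce_weights, gen_weights):
    """Split channels into equal segments and mix each one separately.

    Each segment gets its own generated mixing matrix; the results are
    concatenated back along the channel axis (assumed fusion scheme).
    """
    segments = np.split(x, len(reduce_weights), axis=1)
    mixed = [
        dynamic_mix(seg, w_r, w_g)[0]
        for seg, w_r, w_g in zip(segments, reduce_weights, gen_weights)
    ]
    return np.concatenate(mixed, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tokens, dim, d, n_seg = 4, 6, 1, 3    # toy sizes, chosen arbitrarily
    seg_dim = dim // n_seg
    x = rng.normal(size=(n_tokens, dim))
    w_rs = [rng.normal(size=(seg_dim, d)) for _ in range(n_seg)]
    w_gs = [rng.normal(size=(n_tokens * d, n_tokens * n_tokens)) for _ in range(n_seg)]
    print(multi_segment_mix(x, w_rs, w_gs).shape)
```

In this sketch the mixing matrix changes whenever the input tokens change, which is the contrast with static fusion that the abstract draws. Reducing D to d before generating the logits keeps the generator weights at size N·d × N² rather than N·D × N².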
