Paper Title
DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator
Paper Authors
Paper Abstract
Existing FPGA-based DNN accelerators typically fall into two design paradigms. They either adopt a generic reusable architecture to support different DNN networks, leaving some performance and efficiency on the table because design specificity is sacrificed, or they apply a layer-wise tailor-made architecture that optimizes for layer-specific computation and resource demands but loses the scalability needed to adapt to a wide range of DNN networks. To overcome these drawbacks, this paper proposes a novel FPGA-based DNN accelerator design paradigm and its automation tool, called DNNExplorer, to enable fast exploration of various accelerator designs under the proposed paradigm and deliver optimized accelerator architectures for existing and emerging DNN networks. Three key techniques are essential to DNNExplorer's improved performance, better specificity, and scalability: (1) a unique accelerator design paradigm with both high-dimensional design space support and fine-grained adjustability, (2) a dynamic design space that accommodates different combinations of DNN workloads and targeted FPGAs, and (3) a design space exploration (DSE) engine that generates optimized accelerator architectures following the proposed paradigm by simultaneously considering the FPGA's computation and memory resources along with the DNN's layer-wise characteristics and overall complexity. Experimental results show that, on the same FPGAs, accelerators generated by DNNExplorer deliver up to 4.2x higher performance (GOP/s) than the state-of-the-art layer-wise pipelined solutions generated by DNNBuilder for a VGG-like DNN with 38 CONV layers. Compared to accelerators with generic reusable computation units, DNNExplorer achieves up to 2.0x and 4.4x higher DSP efficiency than a recently published academic accelerator design (HybridDNN) and a commercial DNN accelerator IP (Xilinx DPU), respectively.
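To make the abstract's metrics concrete, the sketch below illustrates the kind of simplified analytical model a DSE engine could use to rank candidate accelerator configurations by estimated throughput (GOP/s) and DSP efficiency, the two figures of merit cited above. This is a minimal illustration under stated assumptions; all names, data structures, and formulas (e.g., `ops_per_dsp_per_cycle`, the per-stage cycle model) are hypothetical and are not the paper's actual models or APIs.

```python
# Illustrative sketch (not DNNExplorer's actual model): estimating GOP/s and
# DSP efficiency for a layer-wise pipelined accelerator candidate. All
# constants and formulas are simplifying assumptions for illustration only.

from dataclasses import dataclass

@dataclass
class LayerProfile:
    name: str
    gops: float          # compute workload of the layer, in giga-operations

@dataclass
class CandidateDesign:
    dsps_per_layer: list[int]   # DSPs allocated to each pipeline stage
    freq_mhz: float             # assumed target clock frequency

def estimate_throughput(layers: list[LayerProfile],
                        design: CandidateDesign,
                        ops_per_dsp_per_cycle: float = 2.0) -> float:
    """Estimate end-to-end throughput (GOP/s) of a layer-wise pipeline.

    A pipeline runs at the rate of its slowest stage, so throughput is
    bounded by the stage with the worst workload-to-resource ratio.
    """
    # Cycles each stage needs for one input frame (simplified compute-bound model).
    cycles = [
        layer.gops * 1e9 / (dsps * ops_per_dsp_per_cycle)
        for layer, dsps in zip(layers, design.dsps_per_layer)
    ]
    bottleneck_cycles = max(cycles)
    frames_per_sec = design.freq_mhz * 1e6 / bottleneck_cycles
    total_gops_per_frame = sum(l.gops for l in layers)
    return frames_per_sec * total_gops_per_frame  # GOP/s

def dsp_efficiency(gops_per_sec: float, design: CandidateDesign,
                   ops_per_dsp_per_cycle: float = 2.0) -> float:
    """Achieved GOP/s divided by the theoretical peak of the allocated DSPs."""
    peak_gops = (sum(design.dsps_per_layer) * design.freq_mhz * 1e6
                 * ops_per_dsp_per_cycle / 1e9)
    return gops_per_sec / peak_gops

if __name__ == "__main__":
    # Toy 3-layer workload and a hand-balanced DSP allocation.
    layers = [LayerProfile("conv1", 0.2),
              LayerProfile("conv2", 0.8),
              LayerProfile("conv3", 0.4)]
    design = CandidateDesign(dsps_per_layer=[128, 512, 256], freq_mhz=200)
    thr = estimate_throughput(layers, design)
    print(f"estimated throughput: {thr:.1f} GOP/s, "
          f"DSP efficiency: {dsp_efficiency(thr, design):.2f}")
```

A DSE engine would evaluate many such candidates (varying per-stage resource allocation, and in DNNExplorer's case the broader architectural parameters described in the paper) and keep the configuration that maximizes throughput within the FPGA's DSP and memory budgets; the model above only shows how two of the reported metrics relate to a candidate's resource allocation.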