Paper title
Lupulus: A Flexible Hardware Accelerator for Neural Networks
Authors
Abstract
Neural networks have become indispensable for a wide range of applications, but their high computational and memory requirements call for optimization at every level, from the algorithmic description of the network down to the hardware implementation. Moreover, because machine learning evolves rapidly, hardware implementations must be highly programmable to support both current and future requirements of neural networks. In this work, we present Lupulus, a flexible hardware accelerator for neural networks that supports various methods for scheduling and mapping operations onto the accelerator. Lupulus was implemented in a 28 nm FD-SOI technology and demonstrates a peak performance of 380 GOPS/GHz, with latencies of 21.4 ms and 183.6 ms for the convolutional layers of AlexNet and VGG-16, respectively.
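As a reading aid for the figures above, the following minimal sketch shows how a frequency-normalized peak (GOPS/GHz) and a measured latency could be turned into comparable throughput numbers. It is not taken from the paper: the function names are hypothetical, and the clock frequency and operation count in the usage lines are placeholder assumptions, since the abstract states neither.

```python
def peak_gops(gops_per_ghz: float, clock_ghz: float) -> float:
    """Peak throughput in GOPS at a given clock frequency in GHz."""
    return gops_per_ghz * clock_ghz


def effective_gops(total_ops: float, latency_ms: float) -> float:
    """Average throughput in GOPS implied by an operation count and a latency."""
    latency_s = latency_ms * 1e-3
    return total_ops / latency_s / 1e9


def utilization(total_ops: float, latency_ms: float,
                gops_per_ghz: float, clock_ghz: float) -> float:
    """Fraction of the peak throughput actually achieved (0.0 to 1.0)."""
    return effective_gops(total_ops, latency_ms) / peak_gops(gops_per_ghz, clock_ghz)


PEAK_GOPS_PER_GHZ = 380.0  # from the abstract
ASSUMED_CLOCK_GHZ = 1.0    # assumption: the abstract does not state a clock frequency
ASSUMED_TOTAL_OPS = 1.0e9  # assumption: placeholder workload size, not a measured value

if __name__ == "__main__":
    # 21.4 ms is the AlexNet convolutional-layer latency quoted in the abstract;
    # the operation count paired with it here is only a placeholder.
    print(peak_gops(PEAK_GOPS_PER_GHZ, ASSUMED_CLOCK_GHZ))
    print(effective_gops(ASSUMED_TOTAL_OPS, 21.4))
    print(utilization(ASSUMED_TOTAL_OPS, 21.4, PEAK_GOPS_PER_GHZ, ASSUMED_CLOCK_GHZ))
```

Replacing the placeholder clock and operation count with values from the full paper would give the accelerator's actual utilization on each network.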