Paper Title
Benchmarking High Bandwidth Memory on FPGAs
Authors
Abstract
FPGAs are starting to be enhanced with High Bandwidth Memory (HBM) as a way to reduce the memory bandwidth bottleneck encountered in some applications and to give the FPGA more capacity to deal with application state. However, the performance characteristics of HBM are still not well specified, especially in the context of FPGAs. In this paper, we bridge the gap between nominal specifications and actual performance by benchmarking HBM on a state-of-the-art FPGA, i.e., a Xilinx Alveo U280 featuring a two-stack HBM subsystem. To this end, we propose Shuhai, a benchmarking tool that allows us to demystify the underlying details of HBM on an FPGA. FPGA-based benchmarking should also provide a more accurate picture of HBM than benchmarking on CPUs/GPUs, since CPUs/GPUs are noisier systems due to their complex control logic and cache hierarchy. Since the memory itself is complex, leveraging custom hardware logic to benchmark inside an FPGA provides more detail as well as accurate and deterministic measurements. We observe that 1) HBM is able to provide up to 425 GB/s of memory bandwidth, and 2) how HBM is used has a significant impact on performance, which in turn demonstrates the importance of unveiling the performance characteristics of HBM so as to select the best approach. As a yardstick, we also apply Shuhai to DDR4 to show the differences between HBM and DDR4. Shuhai can be easily generalized to other FPGA boards or other generations of memory, e.g., HBM3 and DDR3. We will make Shuhai open-source, benefiting the community.
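To make the gap between nominal specification and measured performance concrete, the following minimal sketch compares the 425 GB/s figure from the abstract against a nominal peak. The channel width, clock frequency, and channel count are assumptions chosen for illustration (they are not stated in the abstract) and should be checked against the board documentation before being relied on.

```python
# Nominal figures below are ASSUMPTIONS for illustration; the abstract
# does not state them. Only the measured 425 GB/s comes from the abstract.
AXI_WIDTH_BITS = 256    # assumed data width of one HBM AXI channel
AXI_CLOCK_HZ = 450e6    # assumed AXI clock frequency
NUM_CHANNELS = 32       # assumed number of AXI channels across both stacks

MEASURED_GBPS = 425.0   # peak bandwidth reported in the abstract


def nominal_bandwidth_gbps(width_bits, clock_hz, channels):
    """Peak bandwidth in GB/s if every channel moves one word per cycle."""
    bytes_per_cycle = width_bits / 8
    return bytes_per_cycle * clock_hz * channels / 1e9


nominal = nominal_bandwidth_gbps(AXI_WIDTH_BITS, AXI_CLOCK_HZ, NUM_CHANNELS)
efficiency = MEASURED_GBPS / nominal

print(f"nominal:  {nominal:.1f} GB/s")
print(f"measured: {MEASURED_GBPS:.1f} GB/s ({efficiency:.1%} of nominal)")
```

Under these assumed parameters the measured peak falls somewhat short of the nominal figure, which is the kind of difference a benchmarking tool is meant to quantify.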