Morpheus: Extending the Last Level Cache Capacity in GPU Systems Using Idle GPU Core Resources

Paper Title

Morpheus: Extending the Last Level Cache Capacity in GPU Systems Using Idle GPU Core Resources

Authors

Sina Darabi, Mohammad Sadrosadati, Joël Lindegger, Negar Akbarzadeh, Mohammad Hosseini, Jisung Park, Juan Gómez-Luna, Hamid Sarbazi-Azad, Onur Mutlu

Abstract

Graphics Processing Units (GPUs) are widely used accelerators for data-parallel applications. In many GPU applications, GPU memory bandwidth bottlenecks performance, causing underutilization of GPU cores. Hence, disabling many cores does not affect the performance of memory-bound workloads. While simply power-gating unused GPU cores would save energy, prior works attempt to better utilize GPU cores for other applications (ideally compute-bound), which increases the GPU's total throughput. In this paper, we introduce Morpheus, a new hardware/software co-designed technique to boost the performance of memory-bound applications. The key idea of Morpheus is to exploit unused core resources to extend the GPU last level cache (LLC) capacity. In Morpheus, each GPU core has two execution modes: compute mode and cache mode. Cores in compute mode operate conventionally and run application threads. However, for the cores in cache mode, Morpheus invokes a software helper kernel that uses the cores' on-chip memories (i.e., register file, shared memory, and L1) in a way that extends the LLC capacity for a running memory-bound workload. Morpheus adds a controller to the GPU hardware to forward LLC requests to either the conventional LLC (managed by hardware) or the extended LLC (managed by the helper kernel). Our experimental results show that Morpheus improves the performance and energy efficiency of a baseline GPU architecture by an average of 39% and 58%, respectively, across several memory-bound workloads. Morpheus' performance is within 3% of a GPU design that has a quadruple-sized conventional LLC. Morpheus can thus contribute to reducing the hardware dedicated to a conventional LLC by exploiting idle cores' on-chip memory resources as additional cache capacity.
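
The forwarding idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' design: the class names (`ToyCache`, `ToyController`), the FIFO replacement policy, and the address-modulo routing rule are all assumptions made for illustration, since the abstract does not specify how requests are partitioned between the conventional and the extended LLC. It only shows why extra capacity contributed by cache-mode cores can turn a thrashing working set into hits.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCache:
    """Fully associative cache with FIFO replacement (illustrative assumption)."""
    capacity_lines: int
    lines: dict = field(default_factory=dict)  # insertion-ordered set of line addresses

    def access(self, line_addr: int) -> bool:
        """Return True on a hit; on a miss, fill the line, evicting the oldest one."""
        if line_addr in self.lines:
            return True
        if len(self.lines) >= self.capacity_lines:
            oldest = next(iter(self.lines))
            del self.lines[oldest]
        self.lines[line_addr] = None
        return False


class ToyController:
    """Forwards each LLC request to the conventional LLC or to a cache-mode core.

    ASSUMPTION: requests are routed by line address modulo the total capacity.
    The real controller's routing policy is not described in the abstract.
    """

    def __init__(self, conventional_lines: int, cache_mode_cores: int, lines_per_core: int):
        self.conventional_lines = conventional_lines
        self.lines_per_core = lines_per_core
        self.conventional = ToyCache(conventional_lines)
        # Each cache-mode core lends its on-chip memory as a slice of extended LLC.
        self.extended = [ToyCache(lines_per_core) for _ in range(cache_mode_cores)]
        self.total_lines = conventional_lines + cache_mode_cores * lines_per_core

    def access(self, line_addr: int) -> bool:
        slot = line_addr % self.total_lines
        if slot < self.conventional_lines:
            return self.conventional.access(line_addr)
        core = (slot - self.conventional_lines) // self.lines_per_core
        return self.extended[core].access(line_addr)


def count_hits(controller: ToyController, trace: list) -> int:
    """Replay a trace of line addresses and count how many accesses hit."""
    return sum(controller.access(addr) for addr in trace)
```

For example, replaying `list(range(8)) * 2` (an 8-line working set touched twice) through `ToyController(4, 0, 2)` models a 4-line conventional LLC with no cache-mode cores, while `ToyController(4, 2, 2)` adds two cache-mode cores of 2 lines each. In this toy setup the first configuration thrashes and the second holds the whole working set on the second pass.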
