Virtual-Link: A Scalable Multi-Producer, Multi-Consumer Message Queue Architecture for Cross-Core Communication

Paper Title

Virtual-Link: A Scalable Multi-Producer, Multi-Consumer Message Queue Architecture for Cross-Core Communication

Authors

Wu, Qinzhe; Beard, Jonathan; Ekanayake, Ashen; Gerstlauer, Andreas; John, Lizy K.

Abstract

Cross-core communication is increasingly a bottleneck as the number of processing elements (PEs) per system-on-chip increases. Typical hardware solutions to cross-core communication are often inflexible; software solutions are flexible, but they have performance scaling limitations. A key problem, as we will show, is that of shared state in software-based message queue mechanisms. This paper proposes Virtual-Link (VL), a novel lightweight communication mechanism with hardware support to facilitate M:N lock-free data movement. VL reduces the amount of coherent shared state, which is a bottleneck for many approaches, to zero. VL provides a further latency benefit by keeping data on the fast path (i.e., within the on-chip interconnect). VL enables directed cache injection (stashing) between PEs on the coherence bus, reducing the latency of core-to-core communication. VL is particularly effective for fine-grained tasks on streaming data. Evaluation on a full-system simulator with 7 benchmarks shows that VL achieves a 2.09x speedup over state-of-the-art software-based communication mechanisms, while reducing memory traffic by 61%.
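To make the "shared state" problem in the abstract concrete, the following is a minimal sketch of a conventional software M:N queue. It is not VL and not one of the paper's baselines; the class `SharedRingQueue`, the helper `run`, and the counter `shared_updates` are hypothetical names introduced only for illustration. The sketch uses a lock rather than a lock-free scheme for simplicity, but the point is the same in both cases: every enqueue and dequeue, from any producer or consumer, must modify the same head, tail, and count variables, which on real hardware would bounce between cores as coherent cache lines.

```python
import threading


class SharedRingQueue:
    """Bounded M:N queue whose head, tail, and count are shared by all threads."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0            # next slot to dequeue (shared, written by consumers)
        self.tail = 0            # next slot to enqueue (shared, written by producers)
        self.count = 0           # occupied slots (shared, written by every thread)
        self.shared_updates = 0  # illustration only: operations that modified shared state
        self.cond = threading.Condition()

    def put(self, item):
        with self.cond:
            while self.count == self.capacity:
                self.cond.wait()
            self.buf[self.tail] = item
            self.tail = (self.tail + 1) % self.capacity
            self.count += 1
            self.shared_updates += 1
            self.cond.notify_all()

    def get(self):
        with self.cond:
            while self.count == 0:
                self.cond.wait()
            item = self.buf[self.head]
            self.head = (self.head + 1) % self.capacity
            self.count -= 1
            self.shared_updates += 1
            self.cond.notify_all()
            return item


def run(producers=3, consumers=2, items_per_producer=100):
    """Move items from M producers to N consumers through one shared queue."""
    q = SharedRingQueue(capacity=8)
    received = []
    received_lock = threading.Lock()
    total = producers * items_per_producer

    def produce(pid):
        for i in range(items_per_producer):
            q.put((pid, i))

    def consume(n):
        for _ in range(n):
            item = q.get()
            with received_lock:
                received.append(item)

    # Split the total item count across consumers so every thread terminates.
    shares = [total // consumers + (1 if c < total % consumers else 0)
              for c in range(consumers)]
    threads = [threading.Thread(target=produce, args=(p,)) for p in range(producers)]
    threads += [threading.Thread(target=consume, args=(n,)) for n in shares]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received, q.shared_updates
```

Calling `run()` delivers every item exactly once, and `shared_updates` equals twice the number of items: one shared-state modification per enqueue plus one per dequeue. That per-message traffic on shared variables is the kind of cost the abstract says VL reduces to zero.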
