Paper title
Node-Type-Based Load-Balancing Routing for Parallel Generalized Fat-Trees
Paper authors
Abstract
High-Performance Computing (HPC) clusters are made up of a variety of node types (usually compute, I/O, service, and GPGPU nodes), and applications do not use nodes of different types in the same way. The resulting communication patterns reflect the organization of nodes into groups, and current routing algorithms that are optimal for all-to-all patterns do not always maximize performance for group-specific communications. Since application communication patterns are rarely available beforehand, we rely on node types as a good guess for node usage. We describe node type heterogeneity and analyse the performance degradation caused by an unlucky distribution of nodes of the same type. We provide an extension to routing algorithms for Parallel Generalized Fat-Tree topologies (PGFTs) that balances load among groups of nodes of the same type. We show how it removes these performance issues by comparing results in a variety of situations against the corresponding classical algorithms.
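To make the load-balancing idea concrete, here is a minimal Python sketch, not the paper's algorithm. It assumes a single leaf switch with four uplinks, a simplified destination-modulo rule (`dest_id % num_uplinks`) standing in for a classical fat-tree routing choice, and a hypothetical layout in which every fourth node is an I/O node. All function names and the topology are illustrative assumptions. The sketch compares the busiest uplink when routing on the global node ID against routing on each node's rank within its own type.

```python
from collections import Counter, defaultdict

def dmodk_uplink(dest_id, num_uplinks):
    # Simplified classical rule (assumption): uplink chosen from the global node ID.
    return dest_id % num_uplinks

def type_ranks(node_types):
    # Hypothetical helper: rank of each node among nodes of its own type, in ID order.
    counters = defaultdict(int)
    ranks = {}
    for node_id in sorted(node_types):
        node_type = node_types[node_id]
        ranks[node_id] = counters[node_type]
        counters[node_type] += 1
    return ranks

def typed_uplink(dest_id, ranks, num_uplinks):
    # Type-aware variant (illustrative): uplink chosen from the per-type rank.
    return ranks[dest_id] % num_uplinks

def max_uplink_load(dests, choose):
    # Number of destinations that share the busiest uplink.
    return max(Counter(choose(d) for d in dests).values())

NUM_UPLINKS = 4
# Assumed layout: nodes 0, 4, 8, 12 are I/O nodes, the rest are compute nodes.
node_types = {i: ("io" if i % 4 == 0 else "compute") for i in range(16)}
io_nodes = [n for n, t in node_types.items() if t == "io"]
ranks = type_ranks(node_types)

classic_load = max_uplink_load(io_nodes, lambda d: dmodk_uplink(d, NUM_UPLINKS))
typed_load = max_uplink_load(io_nodes, lambda d: typed_uplink(d, ranks, NUM_UPLINKS))
print(classic_load, typed_load)
```

In this assumed layout, routing on the global ID sends traffic for all four I/O nodes through the same uplink, while routing on the per-type rank spreads it over four different uplinks. A real PGFT has several levels and parallel links, which this sketch deliberately ignores.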