论文标题
天哪:在小硬件上嵌入大图
GOSH: Embedding Big Graphs on Small Hardware
论文作者
论文摘要
在图形嵌入中,图形的连接信息用于表示每个顶点作为D维空间中的一个点。与原始的不规则结构信息不同,这种表示形式可用于多种机器学习任务。尽管该过程在实践中非常有用,但确实很昂贵,而且不幸的是,这些图的图表变得越来越大,很难嵌入。尝试扩大该过程到较大图的尝试已经成功,但在硬件要求下通常以陡峭的价格以陡峭的价格。我们提出了天哪,这是一种将任意大小的图表嵌入至最小约束的单个GPU上的方法。 GOSH利用一种新颖的图形粗化方法来压缩图形并最大程度地减少嵌入所需的工作,与最先进的时间相比,在一小部分时间内提供了高质量的嵌入。除此之外,它还包含了一个分解模式,该模式使任何任意大图都可以使用单个GPU嵌入具有最小约束的单个GPU。通过这些技术,GOSH能够在不到一个小时的时间内嵌入超过6500万个顶点和18亿个边缘的图形,并在单个GPU上嵌入了93%的aucroc以进行链路预测,通过运行该工具80分钟可以增加到95%。
In graph embedding, the connectivity information of a graph is used to represent each vertex as a point in a d-dimensional space. Unlike the original, irregular structural information, such a representation can be used for a multitude of machine learning tasks. Although the process is extremely useful in practice, it is indeed expensive and unfortunately, the graphs are becoming larger and harder to embed. Attempts at scaling up the process to larger graphs have been successful but often at a steep price in hardware requirements. We present GOSH, an approach for embedding graphs of arbitrary sizes on a single GPU with minimum constraints. GOSH utilizes a novel graph coarsening approach to compress the graph and minimize the work required for embedding, delivering high-quality embeddings at a fraction of the time compared to the state-of-the-art. In addition to this, it incorporates a decomposition schema that enables any arbitrarily large graph to be embedded using a single GPU with minimum constraints on the memory size. With these techniques, GOSH is able to embed a graph with over 65 million vertices and 1.8 billion edges in less than an hour on a single GPU and obtains a 93% AUCROC for link-prediction which can be increased to 95% by running the tool for 80 minutes.