论文标题
大量亚仪表计数的大量平行算法
Massively Parallel Algorithms for Small Subgraph Counting
论文作者
论文摘要
在过去的二十年中,随着大型网络数据集的日益普及,用于分布式记忆并行计算的框架(例如MapReduce,Hadoop,Spark和Dryad)越来越流行。大规模并行计算(MPC)模型是在理论上研究这些框架中图形算法的事实上标准。子图计数是分析大量图的一个基本问题,主要算法挑战以设计既可扩展又准确的方法为中心。 给定图形$ g =(v,e)$,带有$ n $顶点,$ m $边缘和$ t $ triangles,我们的第一个结果是一种算法,该算法输出了$(1+ \ varepsilon)$ - 近似值至$ t $,以$ ph {最佳空间和$ s q \ geq( n^2/m)} $每台机器,并假设$ t =ω(\ sqrt {m/n})$。我们的结果比以前的作品对$ t $的界限进行了二次改进。我们还提供了一个简单的结果,以计算\ emph {any}子图的$ k $ size的$ k \ geq 1 $。我们的第二个结果是$ o _ {\ varepsilon}(\ log \ log n)$ - 圆形算法准确地计算三角形的数量,其总空间用法由输入图的\ emph {Arboricity} $α$α$α$。我们将此结果扩展到确切计数$ k $ cliques的任何常数$ k $。最后,我们证明了Bera,Pashanasangi和Seshadhri(ITCS 2020)的最新结果,以准确地计算所有大小最多的$ 5 $的子图,可以在MPC模型中实现。
Over the last two decades, frameworks for distributed-memory parallel computation, such as MapReduce, Hadoop, Spark and Dryad, have gained significant popularity with the growing prevalence of large network datasets. The Massively Parallel Computation (MPC) model is the de-facto standard for studying graph algorithms in these frameworks theoretically. Subgraph counting is one such fundamental problem in analyzing massive graphs, with the main algorithmic challenges centering on designing methods which are both scalable and accurate. Given a graph $G=(V, E)$ with $n$ vertices, $m$ edges and $T$ triangles, our first result is an algorithm that outputs a $(1+\varepsilon)$-approximation to $T$, with asymptotically \emph{optimal round and total space complexity} provided any $S \geq \max{(\sqrt m, n^2/m)}$ space per machine and assuming $T=Ω(\sqrt{m/n})$. Our result gives a quadratic improvement on the bound on $T$ over previous works. We also provide a simple extension of our result to counting \emph{any} subgraph of $k$ size for constant $k \geq 1$. Our second result is an $O_{\varepsilon}(\log \log n)$-round algorithm for exactly counting the number of triangles, whose total space usage is parametrized by the \emph{arboricity} $α$ of the input graph. We extend this result to exactly counting $k$-cliques for any constant $k$. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most $5$ can be implemented in the MPC model in total space.