论文标题

三角形和四个周期计数,并在图流中进行预测

Triangle and Four Cycle Counting with Predictions in Graph Streams

论文作者

Chen, Justin Y., Eden, Talya, Indyk, Piotr, Lin, Honghao, Narayanan, Shyam, Rubinfeld, Ronitt, Silwal, Sandeep, Wagner, Tal, Woodruff, David P., Zhang, Michael

论文摘要

我们提出了数据驱动的一通流媒体流算法,用于估计三角形和四个周期的数量,这是图形数据流文献中广泛研究的图形分析中的两个基本问题。最近,(HSU 2018)和(Jiang 2020)在其他数据流问题中应用机器学习技术,使用训练有素的甲骨文,该甲骨文可以预测流元素的某些属性,以改进不使用甲壳的先前的“经典”算法。在本文中,我们探讨了多个图边流模型中“重边”甲骨文的功能。在“邻接列表”模型中,我们提出了一个单通行的三角计数算法,在没有这样的甲骨文的情况下,对先前空间上限进行了改进。在任意阶模型中,我们以更少的通过和与以前的算法相同的空间复杂性提出了三角形和四个周期估计的算法,我们显示其中几个界限是最佳的。我们在几个噪声模型下分析算法,表明即使在Oracle ERRS时,该算法的性能也很好。我们的方法可以在先前的“经典”流算法上进行扩展,因为以前的多通和随机订单流算法可以看作是我们算法的特殊情况,在该算法中,使用第一个通过或随机订单来实现重边甲骨文。最后,我们的实验证明了与最新的流媒体算法相比,该方法的优势。

We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature. Recently, (Hsu 2018) and (Jiang 2020) applied machine learning techniques in other data stream problems, using a trained oracle that can predict certain properties of the stream elements to improve on prior "classical" algorithms that did not use oracles. In this paper, we explore the power of a "heavy edge" oracle in multiple graph edge streaming models. In the adjacency list model, we present a one-pass triangle counting algorithm improving upon the previous space upper bounds without such an oracle. In the arbitrary order model, we present algorithms for both triangle and four cycle estimation with fewer passes and the same space complexity as in previous algorithms, and we show several of these bounds are optimal. We analyze our algorithms under several noise models, showing that the algorithms perform well even when the oracle errs. Our methodology expands upon prior work on "classical" streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases of our algorithms, where the first pass or random order was used to implement the heavy edge oracle. Lastly, our experiments demonstrate advantages of the proposed method compared to state-of-the-art streaming algorithms.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源