Paper Title

Sampling Based Approximate Skyline Calculation on Big Data

Authors

Xingxing Xiao, Jianzhong Li

Abstract

Existing algorithms for processing skyline queries are not suited to big data. This paper proposes two approximate skyline algorithms based on sampling. The first algorithm draws a fixed-size sample and computes an approximate skyline on that sample. Its error is relatively small in most cases and is almost independent of the input relation size. The second algorithm returns an $(\varepsilon,\delta)$-approximation of the exact skyline. The sample size it requires can be regarded as a constant with respect to the input relation size, and so can its running time. Experiments verify the error analysis of the first algorithm and show that the second algorithm is much faster than existing skyline algorithms.
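The idea behind the first algorithm (compute the skyline of a fixed-size sample instead of the whole relation) can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the abstract: tuples are numeric, smaller values are preferred in every dimension, the sample is drawn uniformly without replacement, and the sample's skyline is computed with a simple block-nested-loop pass. The function names and the sample-size parameter `k` are hypothetical. The sketch does not reproduce the paper's error analysis or its second, $(\varepsilon,\delta)$-approximate algorithm.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Assumed 'smaller is better' dominance: p <= q in every dimension
    and p < q in at least one dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Exact skyline via a block-nested-loop style pass over the points."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a current candidate, so it cannot be in the skyline
        # p survives: evict candidates it dominates, then keep p as a candidate
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def sample_skyline(relation: Sequence[Point], k: int, seed: int = 0) -> List[Point]:
    """Hypothetical sketch of sampling-based approximation: draw a uniform
    sample of at most k tuples without replacement and return its skyline."""
    rng = random.Random(seed)
    sample = rng.sample(relation, min(k, len(relation)))
    return skyline(sample)
```

For example, `sample_skyline(points, k=1000)` returns the skyline of at most 1,000 sampled tuples. Every returned tuple belongs to the input, but some exact-skyline tuples may be missing, and some returned tuples may be dominated by tuples that were not sampled; bounding that error is what the paper's analysis addresses.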